Saturday, 4 June 2016

PassFace: Face Recognition Using OpenCV



Hi,


PassFace is a face recognition program developed with EmguCV (a .NET wrapper for OpenCV). It supports 4 different algorithms and 3 different sources (camera, video, image). Most of the experiments were run on the LFW data set, which has roughly 13,000 pictures of roughly 1,500 different people. Details of the project are given below:

Source Code:
https://github.com/mozanunal/PassFace





1. Introduction


Definition of the Problem


In this project, a program is developed that recognizes faces, compares them with the faces it has learned, and reports the identity of the person. The whole identification system is meant to work in real time. The main subjects of the project are image processing and machine learning. The project aims to detect faces and, more specifically, to recognize them using pattern recognition algorithms. The number of faces and the equipment needed will be determined by testing the different algorithms.


Motivation
Detecting the identity of a person from images is a very useful capability. It can give users easy access to systems. It can be used to detect criminals and to prevent potential crimes. I also see this subject in many industrial projects and academic studies. I wanted to work on a real-world problem that is not completely solved, so I decided to work on this project.


Difficulties of Problem
The main difficulties are listed below:
  • Faces are not completely rigid objects, so they are hard to recognize.
  • The more people there are in the database, the more processing power is needed.
  • Images are blurry because the system runs in real time.
  • Ambient light affects the images.
  • A person’s face changes over time.


Data Sets
AT&T Facedatabase The AT&T Face Database, sometimes also referred to as ORL Database of Faces, contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).


Yale Facedatabase A, also known as Yalefaces. The AT&T Face Database is good for initial tests, but it’s a fairly easy database. The Eigenfaces method already has a 97% recognition rate on it, so you won’t see any great improvements with other algorithms. The Yale Face Database A (also known as Yalefaces) is a more appropriate dataset for initial experiments, because the recognition problem is harder. The database consists of 15 people (14 male, 1 female) each with 11 grayscale images sized 320 \times 243 pixel. There are changes in the light conditions (center light, left light, right light), facial expressions (happy, normal, sad, sleepy, surprised, wink) and glasses (glasses, no-glasses).


Extended Yale Facedatabase B The Extended Yale Face Database B contains 2414 images of 38 different people in its cropped version. The focus of this database is on extracting features that are robust to illumination; the images have almost no variation in emotion or occlusion. I personally think that this dataset is too large for the experiments I perform in this document. You had better use the AT&T Facedatabase for initial testing. A first version of the Yale Face Database B was used in [BHK97] to see how the Eigenfaces and Fisherfaces methods perform under heavy illumination changes. [Lee05] used the same setup to take 16128 images of 28 people. The Extended Yale Face Database B is the merge of the two databases.


Labeled Faces in the Wild, a database of face photographs designed for studying the problem of unconstrained face recognition. The data set contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1680 of the people pictured have two or more distinct photos in the data set. The only constraint on these faces is that they were detected by the Viola-Jones face detector.


Programming Environment and Libraries
Visual Studio is used as the IDE for the project. The program is written in C# using EmguCV, a .NET wrapper for the OpenCV libraries.


2. Face Recognition Algorithm


Face Detection
Face detection is not the main subject of this project, but it is needed to create the database and to improve face recognition performance. OpenCV’s Haar cascade classifier is used for this. The function loads a Haar cascade file that has been pre-trained for face detection.


Morphologic Operations
Ambient light and the movement of faces are challenging problems in face recognition. Therefore some preprocessing operators are applied to the faces to reduce the effect of these problems. In this project, OpenCV’s histogram equalization function is used to reduce the effect of ambient light.
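To make the idea of histogram equalization concrete, here is a minimal sketch in plain Python (not the project's C#, and not the EmguCV call itself). It assumes an 8-bit grayscale image given as a flat list of pixel values; the function name equalize_histogram and the sample data are made up for illustration. Each gray level is remapped through the cumulative histogram so that a narrow range of intensities is stretched over the full range.

```python
def equalize_histogram(pixels, levels=256):
    """Remap gray levels through the cumulative histogram (CDF)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # The first non-zero CDF value is mapped to black.
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # constant image, nothing to stretch

    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# A dark, low-contrast strip of pixels (hypothetical data).
dark = [50, 50, 51, 52, 52, 53, 54, 55]
equalized = equalize_histogram(dark)
print(equalized)
```

The darkest input value ends up at 0 and the brightest at 255, while the ordering of the pixels is preserved, which is why the operation reduces the influence of overall lighting without destroying the facial structure.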


SURF Feature Extractor






                 if (comboBoxAlgorithm.Text == "SURF Feature Extractor")
                 {
                     string dataDirectory=Directory.GetCurrentDirectory()+"\\TrainedFaces";
                     string[] files = Directory.GetFiles(dataDirectory, "*.jpeg", SearchOption.AllDirectories);


                     foreach (var file in files)
                     {
                         richTextBox1.Text += file.ToString();
                         long recpoints;
                         Image<Bgr,Byte>sampleImage = new Image<Bgr, Byte>(file);
                         secondImageBox.Image = sampleImage;
                         using (Image<Gray, Byte> modelImage = sampleImage.Convert<Gray, Byte>())
                         using (Image<Gray, Byte> observedImage = image.Convert<Gray, Byte>())
                         {
                             Image<Bgr, byte> result = SurfRecognizer.Draw(modelImage, observedImage, out recpoints);
                             //captureImageBox.Image = observedImage;
                             if (recpoints > 10)
                             {
                                 MCvFont f = new MCvFont(Emgu.CV.CvEnum.FONT.CV_FONT_HERSHEY_COMPLEX, 1.0, 1.0);
                                 result.Draw("Person Recognited, Welcome", ref f, new Point(40, 40), new Bgr(0, 255, 0));
                                 ImageViewer.Show(result, String.Format(" {0} Points Recognited", recpoints));
                             }
                         }
                     }
                 }


In SIFT, Lowe approximated the Laplacian of Gaussian (LoG) with a Difference of Gaussians to build the scale space. SURF goes a little further and approximates the LoG with box filters. One big advantage of this approximation is that convolution with a box filter can be computed easily with the help of integral images, and it can be done in parallel for different scales. SURF also relies on the determinant of the Hessian matrix for both scale and location.
[Figure: Box filter approximation of the Laplacian]
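The reason box filters are cheap is the integral image: once it is built, the sum over any rectangle costs four lookups, regardless of the rectangle's size. The following is a small sketch in plain Python under that assumption; the helper names integral_image and box_sum are illustrative and are not part of OpenCV or EmguCV.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1, in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
whole = box_sum(ii, 0, 0, 3, 3)         # all nine pixels
lower_right = box_sum(ii, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
print(whole, lower_right)
```

A box filter response is just a weighted combination of a few such rectangle sums, so evaluating it at a larger scale costs the same as at a small one.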
For orientation assignment, SURF uses wavelet responses in the horizontal and vertical directions for a neighbourhood of size 6s. Adequate Gaussian weights are also applied to them. The responses are then plotted as points in a two-dimensional space (see figure). The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of 60 degrees. Interestingly, the wavelet response can be found very easily at any scale using integral images. For many applications rotation invariance is not required, so there is no need to find this orientation, which speeds up the process. SURF provides such a functionality, called Upright-SURF or U-SURF. It improves speed and is robust up to \pm 15^{\circ}. OpenCV supports both, depending on the flag upright: if it is 0, the orientation is calculated; if it is 1, the orientation is not calculated and the method is faster.
[Figure: Orientation assignment in SURF]
For feature description, SURF uses wavelet responses in the horizontal and vertical directions (again, the use of integral images makes things easier). A neighbourhood of size 20s \times 20s is taken around the keypoint, where s is the scale. It is divided into 4 \times 4 subregions. For each subregion, horizontal and vertical wavelet responses are taken and a vector is formed like this: v=( \sum{d_x}, \sum{d_y}, \sum{|d_x|}, \sum{|d_y|}). Concatenating these vectors gives the SURF feature descriptor with a total of 64 dimensions. The lower the dimension, the higher the speed of computation and matching, whereas a higher dimension provides better distinctiveness of features.
For more distinctiveness, the SURF feature descriptor has an extended 128-dimensional version. The sums of d_x and |d_x| are computed separately for d_y < 0 and d_y \geq 0. Similarly, the sums of d_y and |d_y| are split up according to the sign of d_x, thereby doubling the number of features. It doesn’t add much computational complexity. OpenCV supports both by setting the flag extended to 0 or 1 for 64-dim and 128-dim respectively (the default value depends on the OpenCV version).
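The way the 64 values come about can be sketched in a few lines of plain Python. This is only an illustration of the 4 \times 4 subregion layout and the four sums per subregion: it assumes the wavelet responses d_x and d_y have already been sampled on a 20 \times 20 grid, and it leaves out the Gaussian weighting, the rotation to the dominant orientation and the final normalization that a real SURF implementation applies. The function name surf_like_descriptor is made up.

```python
def surf_like_descriptor(dx, dy):
    """Build 4 x 4 x 4 = 64 values from 20 x 20 grids of responses."""
    descriptor = []
    for block_y in range(4):
        for block_x in range(4):
            sum_dx = sum_dy = sum_abs_dx = sum_abs_dy = 0.0
            # Each subregion covers 5 x 5 samples of the 20 x 20 grid.
            for y in range(block_y * 5, block_y * 5 + 5):
                for x in range(block_x * 5, block_x * 5 + 5):
                    sum_dx += dx[y][x]
                    sum_dy += dy[y][x]
                    sum_abs_dx += abs(dx[y][x])
                    sum_abs_dy += abs(dy[y][x])
            descriptor.extend([sum_dx, sum_dy, sum_abs_dx, sum_abs_dy])
    return descriptor


# Hypothetical constant responses, just to check the layout.
dx = [[1.0] * 20 for _ in range(20)]
dy = [[-2.0] * 20 for _ in range(20)]
descriptor = surf_like_descriptor(dx, dy)
print(len(descriptor), descriptor[:4])
```

The absolute-value sums are what let the descriptor tell a flat region (all four values small) apart from a region with alternating gradients (signed sums small, absolute sums large).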
Another important improvement is the use of the sign of the Laplacian (the trace of the Hessian matrix) for the underlying interest point. It adds no computational cost since it is already computed during detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation. In the matching stage, features are only compared if they have the same type of contrast. This minimal information allows for faster matching without reducing the descriptor’s performance.
[Figure: Fast indexing for matching]
In short, SURF adds a lot of features to improve the speed of every step. Analysis shows it is about 3 times faster than SIFT while its performance is comparable. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint changes and illumination changes.


Eigenfaces


                 else if (comboBoxAlgorithm.Text == "EigenFaces")
                 {


                     //image._EqualizeHist();
                     if (eqHisChecked.Checked == true)
                     {
                         image._EqualizeHist();
                     }
                     var result = eigenFaceRecognizer.Predict(image.Convert<Gray, Byte>().Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC));
                     if (result.Label != -1)
                     {
                         image.Draw(eigenlabels[result.Label].ToString(), ref font, new Point(face.X - 2, face.Y - 2), new Bgr(Color.LightGreen));
                         label6.Text = result.Distance.ToString();
                     }
                 }


Let X = \{ x_{1}, x_{2}, \ldots, x_{n} \} be a random vector with observations x_i \in R^{d}.
1. Compute the mean \mu
\mu = \frac{1}{n} \sum_{i=1}^{n} x_{i}
2. Compute the covariance matrix S
S = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu) (x_{i} - \mu)^{T}
3. Compute the eigenvalues \lambda_{i} and eigenvectors v_{i} of S
S v_{i} = \lambda_{i} v_{i}, i=1,2,\ldots,n
4. Order the eigenvectors in descending order of their eigenvalues. The k principal components are the eigenvectors corresponding to the k largest eigenvalues.
The k principal components of the observed vector x are then given by:
y = W^{T} (x - \mu)
where W = (v_{1}, v_{2}, \ldots, v_{k}).
The reconstruction from the PCA basis is given by:
x = W y + \mu
where W = (v_{1}, v_{2}, \ldots, v_{k}).
The Eigenfaces method then performs face recognition by:
  • Projecting all training samples into the PCA subspace.
  • Projecting the query image into the PCA subspace.
  • Finding the nearest neighbor between the projected training images and the projected query image.
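The three steps above can be sketched end to end in plain Python. This is a toy illustration, not what EigenFaceRecognizer does internally: the "images" are 3-element vectors, only the single largest principal component is kept (k = 1), and it is found with power iteration instead of a full eigendecomposition. All function names, labels and numbers are made up for the example.

```python
import math


def mean_vector(samples):
    return [sum(column) / len(samples) for column in zip(*samples)]


def covariance_matrix(samples, mu):
    d, n = len(mu), len(samples)
    S = [[0.0] * d for _ in range(d)]
    for x in samples:
        c = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                S[i][j] += c[i] * c[j] / n
    return S


def top_eigenvector(S, iterations=200):
    """Power iteration: converges to the eigenvector of the largest eigenvalue."""
    v = [1.0] * len(S)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in S]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def project(x, mu, W):
    """y = W^T (x - mu), with W given as a list of basis vectors."""
    centered = [xi - mi for xi, mi in zip(x, mu)]
    return [sum(wi * ci for wi, ci in zip(w, centered)) for w in W]


def predict(query, train, labels, mu, W):
    """Nearest neighbour in the PCA subspace."""
    q = project(query, mu, W)

    def squared_distance(i):
        y = project(train[i], mu, W)
        return sum((a - b) ** 2 for a, b in zip(q, y))

    return labels[min(range(len(train)), key=squared_distance)]


train = [[2.0, 1.0, 0.0], [2.2, 1.1, 0.0], [-2.0, -1.0, 0.0], [-2.1, -0.9, 0.0]]
labels = ["person_a", "person_a", "person_b", "person_b"]
mu = mean_vector(train)
W = [top_eigenvector(covariance_matrix(train, mu))]
print(predict([1.9, 1.0, 0.1], train, labels, mu, W))
```

With real face images the vectors have thousands of elements and several components are kept, but the flow (center, project, compare distances) stays the same. The Distance value shown in the PassFace label corresponds to this nearest-neighbour distance.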
Still there’s one problem left to solve. Imagine we are given 400 images sized 100 \times 100 pixels. Principal Component Analysis works on the covariance matrix S = X X^{T}, where X has size 10000 \times 400 in our example. You would end up with a 10000 \times 10000 matrix, roughly 0.8 GB. Solving this problem isn’t feasible, so we’ll need to apply a trick. From your linear algebra lessons you know that for a mean-centered M \times N data matrix with M > N, X X^{T} can have at most N - 1 non-zero eigenvalues. So it’s possible to take the eigenvalue decomposition of X^{T} X, which has size N \times N, instead:
X^{T} X v_{i} = \lambda_{i} v_{i}
and get the original eigenvectors of S = X X^{T} with a left multiplication of the data matrix:
X X^{T} (X v_{i}) = \lambda_{i} (X v_{i})
The resulting eigenvectors are orthogonal; to get orthonormal eigenvectors they need to be normalized to unit length.
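This trick is easy to check numerically on a tiny example. The sketch below (plain Python, hypothetical data) uses two "images" with four pixels each, finds the dominant eigenvector v of the small 2 \times 2 matrix X^{T} X by power iteration, and then verifies that X v is an eigenvector of the large 4 \times 4 matrix X X^{T} with the same eigenvalue.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Two samples with four "pixels" each, stored as the columns of X (4 x 2).
X = transpose([[1.0, 2.0, 0.0, 0.0],
               [1.0, 0.0, 1.0, 0.0]])
Xt = transpose(X)

small = matmul(Xt, X)   # 2 x 2, cheap to decompose
large = matmul(X, Xt)   # 4 x 4, what PCA formally needs

# Power iteration on the small matrix.
v = [[1.0], [1.0]]
for _ in range(100):
    v = matmul(small, v)
    norm = sum(c[0] ** 2 for c in v) ** 0.5
    v = [[c[0] / norm] for c in v]
eigenvalue = matmul(transpose(v), matmul(small, v))[0][0]

# Left-multiply by the data matrix to get an eigenvector of the large matrix.
u = matmul(X, v)
large_u = matmul(large, u)
residual = max(abs(large_u[i][0] - eigenvalue * u[i][0]) for i in range(4))
print(eigenvalue, residual)
```

The residual is zero up to rounding error, so the expensive decomposition of the large matrix is never needed. Note that u is not yet of unit length, which is why the normalization step mentioned above is required.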


Fisherfaces


                 else if (comboBoxAlgorithm.Text == "FisherFaces")
                 {
                     if (eqHisChecked.Checked == true)
                     {
                         image._EqualizeHist();
                     }
                     var result = fisherFaceRecognizer.Predict(image.Convert<Gray, Byte>().Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC));
                     if (result.Label != -1)
                     {
                         image.Draw(fisherlabels[result.Label].ToString(), ref font, new Point(face.X - 2, face.Y - 2), new Bgr(Color.LightGreen));
                         label6.Text = result.Distance.ToString();
                     }


                 }


Let X be a random vector with samples drawn from c classes:
\begin{align*}
    X & = & \{X_1,X_2,\ldots,X_c\} \\
    X_i & = & \{x_1, x_2, \ldots, x_n\}
\end{align*}
The scatter matrices S_{B} and S_{W} are calculated as:
\begin{align*}
    S_{B} & = & \sum_{i=1}^{c} N_{i} (\mu_i - \mu)(\mu_i - \mu)^{T} \\
    S_{W} & = & \sum_{i=1}^{c} \sum_{x_{j} \in X_{i}} (x_j - \mu_i)(x_j - \mu_i)^{T}
\end{align*}
where \mu is the total mean:
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
And \mu_i is the mean of class i \in \{1,\ldots,c\}:
\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j
Fisher’s classic algorithm now looks for a projection W, that maximizes the class separability criterion:
W_{opt} = \operatorname{arg\,max}_{W} \frac{|W^T S_B W|}{|W^T S_W W|}
A solution for this optimization problem is then given by solving the generalized eigenvalue problem:
\begin{align*}
    S_{B} v_{i} & = & \lambda_{i} S_{W} v_{i} \\
    S_{W}^{-1} S_{B} v_{i} & = & \lambda_{i} v_{i}
\end{align*}
There’s one problem left to solve: the rank of S_{W} is at most (N-c), with N samples and c classes. In pattern recognition problems the number of samples N is almost always smaller than the dimension of the input data (the number of pixels), so the scatter matrix S_{W} becomes singular. In [BHK97] this was solved by performing a Principal Component Analysis on the data and projecting the samples into the (N-c)-dimensional space. A Linear Discriminant Analysis was then performed on the reduced data, because S_{W} isn’t singular anymore.
The optimization problem can then be rewritten as:
\begin{align*}
    W_{pca} & = & \operatorname{arg\,max}_{W} |W^T S_T W| \\
    W_{fld} & = & \operatorname{arg\,max}_{W} \frac{|W^T W_{pca}^T S_{B} W_{pca} W|}{|W^T W_{pca}^T S_{W} W_{pca} W|}
\end{align*}
The transformation matrix W that projects a sample into the (c-1)-dimensional space is then given by:
W = W_{fld}^{T} W_{pca}^{T}
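For the special case of two classes, the generalized eigenvalue problem above has a closed-form solution: the discriminant direction is proportional to S_{W}^{-1} (\mu_1 - \mu_2). The sketch below shows this in plain Python on 2-D points, where S_{W} is a 2 \times 2 matrix that can be inverted by hand, so no PCA step is needed. The data and function names are invented for illustration; this is not the FisherFaceRecognizer implementation.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def within_scatter(classes):
    """S_W: sum over classes of (x - mu_i)(x - mu_i)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in classes:
        m = mean(points)
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """w = S_W^{-1} (mu_a - mu_b), valid for the two-class case."""
    s = within_scatter([class_a, class_b])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
class_b = [[4.0, 1.0], [5.0, 2.0], [6.0, 2.0]]
w = fisher_direction(class_a, class_b)

proj_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
proj_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
print(w, proj_a, proj_b)
```

After projection onto w the two classes fall into clearly separated intervals on a line, which is the (c-1)-dimensional space for c = 2. Unlike PCA, the direction is chosen using the class labels, not just the overall variance.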


Local Binary Patterns Histograms


                 else if (comboBoxAlgorithm.Text == "LBPHFaces")
                 {
                     if (eqHisChecked.Checked == true)
                     {
                         image._EqualizeHist();
                     }
                     var result = lbphFaceRecognizer.Predict(image.Convert<Gray, Byte>().Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC));
                     if (result.Label != -1)
                     {
                         image.Draw(lbphlabels[result.Label].ToString(), ref font, new Point(face.X - 2, face.Y - 2), new Bgr(Color.LightGreen));
                         label6.Text = result.Distance.ToString();
                     }


                 }


A more formal description of the LBP operator can be given as:
LBP(x_c, y_c) = \sum_{p=0}^{P-1} 2^p s(i_p - i_c)
with (x_c, y_c) as the central pixel with intensity i_c, and i_p being the intensity of the neighbor pixel. s is the sign function defined as:
\begin{equation}
s(x) =
\begin{cases}
1 & \text{if $x \geq 0$}\\
0 & \text{else}
\end{cases}
\end{equation}
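As a concrete instance of this formula, the following plain Python sketch computes the LBP code of the center pixel of a 3 \times 3 patch. The order in which the eight neighbors are visited (starting at the top-left and going clockwise) is an assumption made for this example; implementations differ in the ordering, which only permutes the bits. The patch values are made up.

```python
# Neighbour offsets (dy, dx), starting top-left and going clockwise.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Sum of 2^p * s(i_p - i_c) over the eight neighbours of (y, x)."""
    center = img[y][x]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:   # s(i_p - i_c) = 1
            code |= 1 << p
    return code


patch = [
    [5, 9, 1],
    [4, 4, 6],
    [7, 2, 3],
]
code = lbp_code(patch, 1, 1)

# A monotonic brightness change leaves the code untouched.
brighter = [[2 * value + 10 for value in row] for row in patch]
code_brighter = lbp_code(brighter, 1, 1)
print(code, code_brighter)
```

Neighbors 5, 9, 6, 7 and 4 are greater than or equal to the center value 4, which sets bits 0, 1, 3, 6 and 7 and gives 1 + 2 + 8 + 64 + 128 = 203. The second call illustrates why LBP copes well with lighting changes: only the ordering of intensities matters.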
This description enables you to capture very fine-grained details in images. In fact the authors were able to compete with state-of-the-art results for texture classification. Soon after the operator was published it was noted that a fixed neighborhood fails to encode details differing in scale. So the operator was extended to use a variable neighborhood in [AHP04]. The idea is to align an arbitrary number of neighbors on a circle with a variable radius, which makes it possible to capture neighborhoods of different sizes:
[Figure: examples of circular LBP neighborhoods]
For a given Point (x_c,y_c) the position of the neighbor (x_p,y_p), p \in P can be calculated by:
\begin{align*}
x_{p} & = & x_c + R \cos({\frac{2\pi p}{P}})\\
y_{p} & = & y_c - R \sin({\frac{2\pi p}{P}})
\end{align*}
Where R is the radius of the circle and P is the number of sample points.
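The two formulas translate directly into code. The short Python sketch below lists the sample positions for a given center, radius R and number of points P; the function name circular_neighbors is illustrative.

```python
import math


def circular_neighbors(x_c, y_c, radius, points):
    """Positions (x_p, y_p) of the sample points on the circle."""
    positions = []
    for p in range(points):
        angle = 2 * math.pi * p / points
        positions.append((x_c + radius * math.cos(angle),
                          y_c - radius * math.sin(angle)))
    return positions


four = circular_neighbors(2.0, 2.0, 1.0, 4)
eight = circular_neighbors(2.0, 2.0, 1.0, 8)
for x_p, y_p in eight:
    print(round(x_p, 3), round(y_p, 3))
```

With P = 4 and R = 1 all sample points fall exactly on pixel centers. With P = 8 the diagonal points land at offsets of about 0.707 pixels, between pixel centers, so their intensity has to be interpolated.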
The operator is an extension of the original LBP codes, so it’s sometimes called Extended LBP (also referred to as Circular LBP). If a point’s coordinate on the circle doesn’t correspond to image coordinates, the point gets interpolated. Computer science has a bunch of clever interpolation schemes; the OpenCV implementation does a bilinear interpolation:
\begin{align*}
f(x,y) \approx \begin{bmatrix}
    1-x & x \end{bmatrix} \begin{bmatrix}
    f(0,0) & f(0,1) \\
    f(1,0) & f(1,1) \end{bmatrix} \begin{bmatrix}
    1-y \\
    y \end{bmatrix}.
\end{align*}
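Multiplying out the matrix product gives a weighted average of the four surrounding pixel values. A minimal Python sketch, assuming x and y are the fractional offsets in [0, 1] inside the unit cell and f00, f01, f10, f11 stand for f(0,0), f(0,1), f(1,0), f(1,1):

```python
def bilinear(f00, f01, f10, f11, x, y):
    """Expanded form of [1-x, x] * [[f00, f01], [f10, f11]] * [1-y, y]^T."""
    return ((1 - x) * (1 - y) * f00
            + (1 - x) * y * f01
            + x * (1 - y) * f10
            + x * y * f11)


corner = bilinear(10, 20, 30, 40, 0, 0)        # exactly f(0,0)
middle = bilinear(10, 20, 30, 40, 0.5, 0.5)    # average of all four
print(corner, middle)
```

At the corners the formula returns the pixel values themselves, and in the middle of the cell it returns their plain average, so the interpolated intensity changes smoothly as a sample point moves around the circle.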
By definition the LBP operator is robust against monotonic gray scale transformations. We can easily verify this by looking at the LBP image of an artificially modified image (so you see what an LBP image looks like!):
[Figure: LBP images of an artificially modified face image from the Yale database]
What’s left is to incorporate the spatial information into the face recognition model. The representation proposed by Ahonen et al. [AHP04] is to divide the LBP image into m local regions and extract a histogram from each. The spatially enhanced feature vector is then obtained by concatenating the local histograms (not merging them). These histograms are called Local Binary Patterns Histograms.
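The concatenation step can be illustrated with a small Python sketch. It assumes the LBP image has already been computed and that its size is divisible by the grid size; the function name spatial_histogram and the tiny 4 \times 4 "LBP image" with only four possible codes are made up to keep the output readable.

```python
def spatial_histogram(lbp_img, grid_x, grid_y, bins=256):
    """Histogram each grid cell separately and concatenate the results."""
    height, width = len(lbp_img), len(lbp_img[0])
    cell_h, cell_w = height // grid_y, width // grid_x
    feature = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            hist = [0] * bins
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                for x in range(gx * cell_w, (gx + 1) * cell_w):
                    hist[lbp_img[y][x]] += 1
            feature.extend(hist)   # concatenate, do not add up
    return feature


lbp_img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
feature = spatial_histogram(lbp_img, grid_x=2, grid_y=2, bins=4)
print(feature)
```

A single global histogram of this image would be [4, 4, 4, 4] and would lose the information about where each pattern occurs; the concatenated vector keeps it. In the PassFace code the recognizer is created with LBPHFaceRecognizer(1, 8, 8, 8, 123.0), which presumably corresponds to radius 1, 8 neighbors and an 8 \times 8 grid.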


3. PassFace Interface


Database Creator


A database creator was developed to make it easier to implement and try different algorithms.


Add DataBase Function


     private void addDatabaseButton_Click(object sender, EventArgs e)
     {
         // Use the current time to build the file name for the saved face
         string fileName = textBox1.Text+"_"+DateTime.Now.Day.ToString() + "-" + DateTime.Now.Month.ToString() + "-" + DateTime.Now.Year.ToString()
         + "-" + DateTime.Now.Hour.ToString() + "-" + DateTime.Now.Minute.ToString()+"-" + DateTime.Now.Second.ToString()+".jpeg";


         // First, the faces in the image are detected
         Image<Bgr, Byte> image = _capture.RetrieveBgrFrame().Resize(400, 300, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC);
         List<Rectangle> faces = new List<Rectangle>();
         List<Rectangle> eyes = new List<Rectangle>();
         long detectionTime;
         DetectFace.Detect(image, "haarcascade_frontalface_default.xml", "haarcascade_eye.xml", faces, eyes, out detectionTime);
         foreach (Rectangle face in faces)
         {
             // Crop to the detected face; if several faces are found, the last one wins
             image.ROI = face;
         }
         Directory.CreateDirectory("TrainedFaces");
         image.Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC).ToBitmap().Save("TrainedFaces\\" + fileName);


     }


     private void comboBoxAlgorithm_SelectedIndexChanged(object sender, EventArgs e)
     {
         if (comboBoxAlgorithm.Text == "EigenFaces")
         {
             try
             {
                 string dataDirectory = Directory.GetCurrentDirectory() + "\\TrainedFaces";
                     
                     string[] files = Directory.GetFiles(dataDirectory, "*.jpeg", SearchOption.AllDirectories);
                     eigenTrainedImageCounter = 0;
                     foreach (var file in files)
                     {
                           Image<Bgr,Byte> TrainedImage=new Image<Bgr, Byte>(file);
                           if (eqHisChecked.Checked == true)
                           {
                               TrainedImage._EqualizeHist();
                           }
                           eigenTrainingImages.Add(TrainedImage.Convert<Gray, Byte>());
                           eigenlabels.Add(fileName(file));
                           eigenIntlabels.Add(eigenTrainedImageCounter);
                           eigenTrainedImageCounter++;
                           richTextBox1.Text += fileName(file)+"\n";
                     }
                 /*
                     //TermCriteria for face recognition with numbers of trained images like maxIteration
                     MCvTermCriteria termCrit = new MCvTermCriteria(eigenTrainedImageCounter, 0.001);
                       
                     //Eigen face recognizer
                     eigenObjRecognizer=new EigenObjectRecognizer(
                       eigenTrainingImages.ToArray(),
                       eigenlabels.ToArray(),
                       3000,
                       ref termCrit);
                  */
                      eigenFaceRecognizer= new EigenFaceRecognizer(eigenTrainedImageCounter, 3000);
                      eigenFaceRecognizer.Train(eigenTrainingImages.ToArray(), eigenIntlabels.ToArray());
                      
             }
             catch (Exception ex)
             {
                 MessageBox.Show(ex.ToString());
                 MessageBox.Show("Nothing in binary database, please add at least one face (simply train the prototype with the Add Face button).", "Trained faces load", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
             }
         }


         else if (comboBoxAlgorithm.Text == "FisherFaces")
         {
             try
             {
                 string dataDirectory = Directory.GetCurrentDirectory() + "\\TrainedFaces";


                 string[] files = Directory.GetFiles(dataDirectory, "*.jpeg", SearchOption.AllDirectories);
                 fisherTrainedImageCounter = 0;
                 foreach (var file in files)
                 {
                     Image<Bgr, Byte> TrainedImage = new Image<Bgr, Byte>(file);
                     // Equalize before converting, as in the other branches; otherwise the stored copy is unaffected
                     if (eqHisChecked.Checked == true)
                     {
                         TrainedImage._EqualizeHist();
                     }
                     fisherTrainingImages.Add(TrainedImage.Convert<Gray, Byte>());
                     fisherlabels.Add(fileName(file));
                     fisherIntlabels.Add(fisherTrainedImageCounter);
                     fisherTrainedImageCounter++;
                     richTextBox1.Text += fileName(file) + "\n";
                 }
                 fisherFaceRecognizer = new FisherFaceRecognizer(fisherTrainedImageCounter, 3000);
                 fisherFaceRecognizer.Train(fisherTrainingImages.ToArray(), fisherIntlabels.ToArray());


             }
             catch (Exception ex)
             {
                 MessageBox.Show(ex.ToString());
                 MessageBox.Show("Nothing in binary database, please add at least one face (simply train the prototype with the Add Face button).", "Trained faces load", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
             }




         }


         else if (comboBoxAlgorithm.Text == "LBPHFaces")
         {
             try
             {
                 string dataDirectory = Directory.GetCurrentDirectory() + "\\TrainedFaces";


                 string[] files = Directory.GetFiles(dataDirectory, "*.jpeg", SearchOption.AllDirectories);
                 lbphTrainedImageCounter = 0;
                 foreach (var file in files)
                 {
                     Image<Bgr, Byte> TrainedImage = new Image<Bgr, Byte>(file);
                     if (eqHisChecked.Checked == true)
                     {
                         TrainedImage._EqualizeHist();
                     }
                     lbphTrainingImages.Add(TrainedImage.Convert<Gray, Byte>());
                     lbphlabels.Add(fileName(file));
                     lbphIntlabels.Add(lbphTrainedImageCounter);
                     lbphTrainedImageCounter++;
                     richTextBox1.Text += fileName(file) + "\n";
                 }
                 lbphFaceRecognizer = new LBPHFaceRecognizer(1, 8, 8, 8, 123.0);
                 lbphFaceRecognizer.Train(lbphTrainingImages.ToArray(), lbphIntlabels.ToArray());


             }
             catch (Exception ex)
             {
                 MessageBox.Show(ex.ToString());
                 MessageBox.Show("Nothing in binary database, please add at least one face (simply train the prototype with the Add Face button).", "Trained faces load", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
             }




         }



     }




User Interface


A camera, a video, a single image or multiple images can be selected as the source. The source selection is implemented using OpenCV and .NET libraries.




4. Future of PassFace Project


Algorithm Accuracy Analysis
The program is designed to work with different algorithms, but the comparison of these algorithms is not finished. The next step is to implement this comparison so that the accuracy and performance of the algorithms can be measured properly.


Performance Optimizations
To fix the performance problems, the program should be adapted to multi-core CPUs. For even better performance, it could be made to run on the GPU using CUDA libraries.


Algorithm Optimizations
These days, the most advanced face recognition algorithms use 3D face recognition techniques. They build a 3D model of the face from different 2D images, so the viewing angle and the direction of light are no longer a problem for these algorithms. Recognition is carried out using neural networks and deep learning algorithms. The plan is to implement the latest algorithms to increase accuracy under different conditions.


5. Conclusion
In this project, the main principles of face recognition algorithms were learned. The algorithms were tried side by side, although a systematic comparison of their accuracy and performance is left as future work. A GUI application was developed to create the database and to process images using the selected algorithm.


References


