Face recognition is a widely studied topic, and the volume of research papers has grown to the point of being, to some extent, overwhelming. To better introduce the history and current state of face recognition research, this article roughly divides the research history of AFR into three stages according to research content, technical methods, and other characteristics, as shown in Table 1. The table summarizes the development of face recognition research, the representative work of each historical stage, and its technical characteristics. The research progress of the three stages is briefly introduced below:
The first stage (1964~1990)
At this stage, face recognition was usually studied only as a general pattern recognition problem, and the main technical solutions were based on the geometric structure features of the face (geometric feature based). This was mainly reflected in research on the profile (silhouette): a great deal of work was devoted to extracting and analyzing the structural features of facial profile curves. Artificial neural networks were also applied to the face recognition problem by some researchers. Besides Bledsoe, early AFR researchers include Goldstein, Harmon, and Takeo Kanade. Kanade completed the first doctoral thesis on AFR at Kyoto University in 1973; today, as a professor at the Robotics Institute of Carnegie Mellon University (CMU), he remains one of the active figures in the field, and his research group is an important force in face recognition research. Generally speaking, this stage was the initial period of face recognition research: few truly important results were achieved, and there were essentially no practical applications.
The second stage (1991~1997)
Although this stage was relatively short, it was the climax of face recognition research and can fairly be called fruitful: not only were several representative face recognition algorithms proposed, the U.S. military also organized the famous FERET face recognition algorithm tests, and several commercially operated face recognition systems emerged, such as the well-known FaceIt system of Visionics (now Identix).
The "eigenface" method proposed by Turk and Pentland of the MIT Media Laboratory is undoubtedly the most famous person in this period. Face recognition methods. Many subsequent face recognition technologies are more or less related to eigenfaces. Now eigenfaces have become the benchmark algorithm for performance testing of face recognition together with the normalized cocorrelation method.
Another important work of this period was a comparative experiment conducted around 1992 by Brunelli and Poggio of the MIT Artificial Intelligence Laboratory. They compared the recognition performance of methods based on structural features against methods based on template matching and reached a fairly definite conclusion: template matching outperforms feature-based methods. This conclusion, together with eigenfaces, essentially halted research on face recognition methods based purely on structural features, and greatly promoted the development of appearance-based linear subspace modeling and of face recognition methods based on statistical pattern recognition, which gradually became the mainstream.
The Fisherface face recognition method proposed by Belhumeur et al. is another important achievement of this period. It first uses principal component analysis (PCA, i.e., eigenfaces) to reduce the dimensionality of the image appearance features, and then applies linear discriminant analysis (LDA) to transform the reduced principal components so as to obtain "the largest possible between-class scatter and the smallest possible within-class scatter". This method remains one of the mainstream face recognition approaches and has produced many variants, such as the null-space method, the subspace discriminant model, the enhanced discriminant model, the direct LDA method, and some recent improvements based on kernel learning.
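A hedged sketch of the Fisherface recipe using scikit-learn building blocks; the component count `n_pca` and the 1-NN matcher are illustrative choices, not prescribed by Belhumeur et al.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def fisherface_classifier(n_classes, n_pca=150):
    """PCA -> LDA -> nearest neighbor.

    PCA first, so the within-class scatter matrix LDA relies on is
    non-singular; LDA then finds at most n_classes - 1 directions
    maximizing between-class over within-class scatter.
    """
    return make_pipeline(
        PCA(n_components=n_pca),
        LinearDiscriminantAnalysis(n_components=n_classes - 1),
        KNeighborsClassifier(n_neighbors=1),
    )

# Usage: clf = fisherface_classifier(n_classes=40).fit(X_train, y_train)
# where X_train is (n_samples, n_pixels) and y_train holds identities.
```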
Building on eigenfaces, Moghaddam of MIT proposed a face recognition method based on Bayesian probability estimation with dual subspaces. The method uses a "difference method" to convert the similarity computation for a pair of face images into a two-class classification problem (intra-personal difference vs. extra-personal difference). Both kinds of difference data are first reduced in dimensionality by principal component analysis (PCA), the class-conditional probability densities of the two classes are estimated, and recognition is finally performed by Bayesian decision (maximum likelihood or maximum a posteriori).
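The following simplified sketch illustrates the difference-based formulation. Unlike Moghaddam's actual dual-subspace estimator, it uses a single shared PCA and full Gaussians for both classes, so it should be read as the idea rather than the original method.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.decomposition import PCA

class BayesianFaceSimilarity:
    """Two-class Gaussian model over image differences (a sketch).

    intra: difference images between photos of the same person.
    extra: difference images between photos of different people.
    """
    def __init__(self, n_components=50):
        self.pca = PCA(n_components=n_components)

    def fit(self, intra, extra):
        diffs = np.vstack([intra, extra])
        z = self.pca.fit_transform(diffs)
        zi, ze = z[:len(intra)], z[len(intra):]
        self.p_intra = multivariate_normal(zi.mean(0), np.cov(zi.T))
        self.p_extra = multivariate_normal(ze.mean(0), np.cov(ze.T))
        return self

    def log_likelihood_ratio(self, img_a, img_b):
        """Large values -> the two images likely show the same person."""
        z = self.pca.transform((img_a - img_b)[None, :])[0]
        return self.p_intra.logpdf(z) - self.p_extra.logpdf(z)
```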
Another important face recognition method, Elastic Graph Matching (EGM), was also proposed at this stage.
The basic idea is to describe a human face with an attribute graph: the vertices represent key facial feature points, and their attributes are multi-resolution, multi-orientation local features (Gabor transform [12] features, called jets) at the corresponding points; the edge attributes are the geometric relationships between feature points. For any input face image, elastic graph matching uses an optimized search strategy to locate a number of predefined key facial feature points and simultaneously extracts their jets, yielding the attribute graph of the input image. Recognition is completed by computing the similarity to the known face attribute graphs. The advantage of this method is that it retains the global structural features of the face while also modeling its key local features. Several extensions of the approach have appeared more recently.
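A rough sketch of jet extraction with OpenCV Gabor filters. Real EGM uses complex Gabor responses and an elastic graph search; this simplification samples only the real filter response at a given landmark, and the filter parameters are illustrative.

```python
import numpy as np
import cv2

def gabor_jet(image, point, scales=5, orientations=8, ksize=31):
    """Sample a 'jet': multi-scale, multi-orientation Gabor
    responses at one landmark (x, y) of a grayscale image."""
    x, y = point
    jet = []
    for s in range(scales):
        wavelength = 4.0 * (2 ** (s / 2.0))  # illustrative scale spacing
        for o in range(orientations):
            theta = o * np.pi / orientations
            kern = cv2.getGaborKernel((ksize, ksize), sigma=wavelength,
                                      theta=theta, lambd=wavelength,
                                      gamma=1.0)
            resp = cv2.filter2D(image.astype(np.float32), cv2.CV_32F, kern)
            jet.append(resp[y, x])
    return np.asarray(jet)

def jet_similarity(j1, j2):
    """Normalized dot product of jet magnitudes, as in
    magnitude-only jet comparison."""
    a, b = np.abs(j1), np.abs(j2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```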
Local Feature Analysis (LFA) was proposed by Atick et al. of Rockefeller University. LFA is essentially a statistics-based low-dimensional object description method. Whereas PCA extracts only global features and cannot preserve local topological structure, LFA builds on the global PCA description to extract features that are local while still retaining global topological information, giving it better description and discrimination ability. LFA was commercialized in the famous FaceIt system, and consequently little new academic progress on it was published afterwards.
The FERET project, funded by the U.S. Department of Defense Counterdrug Technology Development Program Office, is undoubtedly a crucial event of this stage. Its goal was to develop AFR technology usable by security, intelligence, and law enforcement agencies. The project comprised three parts: funding several face recognition research efforts, creating the FERET face image database, and organizing the FERET face recognition performance evaluations. Evaluations were held in 1994, 1995, and 1996; most of the well-known face recognition algorithms participated, which greatly promoted their improvement and practical application. Another important contribution of these tests was to point out the direction for further development: face recognition under non-ideal acquisition conditions such as lighting and pose gradually became a hot research direction.
Flexible models, including the Active Shape Model (ASM) and the Active Appearance Model (AAM), were an important contribution to face modeling during this period. ASM/AAM describes the face as two separate parts, 2D shape and texture, models each statistically with PCA, and then integrates the two through a further PCA to model the face statistically. Flexible models have good face synthesis ability, enabling analysis-by-synthesis techniques to extract features from and model face images. They have been widely used in facial feature alignment (face alignment) and recognition, and many improved models have emerged.
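A compact sketch of the combined shape-texture model at the heart of the AAM, assuming landmarks are already aligned and textures already warped to a mean shape. The shape-weighting convention shown is one common choice, not the only one.

```python
import numpy as np
from sklearn.decomposition import PCA

def combined_shape_texture_model(shapes, textures, n_shape=20,
                                 n_tex=50, n_joint=30):
    """AAM-style combined model: separate PCAs on shape and texture,
    then a joint PCA on the concatenated parameter vectors."""
    pca_s = PCA(n_components=n_shape).fit(shapes)
    pca_t = PCA(n_components=n_tex).fit(textures)
    b_s = pca_s.transform(shapes)
    b_t = pca_t.transform(textures)
    # Weight the shape parameters so both parts contribute
    # comparable variance to the joint model.
    w = np.sqrt(pca_t.explained_variance_.sum()
                / pca_s.explained_variance_.sum())
    joint = np.hstack([w * b_s, b_t])
    pca_c = PCA(n_components=n_joint).fit(joint)
    return pca_s, pca_t, pca_c, w
```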
Overall, face recognition technology developed very rapidly at this stage. The proposed algorithms achieved very good performance under ideal image acquisition conditions, with cooperative subjects, and on small- and medium-scale frontal face databases, and as a result several well-known commercial face recognition companies emerged. In terms of technical solutions, linear subspace discriminant analysis of 2D face images, statistical appearance models, and statistical pattern recognition methods were the mainstream technologies of this stage.
The third stage (1998~present)
The FERET'96 evaluation showed that mainstream face recognition technology was not robust to variations in lighting, pose, and other factors caused by non-ideal acquisition conditions or uncooperative subjects. Illumination and pose therefore gradually became research hotspots. At the same time, commercial face recognition systems developed further, and the U.S. military organized two evaluations of commercial systems, in 2000 and 2002, building on the FERET tests.
The multi-pose, multi-illumination face recognition method based on the illumination cones model, proposed by Georghiades et al., is one of the important achievements of this period. They proved an important conclusion: all images of the same face taken from the same viewpoint under arbitrary lighting conditions form a convex cone in image space, the illumination cone. To compute the illumination cone from a small number of face images with unknown lighting, they also extended traditional photometric stereo: under the assumptions of a Lambertian surface, convex shape, and distant point light sources, the 3D shape and surface albedo of the face can be recovered from 7 images of the same viewpoint taken under unknown lighting (traditional photometric stereo recovers the surface normals from 3 images taken under known lighting). Images of the face under arbitrary lighting from that viewpoint can then be easily synthesized, completing the construction of the illumination cone. Recognition is performed by computing the distance from the input image to each illumination cone.
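For reference, a minimal implementation of the traditional known-lighting photometric stereo mentioned above; the Georghiades extension to unknown lighting is substantially more involved and is not reproduced here.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Classical Lambertian photometric stereo with known lighting.

    images: (k, h, w) grayscale images, k >= 3, same viewpoint.
    lights: (k, 3) unit light directions, one per image.
    Returns per-pixel albedo (h, w) and unit normals (h, w, 3).
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                    # (k, n_pixels)
    # Lambertian model: I = L @ B with B = albedo * normal.
    B, *_ = np.linalg.lstsq(lights, I, rcond=None)   # (3, n_pixels)
    albedo = np.linalg.norm(B, axis=0)
    normals = (B / (albedo + 1e-12)).T.reshape(h, w, 3)
    return albedo.reshape(h, w), normals
```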
Statistical learning theory, represented by support vector machines (SVMs), was also applied to face recognition and verification during this period. The SVM is a two-class classifier, while face recognition is a multi-class problem; three strategies are commonly used to bridge this gap: the intra-class/inter-class difference method, the one-vs-rest method, and the one-vs-one method.
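Two of these strategies map directly onto standard scikit-learn wrappers; a brief sketch follows, with kernel and C values as arbitrary placeholders.

```python
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# X: (n_samples, n_features) face descriptors; y: identity labels.
ovr = OneVsRestClassifier(SVC(kernel="rbf", C=10.0))  # one SVM per identity
ovo = OneVsOneClassifier(SVC(kernel="rbf", C=10.0))   # one per identity pair

# The intra-/inter-class difference strategy instead trains a single
# binary SVM on image-pair differences (same person vs. different
# people), analogous to the Bayesian difference method sketched earlier.
```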
The multi-pose, multi-illumination face image analysis and recognition method based on the 3D Morphable Model, proposed by Blanz and Vetter, is a groundbreaking work of this stage. The method is essentially an analysis-by-synthesis technique. Its main contribution is that, in addition to a statistical deformation model of 3D shape and texture (similar to the 2D AAM), it uses graphics simulation to model the perspective projection and illumination parameters of the image acquisition process, so that intrinsic face attributes such as shape and texture can be completely separated from extrinsic parameters such as camera configuration and lighting, which is more conducive to the analysis and recognition of face images. Blanz's experiments show that the method achieves very high recognition rates on both the CMU-PIE (Pose, Illumination, and Expression) face database and the FERET multi-pose database, demonstrating its effectiveness.
At the 2001 International Conference on Computer Vision (ICCV), Viola and Jones of Compaq Research demonstrated their real-time face detection system based on simple rectangular features and AdaBoost, which can detect quasi-frontal faces in CIF-format video at more than 15 frames per second. The main contributions of the method are: 1) using simple rectangular features that can be computed quickly as face image features; 2) an AdaBoost-based learning method that combines a large number of weak classifiers into a strong classifier; 3) a cascade technique that improves detection speed. This face/non-face learning strategy can now achieve quasi-real-time multi-pose face detection and tracking, providing a good foundation for back-end face recognition.
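The speed of the rectangular features comes from the integral image, which makes any rectangle sum a four-lookup operation; a minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Zero-padded cumulative sums: any rectangle sum costs 4 lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle via the integral image."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])

def two_rect_feature(ii, top, left, height, width):
    """One Haar-like feature: left half minus right half of a window."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```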
In 2001, Shashua et al. proposed a face image recognition and rendering technique based on quotient images [13]. It is a rendering technique that learns from an image set of a specific object class: given a small number of training images under different lighting conditions, it can synthesize images of any input face under various lighting conditions. On this basis, Shashua et al. also defined an illumination-invariant face signature image, which can be used for face recognition under varying illumination; experiments have shown its effectiveness.
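The core identity behind the quotient image can be stated compactly. This shows the two-object case under a single distant point source; the actual method estimates the quotient from a bootstrap set of several objects.

```latex
% Two objects y and a of the same shape (normals n), Lambertian
% albedos \rho_y and \rho_a, under a distant point source s:
%   I_y(u,v) = \rho_y(u,v)\, n(u,v)^{\top} s, \quad
%   I_a(u,v) = \rho_a(u,v)\, n(u,v)^{\top} s.
% The ratio cancels the lighting term, so the quotient image
Q_y(u,v) = \frac{\rho_y(u,v)}{\rho_a(u,v)} = \frac{I_y(u,v)}{I_a(u,v)}
% is illumination invariant, and y is re-rendered under any new
% lighting s' from an image of a under that lighting:
I_y^{s'}(u,v) = Q_y(u,v)\, I_a^{s'}(u,v).
```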
Basri and Jacobs, representing illumination with spherical harmonics and Lambertian reflection as a convolution, analytically proved an important conclusion: the set of all Lambertian reflectance functions produced by arbitrary distant point light sources lies close to a low-dimensional linear space. This means that the set of images of a convex Lambertian object under arbitrary lighting can be approximated by a low-dimensional linear subspace. This result not only agrees with the empirical findings of earlier statistical illumination-modeling methods, but also further advanced linear subspace object recognition methods theoretically. Moreover, it makes it possible to use convex optimization to enforce non-negativity of the lighting function, providing an important idea for solving illumination problems.
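In formulas, following the standard spherical-harmonics formulation (the notation here is conventional rather than copied from the paper):

```latex
% Lambertian reflection as convolution on the sphere: the reflectance
% r is the lighting \ell convolved with the half-cosine kernel
r = k \ast \ell, \qquad k(\theta) = \max(\cos\theta,\, 0).

% Expanding in spherical harmonics Y_{nm}, the Funk--Hecke theorem
% gives r_{nm} = k_n\, \ell_{nm}. The kernel coefficients k_n vanish
% for odd n > 1 and decay rapidly with even n, so the first three
% orders (n \le 2, i.e. 9 harmonics) capture over 99\% of the
% kernel's energy:
r \;\approx\; \sum_{n=0}^{2} \sum_{m=-n}^{n} k_n\, \ell_{nm}\, Y_{nm}.
```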
After the FERET project, several commercial face recognition systems emerged, and the relevant departments of the U.S. Department of Defense further organized FRVT, an evaluation of commercial face recognition systems, held twice so far: FRVT 2000 and FRVT 2002. On the one hand, these tests compared the performance of the well-known systems; for example, FRVT 2002 showed that the commercial products of Cognitec, Identix, and Eyematic were far ahead of the other systems, with little difference among the three. On the other hand, they comprehensively summarized the state of the art: under ideal conditions (frontal visa-style photos), the best rank-one identification rate over 121,589 images of 37,437 people was 73%, and the equal error rate (EER [14]) of face verification was about 6%. Another important contribution of FRVT is that it further exposed several problems urgently awaiting solution: FRVT 2002 showed that the performance of commercial systems is still very sensitive to indoor/outdoor lighting changes, pose, time span, and other variations, and that effective recognition on large-scale face databases remains a serious problem. These issues still require further effort.
Generally speaking, face recognition under non-ideal imaging conditions (especially illumination and pose), with uncooperative subjects, and on large-scale face databases has gradually become a hot research topic. Nonlinear modeling methods, statistical learning theory, Boosting-based learning techniques [15], and 3D-model-based face modeling and recognition methods have gradually become the technical trends attracting the most attention.
All in all, face recognition is a research topic with both scientific value and broad application prospects. Decades of work by a large number of researchers worldwide have yielded fruitful results, and automatic face recognition technology has been successfully applied under certain constrained conditions. These results have deepened our understanding of the automatic face recognition problem, especially of its challenges.
Although existing automatic face recognition systems may already surpass humans in speed, and even in accuracy when comparing massive amounts of face data, for general face recognition under complex, varying conditions the robustness and accuracy of automatic systems still fall far short of human performance. The essential reason for this gap remains unknown; after all, our understanding of the human visual system is still superficial. From the perspective of pattern recognition and computer vision, the gap may mean that we have not yet found a sensor that samples facial information appropriately (consider the difference between a monocular camera and the human binocular system); it may mean that we have adopted inappropriate face modeling methods (a problem of the internal representation of faces); or it may mean that we have not yet recognized the ultimate accuracy that automatic face recognition technology can achieve. In any case, endowing computing devices with human-like face recognition ability is the dream of many researchers in this field, and as research deepens, our understanding should come ever closer to the correct answers to these questions.