Robust Video-based Face Recognition.

Subashini.T; S.T.Munusamy; Srinivasan. R

doi:10.15680/ijircce.2015.0303086

Subashini.T¹, S.T.Munusamy², Prof Srinivasan. R³

¹M.Tech (IT) Student, Department of IT, PSV College of Engg & Tech, Krishnagiri, TN, India
²Assistant Professor, Department of IT, PSV College of Engg & Tech, Krishnagiri, TN, India
³Head of Department, Department of IT, PSV College of Engg & Tech, Krishnagiri, TN, India

International Journal of Innovative Research in Computer and Communication Engineering

Abstract

In recent years, multi-camera networks have become increasingly common in biometric and surveillance systems, and multi-view face recognition has become an active research area. In this paper, an approach for video-based face recognition in camera networks is proposed. Traditional approaches estimate the pose of the face explicitly. Here, a robust feature for multi-view recognition that is insensitive to pose variations is proposed instead. The feature is developed using the spherical harmonic representation of the face, texture-mapped onto a sphere. The texture map for the whole face is constructed by back-projecting the image intensity values from each of the views onto the surface of the spherical model. A particle filter is used to track the 3D location of the head using multi-view information. Videos provide an automatic and efficient way to extract features, and their data redundancy renders the recognition algorithm more robust. The similarity between feature sets from different videos is measured in a Reproducing Kernel Hilbert Space (RKHS).

Keywords

multi-camera networks, face recognition, spherical harmonics, particle filter, Reproducing Kernel Hilbert Space

I. INTRODUCTION

Face detection is the first stage of a face recognition system. A lot of research has been done in this area, but most of it is efficient and effective for still images only and cannot be applied directly to video sequences. In video scenes, human faces can appear in unlimited orientations and positions, so their detection poses a variety of challenges to researchers [1][2]. In recent years, multi-camera networks have become increasingly common in biometric and surveillance systems, and multi-view face recognition has become an active research area. In this paper, an approach for video-based face recognition in camera networks is proposed. Traditional approaches estimate the pose of the face explicitly. Here, a robust feature for multi-view recognition that is insensitive to pose variations is proposed. The feature is developed using the spherical harmonic representation of the face, texture-mapped onto a sphere. The texture map for the whole face is constructed by back-projecting the image intensity values from each of the views onto the surface of the spherical model. A particle filter is used to track the 3D location of the head using multi-view information. Videos provide an automatic and efficient way to extract features. In particular, self-occlusion of facial features as the pose varies raises fundamental challenges for designing robust face recognition algorithms. A promising approach to handling pose variations and their inherent challenges is the use of multi-view data.

Face recognition in videos has been an active topic in image processing, computer vision and biometrics for many years. Compared with a single still image, a video contains more abundant information, including spatio-temporal information. More robust and stable recognition can be achieved by fusing information from multiple frames together with temporal information, and the multiple poses of faces in videos make it possible to explore the shape of the face and incorporate it into the recognition framework. Video-based recognition has several advantages over image-based recognition. First, the temporal information of faces can be utilized to facilitate the recognition task. Secondly, more effective representations, such as a 3D face model or super-resolution images, can be obtained from the video sequence and used to improve recognition results. Finally, video-based recognition allows the subject model to be learned or updated over time to improve recognition results for future frames. Nevertheless, video-based face recognition remains a very challenging problem, which suffers from nuisance factors such as low-quality facial images, scale variations, illumination changes, pose variations, motion blur and occlusions.

II. RELATED WORK

The term multi-view face recognition, in a strict sense, refers only to situations where multiple cameras acquire the subject (or scene) simultaneously and an algorithm collaboratively utilizes the acquired images or videos. However, the term has frequently been used for recognizing faces across pose variations. This ambiguity does not cause any problem for recognition with still images: a group of images taken simultaneously with multiple cameras and a group taken with a single camera at different view angles are equivalent as far as pose variations are concerned. In the case of video data, the two cases diverge. While a multi-camera system guarantees the acquisition of multi-view data at any moment, the chance of obtaining equivalent data with a single camera is unpredictable. Such differences become vital in non-cooperative recognition applications such as surveillance. With the prevalence of camera networks, multi-view surveillance videos have become more and more common. Nevertheless, most existing multi-view video face recognition algorithms exploit single-view videos. The different methods for face recognition are given below:

A. Still image-based recognition:

Methods of this kind also require the poses and illumination conditions to be estimated for both face images. The ‘generic reference set’ idea has also been used to develop the holistic matching algorithm, where the ranking of look-up results forms the basis of the matching measure. There are also works which handle pose variations implicitly, without estimating the pose explicitly [3].

B. Video-based recognition:

Video contains more information than still images. A straightforward way to handle single-view videos is to take advantage of the data redundancy and perform view selection. Then, for each of the candidates, a face detector specific to that pose is applied to determine whether it is a face. Only the frontal faces are retained for recognition. The continuity of pose variation in video has inspired the idea of modelling face pose manifolds. The typical method is to cluster the frames of similar pose and train a linear subspace to represent each pose cluster. Here, the piecewise linear subspace model is an approximation to the pose manifold. The linearity is measured as the ratio of geodesic distance to Euclidean distance, and the distances are calculated between a candidate neighbour and each existing sample in the cluster. The 3D model can then be used in a model-based algorithm to perform face recognition [4].

C. Multi-view-based recognition:

In contrast to single-view image- or video-based face recognition, there are relatively few approaches for recognition using multi-view videos. In one approach, frames of a multi-view sequence are collected together to form a gallery or probe set, and the recognition algorithm is frame-based PCA and LDA fused by the sum rule. Elsewhere, a three-layer hierarchical image-set matching technique is presented. The first layer associates frames of the same individual taken by the same camera. The second layer matches the groups obtained in the first layer among different cameras. Finally, the third layer compares the output of the second layer with the training set, which is manually clustered using multi-view videos. Though multi-view data is used to deal with occlusions when more than one subject is present, pose variations are not effectively addressed in this work [5].

D. Video processing in multi-camera networks:

Camera networks have been extensively used for surveillance and security applications. Research in this field has focused on distributed tracking, resource allocation, activity recognition and active sensing. Some approaches adapt the feature correspondence computations by modeling the long-term dependencies between them and then obtain statistically optimal paths for each subject [6].

E. Spherical harmonics (SH) in machine vision:

One approach estimates the SH basis images for a face at a fixed pose from a single 2D image, based on statistical learning.

When the 3D shape of the face is available, the SH basis images can be estimated for test images with different poses [7]. As a result, such methods require a 3D face model and face pose estimation to infer the face appearance. In the proposed work, an SH-based feature is used to model face appearance directly rather than the reflectance function, and hence it does not require a 3D face surface model or a pose estimation step.

III. PROPOSED WORK

For a given set of multi-view video sequences, a particle filter is first used to track the 3D location of the head using multi-view information. At each time instant or video frame, the texture map associated with the face is built under the spherical model for the face. Given the 3D location of the head from the tracking algorithm, the image intensity values from each of the views are back-projected onto the surface of the spherical model to construct a texture map for the whole face. A Spherical Harmonic (SH) transform of the texture map is then computed, and a robust feature is constructed based on the properties of the SH projection. For recognition with videos, the feature similarity is measured by the limiting Bhattacharyya distance of features in the Reproducing Kernel Hilbert Space.
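The back-projection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `project` and `build_texture_map`, the representation of each view as a tuple of a 3x4 projection matrix, a camera centre and a grey-level image, the simple visibility test based on the surface normal, and the averaging of samples across views are all hypothetical choices made for the example.

```python
import math

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P; return (u, v) or None."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w <= 0.0:  # point is behind the camera
        return None
    return u / w, v / w

def build_texture_map(center, radius, views, n_theta=8, n_phi=16):
    """Back-project image intensities onto a sphere sampled on a (theta, phi) grid.

    views: list of (P, cam_center, image), image being a list of pixel rows.
    Cells that no camera sees stay None. Averaging over views is an assumption.
    """
    tex = [[None] * n_phi for _ in range(n_theta)]
    for i in range(n_theta):
        theta = (i + 0.5) * math.pi / n_theta
        for j in range(n_phi):
            phi = (j + 0.5) * 2.0 * math.pi / n_phi
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            X = tuple(center[k] + radius * n[k] for k in range(3))
            samples = []
            for P, cam, img in views:
                # Visible only if the outward normal points towards the camera.
                if sum(n[k] * (cam[k] - X[k]) for k in range(3)) <= 0.0:
                    continue
                uv = project(P, X)
                if uv is None:
                    continue
                col, row = int(round(uv[0])), int(round(uv[1]))
                if 0 <= row < len(img) and 0 <= col < len(img[0]):
                    samples.append(img[row][col])
            if samples:
                tex[i][j] = sum(samples) / len(samples)
    return tex

# Toy usage: one camera at the origin looking along +z, focal length 100 pixels.
P = [[100.0, 0.0, 50.0, 0.0], [0.0, 100.0, 50.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
image = [[float(c) for c in range(100)] for _ in range(100)]
tex = build_texture_map((0.0, 0.0, 5.0), 1.0, [(P, (0.0, 0.0, 0.0), image)])
```

With a single camera only the hemisphere facing it receives texture, which is why several views are needed to cover the whole face.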

The proposed approach outperforms traditional features and algorithms on a multi-view video database collected using a camera network. Building rotational tolerances into this feature completely bypasses the pose estimation step. The proposed approach of the Multi-view Face Recognition Algorithm is defined as follows.

Robust feature:

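The robust feature is based on the theory of spherical harmonics. Spherical harmonics are a set of orthonormal basis functions defined over the unit sphere, and can be used to linearly expand any square-integrable function $f$ on $S^2$ as:

$$f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm} Y_{lm}(\theta, \phi)$$

where $Y_{lm}(\cdot, \cdot)$ defines the SH basis function of degree $l \geq 0$ and order $m \in \{-l, -l+1, \ldots, l-1, l\}$, and $f_{lm}$ is the coefficient associated with the basis function $Y_{lm}$ for the function $f$. The SH basis function for degree $l$ and order $m$ has the following form:

$$Y_{lm}(\theta, \phi) = K_{lm} P_l^m(\cos\theta) e^{im\phi}$$

where $K_{lm}$ is a normalization constant and $P_l^m$ is the associated Legendre function.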
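The expansion can be made concrete numerically. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it uses the real-valued SH basis up to degree 2 only, estimates the coefficients by simple midpoint quadrature, and applies them to a made-up polynomial `texture` function. The names `real_sh`, `sh_coefficients`, `degree_energies`, `texture` and `rotated_texture` are hypothetical. It also checks a standard property of spherical harmonics: rotating the function leaves the norm of the coefficients of each degree unchanged.

```python
import math

def real_sh(x, y, z):
    """Real SH basis values up to degree 2 at unit vector (x, y, z), grouped by degree."""
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return [
        [0.5 / math.sqrt(math.pi)],
        [c1 * y, c1 * z, c1 * x],
        [c2 * x * y, c2 * y * z,
         0.25 * math.sqrt(5.0 / math.pi) * (3.0 * z * z - 1.0),
         c2 * x * z, 0.5 * c2 * (x * x - y * y)],
    ]

def sh_coefficients(f, n_theta=120, n_phi=240):
    """Approximate f_lm (integral of f * Y_lm over the sphere) by midpoint quadrature."""
    coeffs = [[0.0], [0.0] * 3, [0.0] * 5]
    d_theta, d_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            x = math.sin(theta) * math.cos(phi)
            y = math.sin(theta) * math.sin(phi)
            z = math.cos(theta)
            weighted = f(x, y, z) * math.sin(theta) * d_theta * d_phi
            for l, basis in enumerate(real_sh(x, y, z)):
                for k, b in enumerate(basis):
                    coeffs[l][k] += weighted * b
    return coeffs

def degree_energies(coeffs):
    """Norm of the coefficient vector of each degree l."""
    return [math.sqrt(sum(c * c for c in row)) for row in coeffs]

def texture(x, y, z):
    """Made-up smooth function on the sphere standing in for a texture map."""
    return 1.0 + 0.5 * x + 2.0 * y * z - x * x

def rotated_texture(x, y, z, a=1.1, b=0.7):
    """The same function evaluated after rotating about the z axis, then the y axis."""
    x1, y1 = math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y
    x2, z2 = math.cos(b) * x1 + math.sin(b) * z, -math.sin(b) * x1 + math.cos(b) * z
    return texture(x2, y1, z2)

original = degree_energies(sh_coefficients(texture))
rotated = degree_energies(sh_coefficients(rotated_texture))
```

The individual coefficients of `texture` and `rotated_texture` differ, but `original` and `rotated` agree up to quadrature error, which is the kind of rotational tolerance a pose-insensitive feature can build on.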

The expansion coefficients have a very important property that is directly related to our ‘pose free’ face recognition application: when the function on the sphere is rotated, the coefficients of each degree l mix only among themselves, so the energy of the coefficients within each degree is unchanged.

A robust multi-view tracking algorithm based on Sequential Importance Resampling (SIR), also known as particle filtering, is used. Tracking is an essential stage in camera-network-based video processing. It automates the localization of the face and has a direct impact on the performance of the recognition algorithm.

Multi-View Tracking: It is well known that the higher the dimensionality of the state space, the harder the tracking problem becomes. This is especially true for search algorithms like SIR, since the number of particles required typically grows dramatically for high-dimensional state spaces. However, given that our eventual recognition framework is built on the robust feature derived using the SH representation under the diffuse lighting assumption, it suffices to track only the location of the head in 3D. Hence, the state space for tracking, s = (x, y, z), represents only the position of the sphere's centre, disregarding any orientation information [8].
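A single SIR iteration over the three-dimensional state s = (x, y, z) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tracker used in the paper: the random-walk motion model, the multinomial resampling, and the Gaussian `toy_likelihood` (which stands in for the image-based cue matching across camera views) are hypothetical, as are all names and numeric values.

```python
import math
import random

def sir_step(particles, likelihood, motion_std, rng):
    """One Sequential Importance Resampling step for 3D position particles."""
    # Predict: diffuse every particle with a random-walk motion model (assumption).
    moved = [tuple(c + rng.gauss(0.0, motion_std) for c in p) for p in particles]
    # Weight: score each hypothesis with the observation likelihood.
    weights = [likelihood(p) for p in moved]
    total = sum(weights)
    if total <= 0.0:
        weights = [1.0 / len(moved)] * len(moved)
    else:
        weights = [w / total for w in weights]
    # Estimate: weighted mean of the particle positions.
    estimate = tuple(sum(w * p[k] for w, p in zip(weights, moved)) for k in range(3))
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate

def toy_likelihood(p, target, sigma=0.2):
    """Hypothetical stand-in for the multi-view cue matching: Gaussian in distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, target))
    return math.exp(-0.5 * d2 / (sigma * sigma))

rng = random.Random(0)
target = (0.3, -0.2, 1.5)  # made-up head position
particles = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(0.5, 2.5))
             for _ in range(500)]
for _ in range(20):
    particles, estimate = sir_step(
        particles, lambda p: toy_likelihood(p, target), 0.05, rng)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(estimate, target)))
```

Keeping the state to three dimensions is what makes a few hundred particles sufficient here; adding orientation would enlarge the space that the particles have to cover.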

Histogram: A normalized 3D histogram in RGB space is built from this image region. Its difference with the template, which is set up at the first frame through the same procedure and subject to adaptive update thereafter, is measured by the Bhattacharyya distance. This defines the first cue matching function.
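The histogram cue can be illustrated with a short Python sketch. It is written under assumptions rather than taken from the paper: 8 bins per channel, pixels given as (r, g, b) tuples in the range 0 to 255, and the Bhattacharyya distance taken as the square root of one minus the Bhattacharyya coefficient, which is one common definition. The names `rgb_histogram` and `bhattacharyya_distance` are hypothetical.

```python
import math

def rgb_histogram(pixels, bins=8):
    """Normalized 3D RGB histogram, flattened to a list of bins**3 values."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def bhattacharyya_distance(p, q):
    """sqrt(1 - sum_i sqrt(p_i * q_i)) for two normalized histograms."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - coefficient))

# Toy regions: a reddish template, a bluish region, and a half-and-half mixture.
red = [(250, 10, 10)] * 50
blue = [(10, 10, 250)] * 50
template = rgb_histogram(red)
d_same = bhattacharyya_distance(template, rgb_histogram(red))
d_other = bhattacharyya_distance(template, rgb_histogram(blue))
d_mixed = bhattacharyya_distance(template, rgb_histogram(red + blue))
```

A distance of 0 means the region matches the template exactly and 1 means the two histograms do not overlap at all, so the value can be turned into a particle weight by any decreasing function.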

IV. RESULT

Dynamic changes of faces in videos

The temporal information in video sequences enables the analysis of facial dynamic changes and their application as a biometric identifier for person recognition. We utilize the fact that a human subject will show at least a small amount of movement, such as eye blinking and/or movements of the mouth and face boundary. This information is easy to obtain because we are dealing with a video sequence, from which the whole sequence of the subject's movements can be recovered. Taking this into account, we can reduce the errors caused by false detection of a human face and shorten the simulation time.

Matta et al. proposed a multi-modal recognition system [31,32]. They successfully integrated facial motion information with mouth motion and facial appearance by taking advantage of a unified probabilistic framework. In [33], Huang and Trivedi developed a face recognition system that employs HMMs to model facial dynamic information in videos. Each covariance matrix was gradually adapted from a global diagonal one by using its class-dependent data in the training algorithms. Afterwards, Liu and Cheng [34] successfully applied HMMs to temporal video recognition (as illustrated in Fig. 4) by improving the basic implementation of Huang and Trivedi. Each test sequence was used to update the model parameters of the client in question by applying a maximum a posteriori (MAP) adaptation technique.

V. CONCLUSION AND FUTURE WORK

The proposed multi-view face recognition algorithm does not require any pose estimation or model registration step. A multi-view video tracking algorithm is presented to automate feature acquisition in a camera network setting. The video-based recognition problem is modelled as one of measuring ensemble similarities in a Reproducing Kernel Hilbert Space (RKHS). The performance of this method can be demonstrated on a relatively uncontrolled multi-view video database.

Recent rapid progress in communication technology and computer science has given video-based face recognition a vital role in human-machine interfaces and advanced communication. The main objective of this paper is to survey video-based face recognition modules and approaches. Still-to-still and video-to-still methods exploit only limited physiological information about the face, whereas video-to-video methods have access to more abundant information. Video-based face recognition still faces great challenges before it can be adopted in real applications.

References

[1] Shailja A. Patil, Paramod J. Deore, “Video-based face recognition: a survey”, Proceedings of Conference on Advances in Communication and Computing (NCACC'12), held at R.C. Patel Institute of Technology, Shirpur, Dist. Dhule, Maharashtra, India, April 21, 2012.
[2] Kanade, “Picture Processing by Computer Complex and Recognition of Human Faces”, Ph.D. thesis, Kyoto University, 1973.
[3] Zhaoxiang Zhang, Chao Wang and Yunhong Wang, “Video-Based Face Recognition: State of the Art”.
[4] Lacey Best-Rowden, Brendan Klare, “Video-to-Video Face Matching: Establishing a Baseline for Unconstrained Face Recognition”, to appear in Proc. IEEE BTAS, 2013.
[5] Jeremiah R. Barr, Kevin W. Bowyer, “Face recognition from video: a review”, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 26, No. 5 (2012) 1266002 (53 pages).
[6] Yongzhong Lu, Jingli Zhou, Shengsheng Yu, “A survey of face detection, extraction and recognition”, Computing and Informatics, Vol. 22, 2003.
[7] G. Shakhnarovich, J. W. Fisher, and T. Darrell, “Face recognition from long-term observations”, in Proc. European Conf. on Computer Vision, volume 3, pp. 851-865, 2002.
[8] S. Zhou and R. Chellappa, “Probabilistic human recognition from video”, in Proceedings of the European Conference on Computer Vision, pp. 681–697, Copenhagen, Denmark, 2002.
[9] V. Krüger and S. Zhou, “Exemplar-based face recognition from video”, in Proc. European Conf. on Computer Vision, volume 4, pp. 732-746.
[10] S. Zhou, V. Krueger, R. Chellappa, “Face recognition from video: A condensation approach”, in IEEE Int. Conf. on Automatic Face and Gesture Recognition, 2002, pp. 221-228.
[11] Abdenour Hadid, Matti Pietikäinen, “Combining appearance and motion for face and gender recognition from videos”, Pattern Recognition, Vol. 42(11), 2009, pp. 2818-2827.
[12] R. Chellappa, V. Kruger, Shaohua Zhou, “Probabilistic recognition of human faces from video”, 2002 International Conference on Image Processing, Vol. 1, 2002, pp. 41-45.
[13] Gregory Shakhnarovich, Baback Moghaddam, “Face Recognition in Subspaces”, Handbook of Face Recognition, 2004.
[14] N. Vaswani and R. Chellappa, “Principal components null space analysis for image and video classification”, IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 1816–1830, 2006.
[15] S. Soatto, G. Doretto, and Y. Wu, “Dynamic textures”, in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 439–446, Vancouver, Canada, 2001.
[16] M. Kim, S. Kumar, V. Pavlovic, and H. Rowley, “Face tracking and recognition with visual constraints in real-world videos”, in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[17] C. Shan, S. Gong, P. McOwan, “Learning gender from human gaits and faces”, IEEE International Conference on Advanced Video and Signal based Surveillance, 2007, pp. 505-510.
[18] Christian Micheloni, Sergio Canazza, Gian Luca Foresti, “Audio-video biometric recognition for non-collaborative access granting”, Visual Languages and Computing, 2009.
[19] M. Balasubramanian, S. Palanivel, and V. Ramalingam, “Real time face and mouth recognition using radial basis function neural networks”, Expert Systems with Applications, Vol. 36(3), pp. 6879-6888.
[20] A. Georghiades, D. Kriegman, and P. Belhumeur, “From few to many: Generative models for recognition under variable pose and illumination”, IEEE Transactions Pattern Analysis and Machine Intelligence, 40, 643-660, 2001.