ISSN ONLINE(2319-8753)PRINT(2347-6710)
Mitul Modi1, Fedrik Macwan2 PG Scholars, Dept. of Electrical, The M S University, Baroda, Gujarat, India |
Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology
Faces play a major role in social interaction, conveying a person's identity and feelings. Humans are far better than machines at telling faces apart. Face detection is therefore a key first step in face recognition, facial expression recognition, head-pose estimation, human-computer interaction, and related tasks. Face detection is a computer technology that determines the location and size of human faces in an arbitrary digital image. This paper presents a comprehensive and critical survey of the algorithms that make face detection possible.
Keywords |
Digital image processing, face detection, localizing faces. |
INTRODUCTION |
Early efforts in face detection date back to the beginning of the 1970s, when simple heuristic and anthropometric techniques [1] were used. These methods rely on restrictive assumptions such as a frontal face, a fixed or plain background, and passport-photograph scenarios. If any of these conditions change, faces in the image are not detected. At the beginning of the 1990s [2], work on face recognition and video coding systems increased the need for face detection. More robust segmentation schemes have since been presented, particularly those using motion, color, and generalized information. The use of statistics and neural networks has also enabled faces to be detected in cluttered scenes at different distances from the camera.
Face detection can be implemented in various ways, but it mainly involves two steps. The first step is to localize the face, that is, to highlight those parts of an image where a face may be present. The last step is to verify whether the highlighted parts actually contain a face [3]. The concept may seem very simple, but implementing it raises difficulties [4, 5] such as scale, rotation, pose, expression, presence or absence of some structural components, occlusion, illumination variation, and image condition.
MODERN FACE DETECTION TECHNIQUES |
Face detection is a computer technology that determines the location and size of human faces in an arbitrary digital image. The facial features are detected, and any other objects in the image, such as trees, buildings and bodies, are ignored. It can be regarded as a 'specific' case of object-class detection, where the task is to find the locations and sizes of all objects in an image that belong to a given class. Face detection can also be regarded as a more 'general' case of face localization, where the task is to find the locations and sizes of a known number of faces (usually one). There are two basic types of approach to detecting the facial part of a given image: the feature based approach and the image based approach. The feature based approach extracts features of the image and matches them against knowledge of facial features, while the image based approach tries to find the best match between training and testing images.
A. Feature base Approach |
1) Active Shape Model |
Active shape models focus on complex non-rigid features such as the actual physical and higher-level appearance of features [6]. Active Shape Models (ASMs) aim to automatically locate landmark points that define the shape of any statistically modelled object in an image. For faces, the landmarks describe facial features such as the eyes, lips, nose, mouth and eyebrows. The training stage of an ASM involves building a statistical facial model from a training set of images with manually annotated landmarks. ASMs are classified into three groups: snakes, point distribution models (PDMs), and deformable templates.
1.1) Snakes: |
The first type uses a generic active contour called a snake, first introduced by Kass et al. in 1987 [7]. Snakes are used to identify head boundaries [8,9,10,11,12]. To achieve this, a snake is first initialized in the proximity of a head boundary. It then locks onto nearby edges and subsequently assumes the shape of the head. The evolution of a snake is achieved by minimizing an energy function, $E_{\text{snake}}$ (by analogy with physical systems), denoted as
$E_{\text{snake}} = E_{\text{internal}} + E_{\text{external}}$, where $E_{\text{internal}}$ and $E_{\text{external}}$ are the internal and external energy functions.
Internal energy depends on the intrinsic properties of the snake and defines its natural evolution, which is typically shrinking or expanding. The external energy counteracts the internal energy and enables the contour to deviate from its natural evolution and eventually assume the shape of nearby features (here, the head boundary) at a state of equilibrium.
Two main considerations in forming snakes are the selection of energy terms and the energy minimization. Elastic energy [8, 9, 11, 12] is commonly used as the internal energy. It varies with the distance between control points on the snake, which gives the contour an elastic-band characteristic that causes it to shrink or expand. The external energy, on the other hand, relies on image features. Energy minimization is done by optimization techniques such as steepest gradient descent, which is computationally expensive. Huang and Chen [9] and Lam and Yan [13] both employ fast iterative methods based on greedy algorithms. Snakes have some drawbacks: the contour often becomes trapped on false image features, and snakes are not suitable for extracting non-convex features.
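The greedy style of minimization can be sketched as follows. This is only an illustrative sketch, not the algorithm of [9] or [13]: the function names, the weight `alpha`, the 3 x 3 search neighbourhood, and the use of negative edge strength as external energy are all assumptions made for the example.

```python
def snake_energy(points, edge, alpha=0.1):
    """Total energy: elastic internal term minus edge strength (external term)."""
    n = len(points)
    total = 0.0
    for i, (r, c) in enumerate(points):
        nr, nc = points[(i + 1) % n]  # next control point on the closed contour
        total += alpha * ((r - nr) ** 2 + (c - nc) ** 2)  # elastic (internal) energy
        total -= edge[r][c]  # strong edges lower the external energy
    return total


def greedy_snake_step(points, edge, alpha=0.1):
    """Move each control point to the 3x3 neighbour that lowers the energy most."""
    rows, cols = len(edge), len(edge[0])
    points = list(points)
    for i in range(len(points)):
        best, best_e = points[i], snake_energy(points, edge, alpha)
        r0, c0 = points[i]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = r0 + dr, c0 + dc
                if 0 <= r < rows and 0 <= c < cols:
                    trial = points[:i] + [(r, c)] + points[i + 1:]
                    e = snake_energy(trial, edge, alpha)
                    if e < best_e:  # accept only strict improvements
                        best, best_e = (r, c), e
        points[i] = best
    return points
```

Repeating `greedy_snake_step` until no point moves gives a local minimum of the energy, which is also why a snake can become trapped on false image features.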
1.2) Deformable Templates: |
Deformable templates were introduced by Yuille et al. [14] to take into account a priori knowledge of facial features and to improve on the performance of snakes. Locating a facial feature boundary is not an easy task because the local evidence of facial edges is difficult to organize into a sensible global entity using generic contours. The low brightness contrast around some of these features also makes the edge detection process problematic. Deformable template approaches were developed to solve this problem: Yuille et al. [14] took the concept of snakes a step further by incorporating global information about the eye to improve the reliability of the extraction process. Deformation is based on local valleys, edges, peaks, and brightness [15]. Beyond the face boundary, extraction of salient features (eyes, nose, mouth and eyebrows) is a great challenge in face recognition. The template energy is
$E = E_v + E_e + E_p + E_i + E_{\text{internal}}$, where $E_v$, $E_e$, $E_p$ and $E_i$ are the external energies due to valleys, edges, peaks and image brightness, and $E_{\text{internal}}$ is the internal energy.
1.3) PDM (Point Distribution Model) |
Independently of computerized image analysis, and before ASMs were developed, researchers developed statistical models of shape [30]. The idea is that once shapes are represented as vectors, standard statistical methods can be applied to them just like any other multivariate object. These models learn allowable constellations of shape points from training examples and use principal components to build what is called a Point Distribution Model. These have been used in diverse ways, for example for categorizing Iron Age brooches [18].
Ideal Point Distribution Models can only deform in ways that are characteristic of the object. Cootes and his colleagues were seeking models that do exactly that, so that if a beard, say, covers the chin, the shape model can "override the image" to approximate the position of the chin under the beard. It was therefore natural (but perhaps only in retrospect) to adopt Point Distribution Models. This synthesis of ideas from image processing and statistical shape modelling led to the Active Shape Model. The first parametric statistical shape model for image analysis based on principal components of inter-landmark distances was presented by Cootes and Taylor in [19]. Building on this approach, Cootes, Taylor, and their colleagues released a series of papers that culminated in what we call the classical Active Shape Model [20 - 24].
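The principal-component idea behind a Point Distribution Model can be sketched in a few lines. The sketch assumes the training shapes are already aligned and flattened into vectors (x1, y1, x2, y2, ...), and it extracts only the first mode of variation by power iteration. The function names are hypothetical, and a practical model would keep several modes and limit each coefficient to a plausible range.

```python
import math


def point_distribution_model(shapes, iterations=200):
    """Return (mean shape, first principal mode, its variance) of aligned shapes."""
    n, d = len(shapes), len(shapes[0])
    mean = [sum(s[j] for s in shapes) / n for j in range(d)]
    centred = [[s[j] - mean[j] for j in range(d)] for s in shapes]

    def cov_times(v):
        # Multiply the sample covariance matrix by v without forming the matrix.
        out = [0.0] * d
        for x in centred:
            proj = sum(xj * vj for xj, vj in zip(x, v))
            for j in range(d):
                out[j] += proj * x[j] / (n - 1)
        return out

    # Non-uniform start vector; power iteration fails only if it happens to be
    # exactly orthogonal to the dominant mode.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(vj * vj for vj in v))
    v = [vj / norm for vj in v]
    for _ in range(iterations):
        w = cov_times(v)
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            break
        v = [wj / norm for wj in w]
    variance = sum(a * b for a, b in zip(cov_times(v), v))
    return mean, v, variance


def synthesise(mean, mode, b):
    """Generate a shape as the mean plus b times the principal mode."""
    return [m + b * p for m, p in zip(mean, mode)]
```

Because new shapes are generated only as the mean plus a weighted mode, the model can deform only in directions seen in the training set, which is the property described above.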
2) Low Level Analysis |
Low-level analysis is based on low-level visual features such as color, intensity, edges, and motion.
2.1) Skin Color Base |
Color is a vital feature of human faces. Using skin color as a feature for tracking a face has several advantages. Color processing is much faster than processing other facial features. Under certain lighting conditions, color is orientation invariant. This property makes motion estimation much easier because only a translation model is needed [25]. Tracking human faces using color as a feature also has several problems; for example, the color representation of a face obtained by a camera is influenced by many factors (ambient light, object movement, etc.).
Three main face detection algorithms are available, based on the RGB, YCbCr, and HSI color space models. Implementing these algorithms involves three main steps:
(1) Classify the skin region in the color space, |
(2) Apply threshold to mask the skin region and |
(3) Draw bounding box to extract the face image. |
Crowley and Coutaz [26] suggested one of the simplest skin color algorithms for detecting skin pixels. The perceived color of human skin varies as a function of the relative direction of the illumination. The pixels of a skin region can be detected using a normalized color histogram, and can be normalized for changes in intensity by dividing by luminance. An [R, G, B] vector is thus converted into an [r, g] vector of normalized color, which provides a fast means of skin detection. This algorithm fails when the image contains other skin regions such as legs or arms.
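The intensity normalization described above can be illustrated with a short sketch. It assumes that luminance is approximated by the sum R + G + B, a common simplification, and the function name is hypothetical.

```python
def normalized_rg(pixel):
    """Convert an (R, G, B) pixel to intensity-normalized (r, g) chromaticity."""
    r, g, b = pixel
    total = r + g + b  # stand-in for luminance (assumption)
    if total == 0:
        return (0.0, 0.0)  # black pixel: chromaticity is undefined, use zeros
    return (r / total, g / total)
```

Doubling the brightness of a pixel leaves its (r, g) value unchanged, so a histogram of skin samples built in this space is less sensitive to changes in illumination intensity.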
Chai and Ngan [27] suggested a skin color classification algorithm using the YCbCr color space. Research found that pixels belonging to a skin region have similar Cb and Cr values. Thresholds are therefore chosen as [Cr1, Cr2] and [Cb1, Cb2], and a pixel is classified as having skin tone if its values [Cr, Cb] fall within the thresholds. The skin color distribution gives the face portion of the color image. This algorithm also has the constraint that the face should be the only skin region in the image. Kjeldsen and Kender defined a color predicate in HSV color space to separate skin regions from the background [28]. Skin color classification in HSI color space works in the same way as in YCbCr color space, but here the relevant values are hue (H) and saturation (S). As above, thresholds are chosen as [H1, S1] and [H2, S2], and a pixel is classified as having skin tone if its values [H, S] fall within the thresholds; this distribution gives the localized face image. This algorithm has the same constraint as the two above.
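The three steps (classify, threshold, bounding box) can be sketched for the YCbCr case as follows. The conversion uses the standard full-range JPEG/JFIF coefficients. The default threshold ranges are values often quoted in the literature and should be treated as assumptions to be tuned, not as the settings of [27]. All function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance components of an 8-bit RGB pixel (full-range JFIF conversion)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image, cb_range=(77, 127), cr_range=(133, 173)):
    """Steps 1 and 2: mark pixels whose (Cb, Cr) fall inside the assumed thresholds."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            is_skin = (cb_range[0] <= cb <= cb_range[1]
                       and cr_range[0] <= cr <= cr_range[1])
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask


def bounding_box(mask):
    """Step 3: (top, left, bottom, right) of all skin pixels, or None if there are none."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))
```

A single box around every skin pixel works only when the face is the sole skin region, which is exactly the constraint noted above.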
2.2) Motion Base |
When a video sequence is available, motion information can be used to locate moving objects. Moving silhouettes such as the face and body parts can be extracted by simply thresholding accumulated frame differences [29]. Besides face regions, facial features can also be located by frame differences [30, 31].
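Thresholding accumulated frame differences can be sketched as below. Frames are assumed to be equally sized gray-scale images stored as nested lists, and the threshold is a free parameter. The function names are hypothetical.

```python
def accumulated_difference(frames):
    """Sum absolute differences between consecutive gray-scale frames, per pixel."""
    rows, cols = len(frames[0]), len(frames[0][0])
    acc = [[0] * cols for _ in range(rows)]
    for prev, cur in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                acc[r][c] += abs(cur[r][c] - prev[r][c])
    return acc


def moving_silhouette(frames, threshold):
    """Binary mask: 1 where the accumulated motion exceeds the chosen threshold."""
    acc = accumulated_difference(frames)
    return [[1 if v > threshold else 0 for v in row] for row in acc]
```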
2.3) Gray Scale Base |
Gray-level information within a face can also be treated as an important feature. Facial features such as eyebrows, pupils, and lips generally appear darker than their surrounding facial regions. Various feature extraction algorithms [32 – 34] search for local gray minima within segmented facial regions. In these algorithms, the input images are first enhanced by contrast stretching and gray-scale morphological routines to improve the quality of local dark patches and thereby make detection easier. The dark patches are then extracted by low-level gray-scale thresholding. Yang and Huang [35] presented a different approach based on the gray-scale behaviour of faces in pyramid (mosaic) images. Their system uses a hierarchical face location method consisting of three levels. The two higher levels are based on mosaic images at different resolutions, and the lowest level applies an edge detection method. This algorithm responds well against complex backgrounds where the size of the face is unknown.
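A mosaic image is formed by replacing each block of pixels with its average gray level. The sketch below assumes square cells and ignores any partial cells at the right and bottom borders. It illustrates the representation only, not the detection rules of [35], and the function name is hypothetical.

```python
def mosaic(image, cell):
    """Reduce a gray-scale image by averaging non-overlapping cell x cell blocks."""
    rows, cols = len(image) // cell, len(image[0]) // cell
    out = []
    for block_row in range(rows):
        row = []
        for block_col in range(cols):
            block = [image[block_row * cell + i][block_col * cell + j]
                     for i in range(cell) for j in range(cell)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Calling `mosaic` with increasing cell sizes yields the coarser levels of the pyramid.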
2.4) Edge Base |
Face detection based on edges was introduced by Sakai et al. [36]. This work was based on analysing line drawings of faces from photographs, aiming to locate facial features. Later, Craw et al. [37] proposed a hierarchical framework based on the work of Sakai et al. to trace a human head outline. Many researchers have since carried out remarkable work in this area. The method suggested by Anila and Devarajan [38] is very simple and fast. They proposed a framework consisting of three steps. First, the images are enhanced by applying a median filter for noise removal and histogram equalization for contrast adjustment. Second, the edge image is constructed from the enhanced image by applying the Sobel operator. Third, a novel edge tracking algorithm is applied to extract sub-windows from the enhanced image based on edges. A Back Propagation Neural Network (BPN) is then used to classify each sub-window as either face or non-face.
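The edge-image step can be illustrated with a plain implementation of the Sobel operator. This sketch covers only that step (no median filtering, histogram equalization, edge tracking or classification). Leaving border pixels at zero is an assumption made for simplicity, and the function name is hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # horizontal gradient kernel
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # vertical gradient kernel


def sobel_magnitude(image):
    """Gradient magnitude of a gray-scale image; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[r][c] = math.hypot(gx, gy)
    return out
```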
3) Feature Analysis |
These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are designed mainly for face localization.
3.1) Feature Searching |
3.1.1) Viola Jones Method |
Paul Viola and Michael Jones [39] presented a fast and robust approach to object detection that minimizes computation time while achieving high detection accuracy. At the time of its release it was reported to be 15 times quicker than any other technique, with 95% accuracy at around 17 fps. The technique relies on simple Haar-like features that are evaluated quickly through a new image representation, the "integral image". It generates a large set of features and uses the boosting algorithm AdaBoost to reduce this overcomplete set; a degenerate tree of boosted classifiers (a cascade) then provides robust and fast inference. The detector is applied in a scanning fashion to gray-scale images, and both the scanned window and the evaluated features can be scaled.
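The reason Haar-like features are cheap to evaluate can be seen in a small sketch of the integral image. After one pass over the image, the sum of any rectangle needs only four table lookups. The function names and the left-minus-right sign convention of the two-rectangle feature are assumptions for illustration; AdaBoost training and the cascade are not shown.

```python
def integral_image(image):
    """Table ii where ii[r][c] is the sum of image[0:r][0:c] (one extra row and column)."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = image[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

Scaling a feature changes only the rectangle coordinates, not the cost, which is why the detector can scale the features rather than the image.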
3.1.2) Gabor Feature Method |
Sharif et al. [39] proposed an Elastic Bunch Graph Map (EBGM) algorithm that successfully implements face detection using Gabor filters. The proposed system applies 40 different Gabor filters to an image, which yields 40 images with different angles and orientations. Next, the maximum intensity points in each filtered image are calculated and marked as fiducial points. The system reduces these points according to the distance between them. The next step is to calculate the distances between the reduced points using the distance formula. Finally, the distances are compared with a database. If a match occurs, the faces in the image are detected. The equation of the Gabor filter is given in [40]; one of its parameters gives the orientation of the filter.
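As an illustration of a Gabor filter bank, the sketch below builds real-valued kernels from the commonly used form: a Gaussian envelope multiplied by a cosine carrier, evaluated on coordinates rotated by the orientation angle. This standard form is an assumption and may differ from the equation in [40]. The split of 40 filters into 5 scales and 8 orientations, the wavelengths, and the choice of sigma are likewise assumptions, and the function names are hypothetical.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real Gabor kernel of odd side `size`; theta is the orientation in radians."""
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * cos_t + y * sin_t      # coordinates rotated by theta
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size=9, scales=5, orientations=8):
    """Bank of scales x orientations kernels (5 x 8 = 40 is an assumed split)."""
    bank = []
    for s in range(scales):
        wavelength = 4.0 * (2 ** (s / 2))   # assumed half-octave spacing
        for o in range(orientations):
            theta = math.pi * o / orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength))
    return bank
```

Convolving the image with each kernel in the bank gives one filtered image per scale and orientation.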
3.2) Constellation Method |
All the methods discussed so far are able to track faces, but locating faces of various poses against complex backgrounds remains truly difficult. To reduce this difficulty, investigators group facial features into face-like constellations using more robust modelling approaches such as statistical analysis. Various types of face constellations have been proposed by Burl et al. [41], who apply statistical shape theory to the features detected by a multiscale Gaussian derivative filter. Huang et al. [42] also apply a Gaussian filter for pre-processing in a framework based on image feature analysis.
B. Image Base Approach |
1) Neural Network |
Neural networks have gained much attention in many pattern recognition problems, such as OCR, object recognition, and autonomous robot driving. Since face detection can be treated as a two-class pattern recognition problem, various neural network algorithms have been proposed. The advantage of using neural networks for face detection is the feasibility of training a system to capture the complex class-conditional density of face patterns. One drawback, however, is that the network architecture has to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to get exceptional performance. An early hierarchical neural network was proposed by Agui et al. [43]. The first stage has two parallel subnetworks whose inputs are filtered intensity values from an original image. The inputs to the second-stage network consist of the outputs from the subnetworks and extracted feature values. An output at the second stage indicates the presence of a face in the input region. Propp and Samal developed one of the earliest neural networks for face detection [44]. Their network consists of four layers with 1,024 input units, 256 units in the first hidden layer, eight units in the second hidden layer, and two output units.
Feraud and Bernier presented a detection method using autoassociative neural networks [45], [46], [47]. The idea is based on [48], which shows that an autoassociative network with five layers is able to perform a nonlinear principal component analysis. One autoassociative network is used to detect frontal-view faces and another is used to detect faces turned up to 60 degrees to the left and right of the frontal view. Later, Lin et al. presented a face detection system using a probabilistic decision-based neural network (PDBNN) [49]. The architecture of a PDBNN is similar to that of a radial basis function (RBF) network with modified learning rules and a probabilistic interpretation.
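To give a feel for the size of such a network, the sketch below builds a fully connected network with the layer sizes 1,024, 256, 8 and 2 and runs one forward pass. Only the layer sizes come from the description above. The sigmoid activation, the random initial weights and the function names are assumptions, and no training is shown.

```python
import math
import random

LAYER_SIZES = (1024, 256, 8, 2)   # input, two hidden layers, output


def init_network(sizes=LAYER_SIZES, seed=0):
    """Create (weights, biases) per layer with small random weights (assumption)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate an input vector through the network with sigmoid units."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations
```

Even this small architecture has 264,474 adjustable parameters, which hints at why tuning and training such networks takes considerable effort.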
2) Linear Sub Space Method |
2.1) Eigen faces Method |
An early example of employing eigenvectors in face recognition was given by Kohonen [50], in which a simple neural network is demonstrated to perform face recognition for aligned and normalized face images. Kirby and Sirovich suggested that images of faces can be linearly encoded using a modest number of basis images [51]. The idea was arguably first proposed by Pearson in 1901 [52] and then by Hotelling in 1933 [53]. Given a collection of n by m pixel training images, each represented as a vector of size m × n, basis vectors spanning an optimal subspace are determined such that the mean square error between the projection of the training images onto this subspace and the original images is minimized. They call the set of optimal basis vectors Eigenpictures, since these are simply the eigenvectors of the covariance matrix computed from the vectorized face images in the training set. Experiments with a set of 100 images show that a face image of 91 × 50 pixels can be effectively encoded using only 50 Eigenpictures while retaining a reasonable likeness (i.e., capturing 95 percent of the variance).
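Encoding an image with a few basis images amounts to projecting it onto the basis and reconstructing it from the resulting weights. The sketch below assumes the mean image and an orthonormal set of basis vectors are already available (computing them from a covariance matrix is not shown). The function names are hypothetical.

```python
def encode(image, mean, basis):
    """Project a vectorized image onto orthonormal basis vectors; returns weights."""
    centred = [p - m for p, m in zip(image, mean)]
    return [sum(c * b for c, b in zip(centred, vec)) for vec in basis]


def decode(weights, mean, basis):
    """Reconstruct an image from its weights: mean plus weighted basis vectors."""
    out = list(mean)
    for w, vec in zip(weights, basis):
        for j, b in enumerate(vec):
            out[j] += w * b
    return out


def mean_square_error(a, b):
    """Average squared pixel difference between two vectorized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

The mean square error between an image and its reconstruction measures what the chosen subspace fails to capture, which is the quantity the optimal basis minimizes over the training set.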
3) Statistical Approach |
3.1) Support Vector Machine (SVM) |
SVMs were first applied to face detection by Osuna et al. [54]. SVMs can be considered a new paradigm for training polynomial function, neural network, or radial basis function (RBF) classifiers. SVMs operate on an induction principle called structural risk minimization, which aims to minimize an upper bound on the expected generalization error. An SVM classifier is a linear classifier where the separating hyperplane is chosen to minimize the expected classification error on unseen test patterns. In [54], Osuna et al. developed an efficient method to train an SVM for large-scale problems and applied it to face detection. Based on two test sets of 10,000,000 test patterns of 19 × 19 pixels, their system has slightly lower error rates and runs approximately 30 times faster than the system by Sung and Poggio [55]. SVMs have also been used to detect faces and pedestrians in the wavelet domain.
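A linear SVM can be illustrated with a small sketch that minimizes the regularized hinge loss by stochastic subgradient descent. This is a substitute technique chosen for brevity, not the large-scale training method of [54]. The hyperparameters `lam`, `lr` and `epochs` and the function names are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the regularized hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            w = [wj - lr * lam * wj for wj in w]       # shrink w (regularization)
            if margin < 1:                             # sample violates the margin
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify x as +1 (e.g. face) or -1 (non-face) by the side of the hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1
```

In a face detector the samples would be vectorized image windows, such as the 19 × 19 patterns mentioned above, labelled as face or non-face.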
APPLICATIONS |
Face detection technology is useful in a wide range of applications, such as:
o Biometric identification |
o Video Conferencing |
o Human – Computer Interaction |
o Access control Systems |
CONCLUSION |
Face detection is currently a very active research area and the technology has come a long way. The last couple of years have shown great advances in algorithms dealing with complex environments such as low-quality gray-scale images and cluttered backgrounds. Some of the best algorithms are still too computationally expensive to be applicable to real-time processing, but this is likely to change with coming improvements in computer hardware. This paper has presented numerous feature based and image based techniques that are available to detect human faces. All methods have their own merits and demerits. Moreover, image based methods such as neural networks, SVMs, and PCA can readily resolve the uncertainty of the features in feature based approaches.
References |
|