ISSN ONLINE(2320-9801) PRINT (2320-9798)


An Overview on Facial Image Annotation

A. Nithya¹ and K. Haridas²
  1. Research scholar, Department of Computer Science, NGM College, Pollachi, India
  2. Assistant Professor, Department of Computer Applications, NGM College, Pollachi, India

International Journal of Innovative Research in Computer and Communication Engineering


Face recognition presents a challenging problem in the field of image analysis and computer vision, and it has received a great deal of attention over the last few years because of its applications in various domains. Mining web facial images on the internet has emerged as a promising paradigm for automatic face annotation. Content-based image retrieval (CBIR) systems require users to query images by their low-level visual content. This not only makes it hard for users to formulate queries, but can also lead to unsatisfactory retrieval results. To address this, image annotation was introduced. The aim of image annotation is to automatically assign keywords to images, so that users of image retrieval systems can query images by keywords. Face annotation, more specifically, aims to automatically detect human faces in a photo and name them with the corresponding human names. This paper also reviews some of the earlier work on face annotation.


Keywords: Face annotation, face recognition, search-based face annotation (SBFA), text-based search.


An extremely large volume of image data, such as satellite images, medical images and photographs, is generated every day. With the rapid growth of web photo sharing portals and social networks, massive amounts of images and photos have been uploaded and shared on the internet. With the increasing proliferation of digital cameras, our daily life can be easily captured, stored, browsed and shared. Moreover, with the recent success of social networks and social websites, web users have been highly motivated to share their images with friends and the public, and these sites allow other users to tag and comment on their image collections.
Nowadays, a large proportion of the photos shared by users are human facial images, and they are freely available on the World Wide Web (WWW). Only some of these facial images are tagged properly. Due to the significant increase in the number of photos, a strong need for automatic indexing has emerged. The most important and common entries for indexing personal photos are "who", "where" and "when", in that order. Since most people usually organize their photo collections around particular persons of interest (e.g., photos including their friends) [13], finding "who" appears in personal photos is one of the most promising applications in multimedia information retrieval (MIR). Face annotation can be applied to online photo sharing applications and to video domains. The goal of an automated image tagging task is to assign a set of semantic labels or tags to a novel image using some pre-trained image recognition models. One of the challenges is the need for tools that automatically analyze visual content and produce semantically meaningful annotations.


Images were first annotated with text and then searched using a text-based approach in traditional database management systems. Text-based image retrieval systems use traditional database techniques to manage images. Through text descriptions, images can be organized by topical or semantic hierarchies to facilitate easy navigation and browsing based on standard Boolean queries. However, most text-based image retrieval systems require manual annotation of images. Annotating images manually is a cumbersome and expensive task for large image databases, and it is often subjective, context-sensitive and incomplete. As a result, it is difficult for traditional text-based methods to support a variety of task-dependent queries. Other work has learned correspondences between keywords and image regions (Verbeek and Triggs [22]) and has learned image retrieval and auto-annotation with keywords (Grangier et al. [11]).
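Text-based retrieval over manual keywords reduces to set operations. The following minimal sketch builds an inverted index and answers Boolean AND/OR queries. The image ids, keywords and function names are all hypothetical. The sketch also shows the weakness noted above: an image that lacks the right manual keyword can never be returned.

```python
from collections import defaultdict

# Hypothetical manual annotations: image id -> set of keywords.
annotations = {
    "img1": {"beach", "sunset", "alice"},
    "img2": {"beach", "bob"},
    "img3": {"office", "alice"},
}

def build_inverted_index(annotations):
    """Map each keyword to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, keywords in annotations.items():
        for kw in keywords:
            index[kw].add(image_id)
    return index

def boolean_and(index, *keywords):
    """Images annotated with every keyword (Boolean AND)."""
    sets = [index.get(kw, set()) for kw in keywords]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *keywords):
    """Images annotated with at least one keyword (Boolean OR)."""
    return set().union(*(index.get(kw, set()) for kw in keywords))

index = build_inverted_index(annotations)
print(sorted(boolean_and(index, "beach", "alice")))  # ['img1']
print(sorted(boolean_or(index, "bob", "office")))    # ['img2', 'img3']
```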
Random fields have also been studied to name all faces in an image, e.g., by Stone et al. [19]. Each face is a node in a graph, and a random field is solved either for each picture or for a group of pictures. Previous work on retrieving faces of specific people from caption-based supervision includes Ozkan and Duygulu [16] and Guillaumin, Mensink and Verbeek [14]. These methods perform a text-based query over the captions and return the documents that have the queried name in the caption. The faces found in the corresponding images are then further analyzed visually. The assumption underlying these methods is that the returned documents contain a large group of highly similar faces of the queried person, plus additional faces of many other people who each appear just a few times. Metric learning has received a lot of attention; for recent work in this area see, e.g., Bar-Hillel et al. [2]. Importantly, scores for visual identification can also be applied to other problems, such as visualization, recognition from a single example (Fei-Fei et al. [9]), and associating names and faces in images or video (Everingham et al. [8]).


Biometric-based technologies include identification based on physiological characteristics and behavioral traits [9]. Face recognition appears to offer several advantages over other biometric methods; for example, facial images can be easily obtained with a couple of inexpensive fixed cameras. Good face recognition algorithms and appropriate preprocessing of the images can compensate for noise and slight variations in orientation, scale and illumination.
Face recognition is used for two primary tasks:
a) Verification (one-to-one matching): When presented with a face image of an unknown individual along with a claim of identity, ascertaining whether the individual is who he/she claims to be.
b) Identification (one-to-many matching): Given an image of an unknown individual, determining that person's identity by comparing (possibly after encoding) that image with a database of (possibly encoded) images of known individuals.
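A toy sketch can show the difference between the two tasks. It assumes each face has already been encoded as a feature vector. The Euclidean distance, the threshold and the gallery data are illustrative assumptions; practical systems use learned features and calibrated thresholds.

```python
import math

def verify(probe, claimed_template, threshold):
    """One-to-one matching: accept the identity claim if the probe's
    features lie within `threshold` of the claimed person's template."""
    return math.dist(probe, claimed_template) <= threshold

def identify(probe, gallery):
    """One-to-many matching: return the enrolled identity whose
    template is nearest to the probe."""
    return min(gallery, key=lambda name: math.dist(probe, gallery[name]))

# Hypothetical enrolled templates and probe.
gallery = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
probe = [0.15, 0.85]
print(verify(probe, gallery["bob"], threshold=0.3))  # False
print(identify(probe, gallery))                      # alice
```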
There are numerous application areas in which face recognition can be exploited for these two purposes, a few of which are outlined below.
• Security (access control to buildings, airports/seaports, ATM machines and border checkpoints) [3, 4].
• Computer/network security [5] (e.g., email authentication on multimedia workstations).
• Criminal justice systems (mug-shot/booking systems, post-event analysis, forensics).
• Image database investigations (searching image databases of licensed drivers, benefit recipients, missing children, immigrants and police bookings).
• Video indexing (labeling faces in video) [1].
In addition to these applications, the underlying techniques in current face recognition technology have also been modified and used for related applications such as gender classification [3, 17], expression recognition [18], and facial feature recognition and tracking [4]; each of these is useful in various domains. Face recognition is also being used in conjunction with other biometrics, such as speech, iris, fingerprint, ear and gait recognition, in order to enhance the recognition performance of these methods [25].

Face recognition techniques

Face recognition techniques can be broadly divided into three categories: methods that operate on intensity images, those that deal with video sequences, and those that require other sensory data such as infra-red imagery.
a) Face Recognition from Intensity Images
Face recognition methods for intensity images fall into two main categories: feature-based and holistic [3]. Feature-based approaches first process the input image to identify, extract and measure distinctive facial features such as the eyes, mouth and nose. They then compute the geometric relationships among those facial points, which reduces the input facial image to a vector of geometric features. Holistic approaches, in contrast, attempt to identify faces using global representations, i.e., descriptions based on the entire image rather than on local features of the face.
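Here is a minimal sketch of the feature-based idea. It assumes some detector has already located landmark coordinates for the eyes, nose and mouth. The face is reduced to its pairwise landmark distances, each normalized by the inter-ocular distance so that the vector does not depend on image scale. The landmark names and coordinates are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical landmark names; a real system would obtain the
# coordinates from a facial-feature detector.
LANDMARKS = ("left_eye", "right_eye", "nose", "mouth")

def geometric_feature_vector(points):
    """Pairwise landmark distances divided by the inter-ocular
    distance, giving a scale-invariant geometric description."""
    scale = math.dist(points["left_eye"], points["right_eye"])
    return [math.dist(points[a], points[b]) / scale
            for a, b in combinations(LANDMARKS, 2)]

face = {"left_eye": (30, 40), "right_eye": (70, 40),
        "nose": (50, 60), "mouth": (50, 80)}
print([round(v, 3) for v in geometric_feature_vector(face)])
# [1.0, 0.707, 1.118, 0.707, 1.118, 0.5]
```

Two such vectors can then be compared with any distance measure. Holistic methods skip the landmark step and work on the whole image instead.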
b) Face Recognition from Video Sequences
A video-based face recognition system typically consists of three modules: one for detecting the face; a second one for tracking it; and a third one for recognizing it [17]. Most of these systems choose a few good frames and then apply one of the recognition techniques for intensity images to those frames in order to identify the individual [5].
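The frame-selection strategy described above can be sketched as follows. The detector, quality score and still-image recognizer are hypothetical callables supplied by the caller. Tracking is simplified to assuming a single person per clip. Combining per-frame results by majority vote is an assumption of this sketch, not something the cited systems prescribe.

```python
from collections import Counter

def recognize_video(frames, detect_face, frame_quality, recognize_still, n_best=3):
    """Detect -> select good frames -> recognize with a still-image method.

    detect_face(frame) returns a face crop or None, frame_quality(face)
    scores how usable a crop is, and recognize_still(face) returns an
    identity label. All detections are assumed to belong to one person.
    """
    faces = [face for face in map(detect_face, frames) if face is not None]
    best = sorted(faces, key=frame_quality, reverse=True)[:n_best]
    votes = Counter(recognize_still(face) for face in best)
    # Majority vote over the selected frames; None if no face was found.
    return votes.most_common(1)[0][0] if votes else None

# Toy frames: "face" stands in for the recognizer's output on that frame.
frames = [
    {"face": "alice", "sharpness": 0.9},
    {"face": None, "sharpness": 0.0},   # no face detected
    {"face": "bob", "sharpness": 0.2},  # blurry frame, misrecognized
    {"face": "alice", "sharpness": 0.7},
    {"face": "alice", "sharpness": 0.8},
]
label = recognize_video(
    frames,
    detect_face=lambda fr: fr if fr["face"] is not None else None,
    frame_quality=lambda fr: fr["sharpness"],
    recognize_still=lambda fr: fr["face"],
)
print(label)  # alice
```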


Nowadays, search-based face annotation (SBFA) plays a vital role. Specifically, given a user-uploaded facial image for annotation, the search-based face annotation scheme first retrieves a short list of the top-K most similar facial images from a large-scale web facial image database. It then annotates the query facial image by mining the labels associated with those top-K similar facial images. In general, the search-based face annotation scheme has to tackle two main challenges [23]:
a) There is a challenge in efficiently retrieving the top-K most similar facial images from a large facial image database given a query facial image.
b) There is a challenge in effectively exploiting the short list of candidate facial images and their weak labels to name the faces automatically.
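The two-stage scheme can be sketched in a few lines. The sketch assumes faces are already encoded as feature vectors and paired with weak name labels. A brute-force cosine search stands in for efficient top-K retrieval (challenge a), and a similarity-weighted vote stands in for label mining (challenge b). Both are simplifying assumptions rather than the specific methods of [23], and `annotate_face` is a hypothetical name.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two facial feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def annotate_face(query, database, k=3):
    """database: list of (feature_vector, weak_name_label) pairs."""
    # (a) Retrieve the top-K most similar faces (brute-force scan).
    top_k = sorted(((cosine(query, feats), name) for feats, name in database),
                   key=lambda pair: pair[0], reverse=True)[:k]
    # (b) Mine the weak labels: each neighbour votes with its similarity.
    scores = defaultdict(float)
    for sim, name in top_k:
        scores[name] += sim
    return max(scores, key=scores.get)

database = [
    ([1.0, 0.0], "alice"),
    ([0.9, 0.1], "alice"),
    ([0.0, 1.0], "bob"),
    ([0.8, 0.3], "bob"),   # noisy weak label on an alice-like face
]
print(annotate_face([1.0, 0.05], database))  # alice
```

The noisy "bob" neighbour is outvoted here. Weak labels are not always this benign, which is why SBFA pipelines also include a step for refining weakly labeled data.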
The SBFA framework (Fig. 1) consists of the following steps:
Step 1: Facial image data collection;
Step 2: Face detection and facial feature extraction;
Step 3: High-dimensional facial feature indexing;
Step 4: Learning to refine weakly labeled data;
Step 5: Similar face retrieval.
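Step 3 names high-dimensional facial feature indexing without fixing a method. One common choice, used here purely for illustration, is random-hyperplane locality-sensitive hashing (LSH). Each bit of a signature records which side of a random hyperplane a feature vector falls on, so faces separated by a small angle tend to land in the same bucket. Step 5 then only has to rank that bucket instead of scanning the whole database. The class and parameter names are hypothetical.

```python
import random
from collections import defaultdict

class HyperplaneLSH:
    """Random-hyperplane LSH index for facial feature vectors."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One Gaussian random hyperplane (normal vector) per signature bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        """Bit i is 1 if vec lies on the non-negative side of plane i."""
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0)
                     for plane in self.planes)

    def add(self, face_id, vec):
        self.buckets[self.signature(vec)].append(face_id)

    def candidates(self, vec):
        """Faces sharing the query's bucket; re-rank them exactly afterwards."""
        return list(self.buckets.get(self.signature(vec), []))

index = HyperplaneLSH(dim=4, n_bits=6, seed=42)
index.add("face_a", [1.0, 0.2, 0.0, 0.1])
index.add("face_b", [-1.0, 0.0, 0.3, -0.2])
print(index.candidates([1.0, 0.2, 0.0, 0.1]))
```

A single table can miss true neighbours that fall just across a hyperplane. Practical LSH indexes therefore usually combine several independent tables to improve recall.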


To index and retrieve personal photos based on an understanding of "who" is in the photos, annotation (or tagging) of faces is essential. However, manual face annotation by users is a time-consuming and inconsistent task. It often imposes significant restrictions on browsing personal photos that contain persons of interest. As an alternative, automatic face annotation solutions have been proposed. So far, conventional face recognition (FR) technologies have been used as the main component for indexing people appearing in personal photos [13]. FR techniques can achieve better annotation accuracy by taking context information into account. In addition, in contrast to previous research in this field, this approach requires no hand-labeled training data from photos. From a practical point of view, this is highly desirable, since labeled data are scarce in most cases.
Three representative subspace FR methods are adopted as the FR framework: Principal Component Analysis (PCA or “eigenfaces”), Fisher Linear Discriminant Analysis (FLDA or “Fisherfaces”), and Bayesian (“probabilistic eigenspace”) [15]. In addition, feature-level and measurement-level fusion strategies are used to take efficient advantage of multiple facial features per person. In contrast to other FR-based applications (e.g., surveillance security and law enforcement), face annotation on personal photos can benefit from temporal and spatial context information, owing to the following facts:
a) A sequence of photos taken in close proximity of time has relatively stationary visual context.
b) One would tend to take several pictures in a fixed place [17].
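To illustrate the first of these subspace methods, here is a minimal eigenfaces (PCA) sketch that uses NumPy's SVD and nearest-neighbour matching in the subspace. It assumes faces are already aligned, equally sized and flattened into vectors. The 64-pixel "images" are synthetic stand-ins. FLDA, the Bayesian method, and the fusion and context strategies are not shown.

```python
import numpy as np

def fit_eigenfaces(train, n_components):
    """PCA on an (n_images, n_pixels) array of flattened, aligned faces.
    Returns the mean face and the top principal directions (eigenfaces)."""
    mean = train.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(faces, mean, eigenfaces):
    """Coordinates of one or more faces in the eigenface subspace."""
    return (np.atleast_2d(faces) - mean) @ eigenfaces.T

def match_in_subspace(query, train_coords, labels, mean, eigenfaces):
    """Nearest-neighbour identification in the eigenface subspace."""
    dists = np.linalg.norm(train_coords - project(query, mean, eigenfaces), axis=1)
    return labels[int(np.argmin(dists))]

# Synthetic stand-ins for 8x8 face images of two people (hypothetical data).
rng = np.random.default_rng(0)
alice, bob = rng.normal(size=64), rng.normal(size=64)
train = np.stack([alice + rng.normal(0, 0.1, 64) for _ in range(3)] +
                 [bob + rng.normal(0, 0.1, 64) for _ in range(3)])
labels = ["alice"] * 3 + ["bob"] * 3
mean, eigenfaces = fit_eigenfaces(train, n_components=2)
coords = project(train, mean, eigenfaces)
print(match_in_subspace(alice + rng.normal(0, 0.1, 64), coords, labels, mean, eigenfaces))
```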


Machine learning can also be used to mine social images and applied to a challenging task, automated image tagging, which is important and beneficial to many web and multimedia applications. The goal of an automated image tagging task is to assign a set of semantic labels or tags to a novel image using some pre-trained image recognition models. The traditional approach typically has two steps:
a) Representing images by extracting visual features
b) Pre-training recognition models by building classification models from a collection of manually-labeled training data [5].
In the literature, numerous studies have been devoted to automated image annotation using social images freely available on the web. The main idea of the retrieval-based paradigm [24] is to first retrieve a set of the k most similar images for a test photo from the social image repository, and then to assign the test photo a set of the most relevant tags associated with the k retrieved social images.
As shown in Fig. 2, a similarity search is conducted to find a subset of top-k images from a social image database. Once the top-k similar images have been obtained, the tags associated with them are used to suggest the most relevant tags for the query image.
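The retrieval-based paradigm can be sketched as below. The sketch assumes social images are already represented by feature vectors with user tags. Cosine similarity, plain tag counting among the k neighbours and alphabetical tie-breaking are illustrative choices, not the specific method of [24]. `suggest_tags` and the data are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two image feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def suggest_tags(query, social_images, k=3, n_tags=2):
    """social_images: list of (feature_vector, set_of_user_tags) pairs."""
    # 1. Similarity search: the k social images most similar to the query.
    neighbours = sorted(social_images, key=lambda item: cosine(query, item[0]),
                        reverse=True)[:k]
    # 2. Count how many of those neighbours carry each tag.
    counts = Counter(tag for _, tags in neighbours for tag in tags)
    # 3. Most frequent tags first; ties broken alphabetically.
    return sorted(counts, key=lambda t: (-counts[t], t))[:n_tags]

social = [
    ([0.9, 0.1, 0.0], {"beach", "sea", "sunset"}),
    ([0.8, 0.2, 0.1], {"beach", "sea"}),
    ([0.7, 0.3, 0.0], {"beach", "friends"}),
    ([0.0, 0.1, 0.9], {"mountain", "snow"}),
]
print(suggest_tags([1.0, 0.15, 0.0], social))  # ['beach', 'sea']
```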


Some face retrieval methods take a human name as the input query and mainly aim to refine the text-based search results by exploiting the visual consistency of facial images.
Graph-based approach
In the graph-based approaches of Guillaumin et al. [14] and Ozkan and Duygulu [16], faces are represented as nodes, and edges encode the similarity between two faces. These approaches assume that faces of the queried person occur relatively frequently and are highly similar, which leads to a search for the densest subgraph. A graph is defined as G = (V, E), where the vertices in V represent faces and the edges in E are weighted according to the similarity ω_ij between faces i and j. To filter the initial text-based results, the densest subgraph S ⊆ V of G is sought, where the density f(S) of S is given by

$$ f(S) = \frac{\sum_{i \in S,\, j \in S} \omega_{ij}}{|S|} $$
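One standard way to search for a dense subgraph is greedy peeling (Charikar's approximation algorithm), sketched below. The source does not say which optimizer is used, so treat this as an illustrative choice. The sketch takes f(S) to be the sum of ω_ij over all pairs i, j in S divided by |S|. It assumes a symmetric similarity matrix `W` with zero diagonal. The toy weights are hypothetical: faces 0 to 2 form a tight cluster, and faces 3 and 4 are outliers.

```python
def density(W, S):
    """f(S): sum of w_ij over ordered pairs i, j in S, divided by |S|
    (W is assumed symmetric with a zero diagonal)."""
    return sum(W[i][j] for i in S for j in S) / len(S) if S else 0.0

def densest_subgraph(W):
    """Greedy peeling: repeatedly drop the vertex with the smallest
    weighted degree inside the current set, keeping the densest set seen."""
    S = set(range(len(W)))
    best, best_f = set(S), density(W, S)
    while len(S) > 1:
        weakest = min(S, key=lambda i: sum(W[i][j] for j in S))
        S.remove(weakest)
        f = density(W, S)
        if f > best_f:
            best, best_f = set(S), f
    return best, best_f

# Hypothetical face similarities returned by a text query for one name.
W = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.0],
    [0.9, 0.9, 0.0, 0.0, 0.1],
    [0.1, 0.1, 0.0, 0.0, 0.2],
    [0.1, 0.0, 0.1, 0.2, 0.0],
]
best, best_f = densest_subgraph(W)
print(sorted(best))  # [0, 1, 2]
```

Faces outside the returned set are treated as belonging to other people and are filtered out of the text-based result.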

Caption-based approach

This method performs a text-based query over the captions and returns the documents that have the queried name in the caption. The faces found in the corresponding images are then further analyzed visually. The assumption underlying these methods is that the returned documents contain a large group of highly similar faces of the queried person, plus additional faces of many other people who each appear just a few times. The goal is thus to find a single coherent, compact cluster in a space that also contains many outliers. To obtain a robust similarity measure for this purpose, a metric learning approach is deployed; metric learning has received a lot of attention. It is one of the methods that can provide robust similarity measures for face identification and, more generally, for visual identification. Recently there has been considerable interest in such identification methods [7].
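To make the idea of a learned similarity concrete, the following sketch fits a diagonal Mahalanobis-style metric. It uses a logistic model on pairs of faces labeled same/different, in the spirit of logistic discriminant metric learning. This is a simplified illustration, not the method of [7] or of the works cited above. The data, learning rate and epoch count are assumptions. The toy features are built so that one dimension carries identity and the other is pure nuisance variation, such as lighting.

```python
import numpy as np

def pair_differences(X, pairs):
    """Element-wise squared differences (x_i - x_j)**2, one row per pair."""
    return (X[pairs[:, 0]] - X[pairs[:, 1]]) ** 2

def fit_diagonal_metric(X, pairs, same, lr=0.1, epochs=500):
    """Fit p(same | i, j) = sigmoid(b - sum_k m_k (x_ik - x_jk)**2)
    by gradient ascent on the log-likelihood, keeping m >= 0."""
    D = pair_differences(X, pairs)
    m = np.ones(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        logits = np.clip(b - D @ m, -30.0, 30.0)  # clip to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-logits))
        err = same - p                     # d log-likelihood / d logit
        m -= lr * (D.T @ err) / len(err)   # the logit decreases as m grows
        b += lr * err.mean()
        m = np.maximum(m, 0.0)             # non-negative diagonal = valid metric
    return m, b

# Toy data: feature 0 carries identity, feature 1 is nuisance.
rng = np.random.default_rng(0)
identity = np.repeat([0.0, 1.0, 2.0], 4)   # 3 people, 4 faces each
X = np.column_stack([identity + rng.normal(0, 0.05, 12),
                     rng.normal(0, 1.0, 12)])
pairs = np.array([(i, j) for i in range(12) for j in range(i + 1, 12)])
same = (identity[pairs[:, 0]] == identity[pairs[:, 1]]).astype(float)
m, b = fit_diagonal_metric(X, pairs, same)
print(m)  # the weight on feature 0 should dominate
```

A full Mahalanobis matrix would also capture correlations between features. The diagonal restriction keeps the sketch short and keeps the learned distance valid simply by clipping the weights at zero.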


This paper presents an overview of face annotation, a challenging problem in the field of image analysis and computer vision that has many applications in various domains. It also explains face recognition techniques based on intensity images and video sequences. The main benefit of annotation is that users can easily search for, and interact with, friends and famous persons. One annotation approach is search-based face annotation, which makes it easy to interact with social networks. The paper also covers automated image tagging and text-based retrieval for visual identification.

Figures: Figure 1, Figure 2