Natural Language Querying for
Content Based Image Retrieval System

Sreena P. H.; David Solomon George


M.Tech Student, Department of ECE, Rajiv Gandhi Institute of Technology, Kottayam, India
Asst. Prof, Department of ECE, Rajiv Gandhi Institute of Technology, Kottayam, India

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Abstract

A content-based image retrieval (CBIR) system is a database management system that retrieves images based on the similarity of their content to a query image. The semantic gap between low-level image features and human interpretation causes irrelevant images to be retrieved from the database. In the proposed CBIR system, this problem is addressed by fuzzy clustering of Tamura texture features extracted from the database images. Natural language querying is implemented using a fuzzy data space, and the fuzzy membership values derived by fuzzy c-means clustering are used as the similarity measure. The proposed technique is implemented in MATLAB, and its effectiveness is verified on the standard Brodatz texture database.

Keywords

Content Based Image Retrieval System, Fuzzy c-means clustering, Natural language query, Semantic retrieval

I. INTRODUCTION

With increased access to the internet and low-cost digital storage devices, a large amount of digital information is available to the common user. Images, for example, are used in fields such as education, entertainment, commerce, biomedicine and crime prevention. The highly evolved and sophisticated human visual system lets people select a required image from a large image database within a few minutes or less. However, it is hard to teach a machine to interpret what we see. Nevertheless, considerable effort over the past few decades has made substantial progress in enabling machines to understand, index and annotate a wide range of pictures. Image retrieval systems are image database management systems that attempt to understand an image before presenting it to the user.

Several image retrieval techniques are in use [1]. They are categorized by the nature of the query: keyword-based, free-text-based, image-content-based, or a combination of these. Newer techniques that involve user interaction help to reduce the semantic gap between the retrieved images and the user query [2]. Early image retrieval techniques based on textual queries became inadequate as advances in internet and digital image storage technologies produced ever larger image collections [3], [4]. Systems that use properties inherent to the images themselves can handle large image databases, and CBIR systems were developed on this idea. Several commercial and experimental prototype CBIR systems, such as QBIC [5], Photobook [6], Virage [7], Netra [8], VisualSEEk [9] and SIMPLIcity [10], have been developed.

Fig. 1 gives a general overview of a CBIR system. The first step in developing a CBIR system is creating the feature database, which should contain a feature vector for each image in the database. The choice of feature vector depends on the nature of the database and the application of the system. Commonly used content-based (low-level) features are color, texture and shape. Depending on the application, features can be classified as general features (color, texture and shape) or domain-specific features (face, fingerprint, etc.). Because image perception differs from person to person, there is no single best feature representation.

When a query image is presented to the system, its feature vector is extracted and its similarity (or distance) to every feature vector in the database is measured. In recent years, many similarity measures have been developed for image retrieval based on empirical estimates of the feature distribution, and the choice of similarity/distance measure significantly affects the retrieval performance of the system. Various measures, such as the Minkowski-form distance, quadratic-form distance and Mahalanobis distance, are found in the literature [11]. The database images are then sorted, and those with the largest similarity (shortest distance) are presented to the user.
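The generic distance-based retrieval loop described above can be sketched as follows. This is a minimal illustration, assuming plain Python lists as feature vectors and the Minkowski-form distance (Euclidean, p = 2, by default); the function names, image names and vectors are hypothetical and not part of the proposed system, which ranks by fuzzy membership instead.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski-form distance: p = 1 gives the city-block distance, p = 2 the Euclidean."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_distance(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    p: float = 2.0,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (image name, distance) pairs sorted from most to least similar."""
    scored = [(name, minkowski_distance(query, vec, p)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1])  # shortest distance = largest similarity
    return scored if top_k is None else scored[:top_k]


# Hypothetical 2-D feature vectors, for illustration only.
database = {"img_a": [0.1, 0.9], "img_b": [0.5, 0.5], "img_c": [0.8, 0.2]}
print(rank_by_distance([0.75, 0.25], database, top_k=2))
```

Swapping in another measure from [11] only requires replacing `minkowski_distance`; the sort-and-truncate step stays the same.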

CBIR systems face two main problems. The first is the semantic gap: the lack of coincidence between the image representation and the human interpretation of an image. Low-level features carry no information about the semantic content of the object being retrieved. The second is perception subjectivity: different users, or even the same user under different circumstances, may interpret an image differently, and the way users define the similarity between two images may differ considerably. Perception subjectivity exists at every semantic level and depends on individual experience.

Recent work in CBIR mainly aims at reducing the semantic gap between the query and the retrieved images. Common methods used to address the semantic gap are: using object ontology to define high-level concepts; using machine learning tools to associate low-level features with query concepts; introducing relevance feedback (RF) into the retrieval loop for continuous learning of the user’s intention; generating semantic templates (ST) to support high-level image retrieval; and combining the visual content of images with textual information obtained from the Web for Web image retrieval [12].

In this paper, the semantic gap and perception subjectivity problems are addressed using fuzzy logic. Semantic clusters are defined by fuzzy c-means clustering, and fuzzy linguistic terms are defined to support natural language querying. Statistical texture measures, the Tamura features, are extracted from the images of the standard Brodatz database [13] to create the feature space. Each Tamura feature space is fuzzy clustered separately, and multiple query terms are combined using fuzzy min-max aggregation.

The rest of the paper is organized as follows. Section II describes the proposed system in detail, including the Tamura feature extraction and the fuzzy clustering and aggregation techniques it uses. Section III gives the results for some example queries and the precision of the retrieval system.

II. PROPOSED CBIR SYSTEM

The CBIR system proposed in this paper uses Tamura texture features for retrieval and fuzzy clustering to group the images semantically. The Brodatz digital album is used as the image database. The proposed system is shown in Fig. 2.

The proposed system is implemented in three steps:

• Feature extraction

• Fuzzy clustering

• Query processing

The first two steps produce a fuzzified feature space, which is stored in the system database and used in the query processing step, where the actual retrieval takes place.

A. Feature extraction

As mentioned earlier, Tamura features are used to describe texture. Tamura et al. [14] proposed a set of six texture features based on psychological experiments with human subjects: they presented a set of images from the Brodatz album to a group of people and asked them to describe the textures. From the results, they developed mathematical expressions for six features that effectively describe all the input patterns and are well distributed over them. The features are coarseness, directionality, contrast, line-likeness, regularity and roughness. Many CBIR systems have used Tamura features effectively for texture description.

Of these six, the three most important Tamura features are extracted for all database images; modified versions of coarseness, contrast and directionality are used. The remaining three features, line-likeness, regularity and roughness, are derived from the first three and are therefore not used.

Coarseness describes the size of the texture elements and is measured with a size operator. Around each pixel, neighbourhoods of size $2^k \times 2^k$ are defined ($k = 0, 1, \ldots, 5$). The value of $k$ that maximizes the difference between the average gray levels of the non-overlapping neighbourhoods on opposite sides of the pixel, in the horizontal or vertical direction, gives $S_{best} = 2^k$. The histogram of $S_{best}$ over the image forms the coarseness feature vector.
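A minimal sketch of this coarseness histogram follows, assuming a gray-level image stored as a list of rows with values in [0, 1]. The helper names, the tie-breaking rule (smallest $k$ wins) and the choice to skip border pixels are assumptions made for illustration; the paper does not specify them.

```python
from typing import List

Image = List[List[float]]


def window_mean(img: Image, top: int, left: int, size: int) -> float:
    """Mean gray level of the size x size window whose top-left corner is (top, left)."""
    total = sum(img[r][c] for r in range(top, top + size) for c in range(left, left + size))
    return total / (size * size)


def coarseness_histogram(img: Image, k_max: int = 5) -> List[float]:
    """Normalized histogram of the best window exponent k (S_best = 2**k) over the image.

    For each pixel and each k, E_k is the larger of the horizontal and vertical
    absolute differences between the means of two adjacent, non-overlapping
    2**k x 2**k windows that meet at the pixel. The k with the largest E_k wins
    (ties go to the smallest k). Pixels too close to the border for the largest
    window are skipped.
    """
    h, w = len(img), len(img[0])
    margin = 2 ** k_max
    counts = [0] * (k_max + 1)
    for y in range(margin, h - margin + 1):
        for x in range(margin, w - margin + 1):
            best_k, best_e = 0, -1.0
            for k in range(k_max + 1):
                s = 2 ** k
                half = s // 2
                # Left window vs right window (horizontal difference).
                e_h = abs(window_mean(img, y - half, x - s, s) - window_mean(img, y - half, x, s))
                # Upper window vs lower window (vertical difference).
                e_v = abs(window_mean(img, y - s, x - half, s) - window_mean(img, y, x - half, s))
                e = max(e_h, e_v)
                if e > best_e:
                    best_k, best_e = k, e
            counts[best_k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * (k_max + 1)


# 1-pixel checkerboard: only the smallest window (k = 0) sees any difference.
checker = [[float((r + c) % 2) for c in range(12)] for r in range(12)]
print(coarseness_histogram(checker, k_max=2))
```

The direct window sums are only practical for small test images; for full-size Brodatz images an integral image or a vectorized library would be needed.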

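The contrast of an image is usually varied by changing its gray-level distribution. Tamura et al. also considered other factors: the dynamic range of gray levels and the polarization of the distribution of black and white on the gray-level histogram. The first is measured by the standard deviation $\sigma$ of the gray levels and the second by the kurtosis $\alpha_4 = \mu_4/\sigma^4$, where $\mu_4$ is the fourth moment about the mean. The contrast measure is therefore defined as the ratio of the standard deviation to the $n$th power of $\alpha_4$:

$$F_{con} = \frac{\sigma}{(\alpha_4)^{n}}$$

where Tamura et al. [14] found $n = 1/4$ to work best.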
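The contrast formula can be checked numerically with a short sketch, assuming the pixels of a region are given as a flat list of gray levels and $n = 1/4$. Returning 0 for a perfectly flat region is an assumption added here, since $\alpha_4$ is undefined when $\sigma = 0$.

```python
import math
from typing import Sequence


def tamura_contrast(pixels: Sequence[float], n: float = 0.25) -> float:
    """Tamura contrast F_con = sigma / alpha4**n, with kurtosis alpha4 = mu4 / sigma**4."""
    count = len(pixels)
    mean = sum(pixels) / count
    variance = sum((p - mean) ** 2 for p in pixels) / count
    if variance == 0.0:
        return 0.0  # a flat region has no contrast (and alpha4 would be undefined)
    mu4 = sum((p - mean) ** 4 for p in pixels) / count  # fourth central moment
    alpha4 = mu4 / variance ** 2  # sigma**4 == variance**2
    return math.sqrt(variance) / alpha4 ** n


# A perfectly polarized (two-level) patch vs. a patch with the same mean whose
# pixels mostly sit at the mean with a few extreme values.
print(tamura_contrast([0.0, 1.0, 0.0, 1.0]))
print(tamura_contrast([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 1.0]))
```

The two-level patch has $\alpha_4 = 1$, so its contrast equals $\sigma = 0.5$; the second patch has a heavier-tailed histogram ($\alpha_4 = 4$), which lowers its contrast, as the polarization argument predicts.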

Directionality is measured as a global property, independent of orientation; that is, the same pattern at different orientations should have the same directionality. It is computed from a histogram of local edge probabilities against direction angle, which has been shown to represent global features of the input image, such as long lines and simple curves, sufficiently well.
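A 16-bin direction histogram of this kind can be sketched as below, assuming 3 × 3 Prewitt-style difference kernels, an edge-magnitude threshold of 0.1 for gray levels in [0, 1], and nearest-bin quantization of the angle; these parameter choices are illustrative assumptions rather than the paper's exact settings.

```python
import math
from typing import List

Image = List[List[float]]


def direction_histogram(img: Image, bins: int = 16, threshold: float = 0.1) -> List[float]:
    """Normalized histogram of local edge directions (a simplified Tamura H_D).

    Horizontal and vertical differences use 3x3 Prewitt-style kernels. The
    direction theta = atan(dV / dH) + pi/2 lies in [0, pi); only pixels whose
    edge magnitude (|dH| + |dV|) / 2 reaches `threshold` are counted, and each
    is assigned to the nearest of `bins` directions spaced pi / bins apart.
    """
    h, w = len(img), len(img[0])
    counts = [0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            d_h = sum(img[y + dy][x + 1] - img[y + dy][x - 1] for dy in (-1, 0, 1))
            d_v = sum(img[y - 1][x + dx] - img[y + 1][x + dx] for dx in (-1, 0, 1))
            if (abs(d_h) + abs(d_v)) / 2 < threshold:
                continue  # too weak to count as an edge
            theta = (math.atan2(d_v, d_h) + math.pi / 2) % math.pi
            counts[int(round(theta / (math.pi / bins))) % bins] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


# A vertical step edge: every edge pixel falls in the pi/2 bin (index 8 of 16).
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
print(direction_histogram(step))
```

A strongly directional texture concentrates the histogram in a few bins, while an isotropic one spreads it out; rotating the texture shifts the peak without changing its sharpness, which is what makes the measure orientation-independent.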

The features are stored in the feature database. Instead of representing texture as a point in a 6-D space (one dimension per Tamura feature), each feature is itself represented as a vector. The histogram of $S_{best}$ over the window sizes $2^k$, $k = 0, 1, \ldots, 5$, forms the coarseness vector. Directionality is represented by the 16-D direction histogram $H_D$. Because contrast has no direct histogram measure, it is represented as a vector whose components are the contrast measures of the 128 × 128 sub-blocks of the image; this also adds a small amount of spatial information to the measurement.
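The block-wise contrast vector can be sketched as follows, assuming the image is a list of rows, tiles are taken in row-major order, and partial tiles at the right and bottom borders are dropped (all assumptions for illustration). A 512 × 512 image split into 128 × 128 blocks gives a 16-component vector.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def _contrast(pixels: Sequence[float], n: float = 0.25) -> float:
    """Tamura contrast sigma / alpha4**n of one block (0.0 for a flat block)."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    if var == 0.0:
        return 0.0
    mu4 = sum((p - mean) ** 4 for p in pixels) / len(pixels)
    return math.sqrt(var) / (mu4 / var ** 2) ** n


def block_contrast_vector(img: Image, block: int = 128) -> List[float]:
    """Contrast of each non-overlapping block x block tile, in row-major order.

    Partial tiles at the right and bottom borders are ignored (an assumption).
    """
    h, w = len(img), len(img[0])
    vector = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            tile = [img[r][c] for r in range(top, top + block) for c in range(left, left + block)]
            vector.append(_contrast(tile))
    return vector


# A 4 x 4 toy image split into 2 x 2 tiles: only the top-left tile has contrast.
toy = [[0.0, 1.0, 0.0, 0.0],
       [1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0, 1.0]]
print(block_contrast_vector(toy, block=2))
```

Keeping one contrast value per tile, rather than averaging them, is what preserves the coarse spatial layout mentioned above.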

B. Fuzzy clustering

After feature extraction, the feature space is fuzzy clustered. The extracted features are crisp values in the range [0, 1]. They are fuzzified by partitioning the feature space into multiple classes and assigning each feature a membership value in each class, which indicates the degree to which the feature belongs to that class. In fuzzy set theory, each class is semantically indexed with a linguistic label, which gives the features a natural language interpretation.

The algorithm for classification and membership assignment is a modification of Lin’s algorithm [15]. Three classes are specified for each texture feature, and a linguistic term is associated with each class. The linguistic term bridges the gap between the low-level feature and the high-level semantics of the texture. Table I lists the Tamura texture features and the linguistic terms associated with each feature. The class of an image is represented by the cluster centres and its membership value in each cluster; both are obtained with the fuzzy c-means clustering algorithm. These linguistic terms and membership values are stored for query processing.

The fuzzy c-means (FCM) algorithm [16] is the fuzzy extension of the k-means algorithm. It is set up as follows:

Given the data set $Z$ of $N$ feature vectors, choose the number of clusters $1 < c < N$, the weighting exponent $m > 1$, the termination tolerance $\varepsilon > 0$ and the norm-inducing matrix $A$.
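For reference, the standard FCM iteration with these inputs is usually written as follows. This is a sketch of the textbook formulation, starting from a random partition matrix $U^{(0)}$, and may differ in detail from the modified version used in this system. FCM minimizes

$$J(Z;U,V)=\sum_{i=1}^{c}\sum_{k=1}^{N}\mu_{ik}^{m}\,\lVert z_k-v_i\rVert_A^{2},\qquad \sum_{i=1}^{c}\mu_{ik}=1,$$

by repeating, for $l = 1, 2, \ldots$,

$$
\begin{aligned}
&\text{Step 1: } v_i^{(l)}=\frac{\sum_{k=1}^{N}\bigl(\mu_{ik}^{(l-1)}\bigr)^{m} z_k}{\sum_{k=1}^{N}\bigl(\mu_{ik}^{(l-1)}\bigr)^{m}}, \qquad 1\le i\le c,\\
&\text{Step 2: } D_{ikA}^{2}=\bigl(z_k-v_i^{(l)}\bigr)^{\mathsf T}A\,\bigl(z_k-v_i^{(l)}\bigr),\\
&\text{Step 3: } \mu_{ik}^{(l)}=\frac{1}{\sum_{j=1}^{c}\left(D_{ikA}/D_{jkA}\right)^{2/(m-1)}} \qquad \text{if } D_{jkA}>0 \text{ for all } j,
\end{aligned}
$$

until $\lVert U^{(l)}-U^{(l-1)}\rVert<\varepsilon$. If some $D_{ikA}=0$, the membership of $z_k$ is shared among the clusters at zero distance and set to 0 for the others.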

The result of clustering is a set of $c$ classes to which the data vectors are assigned. Each vector can belong to more than one cluster, to a degree given by its membership value. In this paper, fuzzy c-means clustering is used to identify the clusters to which the database images belong. Because the membership value gives the degree to which an image belongs to a cluster, the uncertainty in classification is also taken into account.
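A compact FCM sketch is given below, assuming the Euclidean norm ($A = I$), $m = 2$ by default, a seeded random initial partition, and the largest absolute membership change as the stopping norm. It illustrates the standard algorithm only; the data values are hypothetical, and the paper's modification of Lin's algorithm is not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def fuzzy_c_means(
    data: Sequence[Sequence[float]],
    c: int,
    m: float = 2.0,
    eps: float = 1e-5,
    max_iter: int = 300,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Fuzzy c-means with the Euclidean norm (A = identity).

    Returns (centres, u) where u[i][k] is the membership of data[k] in cluster i.
    """
    n, dim = len(data), len(data[0])
    rng = random.Random(seed)
    # Random initial partition matrix whose columns sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    centres: List[List[float]] = []
    for _ in range(max_iter):
        # Step 1: cluster prototypes as membership-weighted means.
        centres = []
        for i in range(c):
            weights = [u[i][k] ** m for k in range(n)]
            total = sum(weights)
            centres.append([sum(weights[k] * data[k][d] for k in range(n)) / total for d in range(dim)])
        # Steps 2 and 3: squared distances, then the membership update.
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d2 = [sum((data[k][d] - centres[i][d]) ** 2 for d in range(dim)) for i in range(c)]
            at_centre = [i for i in range(c) if d2[i] == 0.0]
            if at_centre:  # point coincides with a centre: share membership among those
                for i in at_centre:
                    new_u[i][k] = 1.0 / len(at_centre)
            else:
                for i in range(c):
                    new_u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        # Stop when the largest membership change falls below eps.
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change < eps:
            break
    return centres, u


# Hypothetical 1-D feature values forming two loose groups.
values = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
centres, u = fuzzy_c_means(values, c=2)
print(sorted(round(v[0], 3) for v in centres))
```

Every point keeps a non-zero membership in both clusters; that graded membership is what the retrieval step later combines and ranks.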

C. Query Processing

In this step, a textual query is given to the system. The query terms are predefined: they are the fuzzy linguistic terms given in Table I. Multiple query terms are connected with the logical operators AND, OR and NOT. Examples of fuzzy queries are:

• Coarse and medium contrast and isotropic.

• Fine and high contrast or directional.

The connectives AND, OR and NOT are used to aggregate multiple features. The results of these operations are also fuzzy, with membership values as defined in Table II.
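Query evaluation and ranking can be sketched as follows, assuming the common Zadeh operators (AND as minimum, OR as maximum, NOT as $1 - \mu$), OR binding more loosely than AND, and commas read as AND. These operator definitions are an assumption about Table II, the term names are taken from the example queries, and the membership table is invented for illustration.

```python
from typing import Dict, List, Tuple

Memberships = Dict[str, float]


def evaluate_query(query: str, memberships: Memberships) -> float:
    """Fuzzy truth value of a query such as 'fine and high contrast or directional'.

    AND -> min, OR -> max, NOT -> 1 - mu. OR binds loosest, then AND, then NOT;
    commas are read as AND.
    """
    text = query.lower().strip().rstrip(".").replace(",", " and ")
    text = " ".join(text.split())  # normalize whitespace
    value = 0.0
    for disjunct in text.split(" or "):
        conj = 1.0
        for literal in disjunct.split(" and "):
            literal = literal.strip()
            if literal.startswith("not "):
                mu = 1.0 - memberships[literal[4:].strip()]
            else:
                mu = memberships[literal]
            conj = min(conj, mu)
        value = max(value, conj)
    return value


def rank_images(query: str, table: Dict[str, Memberships], top_k: int = 9) -> List[Tuple[str, float]]:
    """Rank images by aggregated membership, highest first."""
    scored = [(name, evaluate_query(query, mu)) for name, mu in table.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical membership values for two images.
table = {
    "img1": {"coarse": 0.9, "fine": 0.1, "high contrast": 0.8, "low contrast": 0.1,
             "isotropic": 0.7, "directional": 0.2},
    "img2": {"coarse": 0.2, "fine": 0.7, "high contrast": 0.3, "low contrast": 0.6,
             "isotropic": 0.1, "directional": 0.9},
}
print(rank_images("Coarse, High contrast and Isotropic.", table))
```

Because AND takes the minimum, an image scores well only if it satisfies every term; the weakest term determines its rank.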

III. RESULTS

The proposed system was applied to 111 images of the Brodatz database in GIF format and was implemented in MATLAB. Directionality and coarseness are measured over the whole image, without partitioning it into sub-images, whereas local contrast is measured by dividing each image into 128 × 128 pixel blocks. The queries used to verify the effectiveness of the system are:

1) Coarse, High contrast and Isotropic.

2) Fine, Low contrast and Directional

3) Coarse, Low contrast and Directional

The top nine retrieval results for the three queries are shown in Fig. 3, Fig. 4 and Fig. 5, respectively, with each image’s name and membership value. Similarity is measured by the membership value obtained after aggregation, and the images are arranged in decreasing order of this value. As the membership value decreases, the images are less likely to satisfy all three requirements.

In Fig. 3, the first six retrieved images fully satisfy the query: they are coarse, high contrast and isotropic. The seventh image is slightly directional and the last one has poor contrast; a user may decide not to choose these images because their membership in the queried groups is very low.

The observations made for query 1 also apply to queries 2 and 3. Images D44 and D1 are retrieved for both query 2 and query 3, but their membership values show that they are better described as fine, low contrast and directional (query 2) than as coarse, low contrast and directional (query 3); that is, each image belongs to more than one cluster to different degrees. In this way, the proposed system takes perception subjectivity into account.

Precision measures the retrieval accuracy of a CBIR system [17]. It is the ratio of the number of relevant images retrieved to the total number of images retrieved, where the relevant images are those that satisfy the query.

$$\text{Precision} = \frac{\text{Number of relevant images retrieved}}{\text{Total number of images retrieved}}$$
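The precision at a fixed number of retrieved images, as reported for 5, 9, 15, 20 and 25 images, can be computed as in this sketch. The ranking and relevance judgements are hypothetical; if fewer images than the cutoff are retrieved, the denominator is the number actually retrieved, following the definition above.

```python
from typing import Iterable, Sequence


def precision_at(ranked: Sequence[str], relevant: Iterable[str], cutoff: int) -> float:
    """Precision = relevant images retrieved / total images retrieved, for the top `cutoff`."""
    retrieved = list(ranked[:cutoff])
    if not retrieved:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for name in retrieved if name in relevant_set)
    return hits / len(retrieved)


# Hypothetical ranking and relevance judgements.
ranked = ["img3", "img7", "img1", "img9", "img4"]
relevant = {"img3", "img1", "img4"}
print([precision_at(ranked, relevant, k) for k in (1, 2, 5)])
```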

Table III gives the precision for the three queries when the total number of images retrieved is 5, 9, 15, 20 and 25; the number of relevant images is given in brackets.

IV. CONCLUSION

This paper proposed a CBIR system that accepts natural language queries. Tamura texture measures of the images are extracted, fuzzified using the fuzzy c-means algorithm, and stored in the database as the low-level fuzzy feature vectors of the images. Multiple query terms are aggregated using fuzzy min-max operators, and the resulting membership value is used as the similarity measure: the greater the membership, the better the image satisfies the query. The effectiveness of the system was verified using the Brodatz texture database, and the precision results show that images with larger membership values are more relevant to the query.


References

[1] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, vol. 40, no. 2, 2008.

[2] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: a power tool for interactive content-based image retrieval”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 644–655, 1998.

[3] N. S. Chang and K. S. Fu, “Query-by-pictorial-example”, IEEE Transactions on Software Engineering, vol. 6, 1980.

[4] S. F. Chang, J. R. Smith, M. Beigi, and A. Benitez, “Visual information retrieval from large distributed online repositories”, Communications of the ACM (Special Issue on Visual Information Retrieval), pp. 12–20, Dec. 1997.

[5] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by Image and Video Content: The QBIC System”, IEEE Computer, vol. 28, no. 9, pp. 23–32, Sept. 1995.

[6] A. Pentland, R. Picard, and S. Sclaroff, “Photobook: Content-based Manipulation of Image Databases”, International Journal of Computer Vision, vol. 18, no. 3, pp. 233–254, 1996.

[7] A. Gupta, “Visual Information Retrieval: A Virage Perspective”, Technical Report Revision 4, Virage Inc., San Diego, CA 92121, 1996.

[8] W. Y. Ma and B. S. Manjunath, “Netra: a toolbox for navigating large image databases”, Proceedings of the IEEE International Conference on Image Processing, 1997, pp. 568–571.

[9] J. R. Smith and S. F. Chang, “VisualSEEk: a fully automated content-based image query system”, Proceedings of the Fourth ACM International Conference on Multimedia, 1996, pp. 87–98.

[10] J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: semantics-sensitive integrated matching for picture libraries”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947–963, 2001.

[11] F. Long, H. J. Zhang, and D. D. Feng, “Fundamentals of content-based image retrieval”, in Multimedia Information Retrieval and Management, Springer, Berlin, 2003.

[12] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics”, Pattern Recognition, vol. 40, pp. 262–282, 2007.

[13] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, NY, 1966.

[14] H. Tamura, S. Mori, and T. Yamawaki, “Texture features corresponding to visual perception”, IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-8, no. 6, June 1978.

[15] H. C. Lin, C. Y. Chiu, and S. N. Yang, “A Fuzzy Logic CBIR System”, Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1171–1176, 2003.

[16] J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm”, Computers and Geosciences, vol. 10, pp. 191–203, 1984.

[17] H. Müller, W. Müller, D. M. Squire, S. Marchand-Maillet, and T. Pun, “Performance evaluation in content-based image retrieval: overview and proposals”, Pattern Recognition Letters, vol. 22, pp. 593–601, 2001.