ISSN ONLINE(2320-9801) PRINT (2320-9798)


Comparison of Text Extraction Techniques- A Review

Divya Gera1, Neelu Jain2
  1. ME Scholar, Dept of E & C, PEC University of Technology, Chandigarh, India
  2. Associate Professor, Dept of E & C, PEC University of Technology, Chandigarh, India

Visit for more related articles at International Journal of Innovative Research in Computer and Communication Engineering

Abstract

Text in images contains important content for information indexing and retrieval, automatic annotation, and structuring of images. Hence, text extraction is a crucial stage in image analysis. The steps involved in text extraction algorithms are detection, localization, binarization, extraction, enhancement, and recognition of text from the image. Text extraction is a challenging task due to variations in text size, font, style, orientation, and alignment, as well as complex backgrounds. Several text extraction techniques based on edge detection, connected component analysis, morphological operators, wavelet transform, texture features, neural networks, etc. have been developed. This paper reviews the various techniques suggested by researchers and compares them in terms of precision rate, recall rate, detection rate, etc.

 

KEYWORDS

Discrete wavelet transform, Connected Component, Edge, Support vector machine, Discrete cosine transform.

INTRODUCTION

Image content is classified into two categories: perceptual content and semantic content [1]. Perceptual content includes colors, shapes, textures, intensities, and their temporal changes, while semantic content includes objects, events, and their relations. Text carries a higher level of semantic information than visual information does. Therefore, text extraction from images is very significant in content analysis. It has many useful applications, such as automatic bank check processing [2], vehicle license plate recognition [3], document analysis and page segmentation [4], signboard detection and translation [5], content-based image indexing, assistance to visually impaired persons, and text translation systems for foreigners.
Text appearing in images is classified into three categories: document text, caption text, and scene text [6]. In contrast to caption text, scene text can have any orientation and may be distorted by perspective projection; therefore, scene text is more difficult to detect.
• Document text: A document image (Fig. 1) usually contains text and a few graphic components. It is acquired by scanning journals, printed documents, handwritten historical documents, book covers, etc.
• Caption text: It is also known as overlay text or artificial text (Fig. 2). It is artificially superimposed on the image at the time of editing, like subtitles, and it usually describes the subject of the image content.
• Scene text: It occurs naturally as a part of the scene image and contains important semantic information, such as advertisements, names of streets, institutes, and shops, road signs, traffic information, board signs, nameplates, food containers, billboards, banners, and text on vehicles (Fig. 3).
A. Properties of Text in Images:
Text can differ in appearance due to changes in font, size, style, orientation, alignment, texture, color, contrast, and background. These changes make automatic text extraction complicated and difficult. Text in images exhibits variations due to differences in the following properties [7]:
• Size: The size of text may vary a lot.
• Alignment: Scene text may be aligned in any direction and have geometric distortions, while caption text is usually aligned horizontally and may sometimes appear as non-planar text.
• Color: The characters tend to have the same or similar colors, but low contrast between text and background makes text extraction difficult.
• Edge: Most caption and scene texts are designed to be easily read, which results in strong edges at the boundaries of text and background.
• Compression: Many images are recorded, transferred, and processed in compressed format. Thus, a faster text extraction system can be achieved if text can be extracted without decompression.
• Distortion: Due to changes in camera angle, some text may carry perspective distortions that affect extraction performance.
B. Process of Text Extraction:
The input image may be grayscale or color, in compressed or uncompressed format. Text detection refers to determining whether text is present in the image, while text localization is the process of determining the location of the text and generating bounding boxes around it. After that, the text is extracted, i.e. segmented from the background. Enhancement of the extracted text is required because the text region usually has low resolution and is prone to noise. Thereafter, the extracted text can be recognized using OCR. The block diagram of text extraction is shown in Fig. 4.

TEXT EXTRACTION TECHNIQUES

The various techniques of text extraction are as follows:
A. Region based Method:
Region-based methods use the properties of the color or gray scale in the text region, or their differences from the corresponding properties of the background. They are based on the fact that there is very little variation of color within text and that this color is sufficiently distinct from the text's immediate background [20]. Text can be obtained by thresholding the image at an intensity level between the text color and that of its immediate background.
This method is not robust to complex backgrounds. It is further divided into two sub-approaches: connected component (CC) based and edge based.
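The thresholding idea described above can be sketched as follows. This is a minimal illustration only, not a method taken from any of the reviewed papers: the function name `threshold_midpoint`, the list-of-rows image representation, and the assumption that the text and background intensities are already known are all hypothetical.

```python
def threshold_midpoint(gray, text_level, background_level):
    """Binarize a grayscale image (list of rows of ints).

    The threshold is placed midway between an assumed text intensity and
    an assumed background intensity. Pixels on the text side become 1.
    """
    t = (text_level + background_level) / 2.0
    text_is_dark = text_level < background_level
    return [[1 if (px < t) == text_is_dark else 0 for px in row]
            for row in gray]


# Toy image: dark text pixels (30, 40) on a bright background (200).
image = [
    [200, 200, 200, 200],
    [200, 30, 40, 200],
    [200, 200, 200, 200],
]
mask = threshold_midpoint(image, text_level=35, background_level=200)
```

With a cluttered background, some background pixels would fall on the text side of the threshold, which is consistent with the robustness limitation noted above.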
i.) CC based Method:
CC-based methods use a bottom-up approach, grouping small components into successively larger components until all regions in the image are identified [9-12]. A geometrical analysis is required to merge the text components using their spatial arrangement, so that non-text components are filtered out and the boundaries of the text regions are marked. This method locates text quickly but fails for complex backgrounds.
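A rough sketch of the bottom-up grouping, assuming a binary mask is already available: components are found with a breadth-first search, and neighboring bounding boxes are then merged when they lie on the same line. The helper names, the 4-connectivity choice, and the `max_gap` rule are illustrative assumptions, not the procedure of the cited works.

```python
from collections import deque


def connected_components(mask):
    """Return inclusive bounding boxes (top, left, bottom, right) of
    4-connected foreground regions in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_into_lines(boxes, max_gap=2):
    """Greedily merge boxes that overlap vertically and whose left edge is
    at most max_gap columns past the right edge of an existing group."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for i, m in enumerate(merged):
            overlaps_vertically = box[0] <= m[2] and m[0] <= box[2]
            close_horizontally = box[1] - m[3] <= max_gap
            if overlaps_vertically and close_horizontally:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


# Two nearby "characters" and one distant blob on the same rows.
mask = [
    [1, 1, 0, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = connected_components(mask)
lines = merge_into_lines(boxes, max_gap=2)
```

A practical system would also filter components by size or aspect ratio before merging; that step is left out here for brevity.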
ii.) Edge based Method:
Edges are a reliable feature of text regardless of color/intensity, layout, orientation, etc. Edge-based methods focus on the high contrast between the text and the background [5,13-15]. The three distinguishing characteristics of text embedded in images that can be used for detecting text are edge strength, edge density, and orientation variance. The edge-based text extraction algorithm is a general-purpose method that can quickly and effectively localize and extract text from both document and indoor/outdoor images. This method is not robust when handling large text.
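Two of the three characteristics named above, edge strength and edge density, can be illustrated with a small sketch. The forward-difference gradient, the function names, and the threshold of 100 are assumptions chosen for the example; orientation variance is not computed here.

```python
def edge_strength(gray):
    """Approximate edge strength as |horizontal diff| + |vertical diff|
    using forward differences; the last row and column are left at 0."""
    rows, cols = len(gray), len(gray[0])
    strength = [[0] * cols for _ in range(rows)]
    for y in range(rows - 1):
        for x in range(cols - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            strength[y][x] = abs(gx) + abs(gy)
    return strength


def edge_density(strength, threshold):
    """Fraction of pixels whose edge strength exceeds the threshold."""
    total = sum(len(row) for row in strength)
    strong = sum(1 for row in strength for v in row if v > threshold)
    return strong / total


# A high-contrast striped patch (text-like) versus a flat patch.
striped = [[0, 255, 0, 255, 0] for _ in range(5)]
flat = [[128] * 5 for _ in range(5)]

striped_density = edge_density(edge_strength(striped), threshold=100)
flat_density = edge_density(edge_strength(flat), threshold=100)
```

The striped patch yields a much higher edge density than the flat one, which is the kind of contrast an edge-based detector would use to separate candidate text regions from smooth background.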
B. Texture based Method:
This method uses the fact that text in images has distinct textural properties that distinguish it from the background. Techniques based on Gabor filters, wavelets, the fast Fourier transform (FFT), spatial variance, etc. are used to detect the textural properties of the text region in the image [16-19]. This method is able to detect text against a complex background. The only drawback of this method is the large computational complexity of the texture classification stage.
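Of the texture measures listed above, spatial variance is the simplest to illustrate. The sketch below computes the intensity variance of non-overlapping blocks; the block layout and function names are assumptions for illustration, and a real texture-based detector would feed such features into a classifier.

```python
def local_variance(gray, y, x, size):
    """Variance of the intensities in a size x size block whose
    top-left corner is at row y, column x."""
    values = [gray[j][i]
              for j in range(y, y + size)
              for i in range(x, x + size)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_map(gray, size):
    """Variance of each non-overlapping size x size block."""
    rows, cols = len(gray), len(gray[0])
    return [[local_variance(gray, y, x, size)
             for x in range(0, cols - size + 1, size)]
            for y in range(0, rows - size + 1, size)]


# Left half: flat background. Right half: checkerboard (high texture).
image = [[100] * 4 + [255 * ((x + y) % 2) for x in range(4)]
         for y in range(4)]
vmap = variance_map(image, size=4)
```

Here the flat block has zero variance while the checkerboard block has a large one, so a variance threshold would flag only the textured block as a text candidate.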
C. Morphological based Method:
Mathematical morphology is a topological and geometrical method for image analysis [16,17,20]. Morphological feature extraction techniques have been applied efficiently to character recognition and document analysis. They are used to extract important text contrast features from the processed images. These features are invariant to various geometrical image changes such as translation, rotation, and scaling. Even when the lighting conditions or the text color change, the features can still be maintained. This method works robustly under different image alterations.
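One common morphological contrast feature is the morphological gradient, the difference between a dilation and an erosion. The sketch below uses a 3x3 square structuring element on a grayscale image; this choice, and the function names, are assumptions for illustration rather than the operators used in the cited works. The example also checks that a uniform brightness offset leaves the gradient unchanged, which is one narrow case of the lighting robustness mentioned above.

```python
def _window(gray, y, x):
    """Intensities in the 3x3 neighborhood of (y, x), clipped at borders."""
    rows, cols = len(gray), len(gray[0])
    return [gray[j][i]
            for j in range(max(0, y - 1), min(rows, y + 2))
            for i in range(max(0, x - 1), min(cols, x + 2))]


def dilate(gray):
    """Grayscale dilation: each pixel becomes its neighborhood maximum."""
    return [[max(_window(gray, y, x)) for x in range(len(gray[0]))]
            for y in range(len(gray))]


def erode(gray):
    """Grayscale erosion: each pixel becomes its neighborhood minimum."""
    return [[min(_window(gray, y, x)) for x in range(len(gray[0]))]
            for y in range(len(gray))]


def morphological_gradient(gray):
    """Dilation minus erosion; large where local contrast is high."""
    d, e = dilate(gray), erode(gray)
    return [[dv - ev for dv, ev in zip(drow, erow)]
            for drow, erow in zip(d, e)]


# One bright "stroke" pixel on a darker background.
image = [[50] * 5 for _ in range(5)]
image[2][2] = 200
gradient = morphological_gradient(image)

# The same scene under a uniform brightness increase.
brighter = [[v + 30 for v in row] for row in image]
gradient_brighter = morphological_gradient(brighter)
```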

PERFORMANCE ANALYSIS

A. Various parameters are used to analyze the performance of text extraction techniques. They are given as follows:
[Four equation images defining the performance parameters are not reproduced here.]
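For orientation, precision rate and recall rate are commonly defined from counts of correct detections, false alarms, and missed text regions. The sketch below uses these standard definitions; they are an assumption and may differ in detail from the formulas used in the reviewed papers, and the function and argument names are illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions (assumed, not taken from this paper):

    precision = correctly detected / all detected
    recall    = correctly detected / all actual text regions
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
precision, recall = precision_recall(90, 10, 30)
```

In this hypothetical case the detector is precise (few false alarms) but misses a quarter of the text regions, which is why both rates are usually reported together.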
B. Comparative Analysis of Related Work:
Much research has been done on various text extraction techniques, such as region based (CC based and edge based), texture based, morphological based, or a combination of these techniques (i.e. a hybrid approach). Researchers have used different types of images in their experiments. The detailed analysis of text extraction techniques is shown in Table 1.

CONCLUSION

In this paper, various techniques such as region based, edge based, connected component (CC) based, texture based, and morphological based methods have been discussed, and a detailed comparison of these techniques has been made on the basis of parameters such as precision rate, recall rate, and accuracy. Every approach has its own benefits and restrictions. Even though there are many algorithms, there is no single unified approach that fits all applications, due to variations in font, size, alignment, background complexity, etc. It is concluded that the texture based method can detect and localize text accurately even when images are noisy, low in resolution, or have complex backgrounds.

Tables at a glance

Table 1
 

Figures at a glance

Figure 1 Figure 2 Figure 3 Figure 4

References