An Advanced Approach for Text Query Searching and Word Spotting In Word Images
Word spotting refers to the problem of detecting specific keywords in document images. Here we focus on handwritten word images. Existing work on keyword spotting in handwritten document images is based on BLSTM neural networks and consists of two parts. The first part is a preprocessing phase performed by the neural network, which maps each position of an input sequence to a vector indicating the probability of each character being written at that position. The second part, the CTC Token Passing algorithm, takes this sequence of letter probabilities, together with a dictionary and a language model, as its input and computes a likely sequence of words. Extending this work, we propose information retrieval and information (text) extraction methods for handwritten document images. In the information retrieval approach, the input query is given as text. Each character of the query is matched against template characters, and a query image is then created from the matched templates. This approach provides an efficient way of searching document images with text queries. Text extraction from the images comprises thresholding, segmentation, edge detection, and a text extraction algorithm. Experimental results show that the proposed algorithms achieve higher accuracy rates than existing approaches.
Haritha V R, Sreeram S
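To illustrate how a sequence of per-position character probabilities can be turned into text and checked for a keyword, the following is a minimal sketch in Python. It uses greedy best-path decoding, which is a simpler technique than the CTC Token Passing algorithm described in the abstract and uses neither a dictionary nor a language model. The blank symbol, the function names `best_path_decode` and `spot_keyword`, and the toy alphabet are assumptions made for illustration, not part of the original work.

```python
from typing import Sequence

BLANK = "-"  # assumed symbol for the CTC "no character" label


def best_path_decode(probs: Sequence[Sequence[float]],
                     alphabet: Sequence[str]) -> str:
    """Greedy CTC decoding: pick the most probable label at each
    position, merge consecutive repeats, then drop blanks."""
    decoded = []
    previous = None
    for frame in probs:
        # Index of the most probable label at this position.
        best = max(range(len(alphabet)), key=lambda i: frame[i])
        symbol = alphabet[best]
        # Emit only when the label changes and is not the blank.
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


def spot_keyword(probs: Sequence[Sequence[float]],
                 alphabet: Sequence[str],
                 keyword: str) -> bool:
    """Hypothetical helper: report whether the keyword occurs in the
    greedily decoded transcription."""
    return keyword in best_path_decode(probs, alphabet)


if __name__ == "__main__":
    # Toy example: four labels, six positions along the word image.
    alphabet = [BLANK, "c", "a", "t"]
    probs = [
        [0.10, 0.70, 0.10, 0.10],
        [0.20, 0.60, 0.10, 0.10],
        [0.80, 0.10, 0.05, 0.05],
        [0.10, 0.10, 0.70, 0.10],
        [0.10, 0.10, 0.10, 0.70],
        [0.30, 0.10, 0.10, 0.50],
    ]
    print(best_path_decode(probs, alphabet))
    print(spot_keyword(probs, alphabet, "cat"))
```

Greedy decoding considers only the single most probable label at each position, so it can miss keywords that a dictionary-constrained search such as token passing would recover.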