
ISSN Online (2320-9801), Print (2320-9798)

Special Issue Article | Open Access

Automatic Language Identification from Written Texts – An Overview

Abstract

Language identification is the task of automatically determining the language or languages in which the content of a document (a web page or a text document) is written. With the widespread use of the Internet, language identification has become an important preprocessing step for many applications, such as machine translation, part-of-speech tagging, linguistic corpus creation, support for low-density languages, accessibility of social media and user-generated content, search engines and information extraction, as well as the processing of multilingual documents. In a multilingual country like India, language identification has a wider scope: it can help bridge the digital divide between users of different languages. This paper presents a brief overview of the challenges involved in automatic language identification, existing methodologies, and some of the tools available for language identification.
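
The abstract does not name a specific method. Purely as an illustration of the task, here is a minimal sketch of one widely used family of approaches: character n-gram profiles compared by cosine similarity. It is not presented as the method of this paper. The function names (`char_ngrams`, `cosine`, `train`, `identify`), the trigram size and the one-sentence training texts are hypothetical choices made for this example.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lower-cased text.

    The text is padded with one space on each side so that n-grams at
    word boundaries (e.g. " th") are counted too.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    # Counter returns 0 for missing keys, so grams absent from b add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples: dict[str, str], n: int = 3) -> dict[str, Counter]:
    """Build one n-gram profile per language from its training text."""
    return {lang: char_ngrams(text, n) for lang, text in samples.items()}


def identify(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the language whose profile is most similar to the input."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Hypothetical toy training data; real systems use far larger corpora per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the mat with the hat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze ist auf der matte",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chat est sur le tapis",
}
PROFILES = train(SAMPLES)

print(identify("le chat est sur le tapis", PROFILES))  # prints fr
```

Padding with spaces lets word-initial and word-final n-grams act as cues, and cosine similarity normalizes for the different lengths of the training texts. With only one sentence per language, this sketch will misclassify many short or ambiguous inputs.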

H L Shashirekha
