HINDI LANGUAGE INTERFACE TO DATABASES

Himani Jain; Parteek Bhatia

Himani Jain*¹, Parteek Bhatia²

¹,²Department of Computer Science and Engineering, Thapar University, Patiala, INDIA

Corresponding Author: Himani Jain, E-mail: himani88jain@gmail.com

Journal of Global Research in Computer Sciences

Abstract

The need for a Hindi language interface has become increasingly acute as native Hindi speakers use databases to store their data. A large number of e-governance applications, such as agriculture, weather forecasting, railways and legacy matters, use databases. To use such database applications with ease, people who are more comfortable with Hindi need these applications to accept a simple sentence in Hindi and process it to generate an SQL query, which is then executed on the database to produce the results. Any interface in Hindi will therefore be an asset to these users. This paper discusses an architecture for mapping a Hindi query entered by the user into an SQL query.

Keywords

Hindi language interface to databases, Natural language interface to databases, Tokenizer, Parser, Semantically tractable

INTRODUCTION

• Each relation token corresponds to either an attribute token or a value token.

This means that (a) the database relation matching the relation token and the database element matching the attribute or value token are compatible and (b) the relation token is attached to the corresponding attribute or value token.
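The relation-token condition above can be checked mechanically for one candidate mapping. The sketch below is illustrative only: the function name `relation_tokens_attached`, the English glosses used as tokens, the string encoding of database elements (e.g. `JOB.Company=HP`), and the representation of the attachment function and of compatibility as sets of pairs are all assumptions, not the system's actual data structures.

```python
def relation_tokens_attached(relation_tokens, other_tokens, match, attached, compatible):
    """Check that every relation token corresponds to an attribute or value token.

    relation_tokens: relation tokens in the tokenization
    other_tokens:    attribute and value tokens in the tokenization
    match:           token -> database element it is mapped to (hypothetical encoding)
    attached:        set of (relation_token, token) pairs from the attachment function
    compatible:      set of (db_relation, db_element) pairs judged compatible
    """
    for r in relation_tokens:
        # (a) the matched database elements are compatible and
        # (b) the relation token is attached to that attribute/value token.
        if not any(
            (r, t) in attached and (match[r], match[t]) in compatible
            for t in other_tokens
        ):
            return False
    return True


# Hypothetical example: the relation token "jobs" and the value token "HP".
match = {"jobs": "JOB", "HP": "JOB.Company=HP"}
compatible = {("JOB", "JOB.Company=HP")}
print(relation_tokens_attached(["jobs"], ["HP"], match, {("jobs", "HP")}, compatible))  # True
print(relation_tokens_attached(["jobs"], ["HP"], match, set(), compatible))             # False
```

The second call fails only because the attachment pair is missing, which is exactly part (b) of the condition.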

The next section surveys some existing systems. The paper then presents the system details and the attribute/value graph, and lastly discusses the architecture of the system.

LITERATURE SURVEY

Many early systems marked the beginning of the era of natural language interfaces to databases (NLIDBs). The best-known NLIDB of the sixties and early seventies was LUNAR [4], a natural language interface to a database containing chemical analyses of moon rocks. RENDEZVOUS engaged the user in dialogues to help him or her formulate queries. LADDER could be used with large databases, and it could be configured to interface with different underlying database management systems (DBMS). Chat-80 [5] is one of the best-known NLIDBs of the early eighties. It was implemented entirely in Prolog and transformed English questions into Prolog expressions, which were evaluated against the Prolog database.

ASK [6], [7], developed in 1983, allowed end users to teach the system new words and concepts at any point during the interaction. It was a complete information management system, providing its own built-in database and the ability to interact with multiple external databases, electronic mail programs, and other computer applications. All the applications connected to ASK were accessible to the end user through natural language requests. Users stated their requests in English, and ASK transparently generated suitable requests to the appropriate underlying systems.

SYSTEM DETAILS

SQL Query

SELECT DISTINCT Description FROM JOB WHERE Company='HP' AND Platform='UNIX';
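Once each token of a Hindi question has been mapped to a database element, a query of this shape can be assembled mechanically. The following is a minimal sketch under simplifying assumptions: the lexicon entries, the function names `build_sql` and `quote`, and the rule that an attribute token becomes the selected column while each value token becomes an equality condition are all hypothetical, chosen only so that the output reproduces the query above (with plain ASCII quotes).

```python
# Hypothetical lexicon: Hindi token -> (token kind, database element).
LEXICON = {
    "विवरण": ("attribute", "Description"),      # "description"
    "एचपी": ("value", ("Company", "HP")),        # "HP"
    "यूनिक्स": ("value", ("Platform", "UNIX")),  # "Unix"
}


def quote(value: str) -> str:
    """Render an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_sql(relation: str, tokens: list, lexicon: dict = LEXICON) -> str:
    """Assemble a SELECT query from already-mapped tokens (simplified rule)."""
    select_attrs, conditions = [], []
    for tok in tokens:
        kind, element = lexicon[tok]
        if kind == "attribute":
            select_attrs.append(element)          # attribute token -> selected column
        else:
            attr, value = element                 # value token -> equality condition
            conditions.append(f"{attr}={quote(value)}")
    sql = f"SELECT DISTINCT {', '.join(select_attrs)} FROM {relation}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + ";"


print(build_sql("JOB", ["विवरण", "एचपी", "यूनिक्स"]))
# SELECT DISTINCT Description FROM JOB WHERE Company='HP' AND Platform='UNIX';
```

A deployed system would more likely pass the values as bound parameters through its database driver than splice quoted literals into the SQL text; the literal form is used here only to match the query shown.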

A mapping from a complete sentence tokenization to a set of database elements such that conditions 1 through 3 are satisfied is a valid mapping. If the sentence tokenization contains only distinct tokens and at least one of its value tokens matches a wh-value, we refer to the corresponding sentence as semantically tractable.

Fig. 1 shows the tokenization with the attributes of the relation. The problem of finding a mapping from a complete tokenization of the question to a set of database elements that satisfies the semantic constraints is reduced to a graph-matching problem. We use the max-flow algorithm to solve this problem efficiently.

Finally, both E and I link to the sink node T. Note that the column containing DB attribute nodes appears twice: the unit edge from each DB attribute node to its copy ensures that only one unit of flow traverses each such node. These edges are needed because more than one DB value may be compatible with a given DB attribute, and a DB attribute may match more than one attribute token, whereas the definition of a valid mapping requires each DB attribute to be used only once. The graph is interpreted as a flow network in which the capacity of each edge is 1 unless otherwise indicated. The capacity of the edge from E to T is the number of attribute tokens, and the capacity of the edge from I to T is the number of value tokens minus the number of attribute tokens; that difference is 2 in our example. The maximum flow through the network in this example is 3. In fact, the maximum flow in any graph constructed by the system's matcher equals the number of value tokens, because each value token has to participate in the match produced by the algorithm.
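The flow network described above can be reproduced on a toy example. This sketch assumes that E collects flow from explicitly mentioned attribute tokens and I collects flow from DB attributes that are only implied by a value, since the surviving text does not define E and I. The source node name `S`, the helper `build_attribute_value_graph`, and the example tokens and DB values (a wh-value for `Description`, the values `HP` and `UNIX`, one attribute token for `Platform`) are hypothetical. Max-flow is computed here with the Edmonds-Karp algorithm (breadth-first augmenting paths); the text does not say which max-flow algorithm the system uses.

```python
from collections import defaultdict, deque


def max_flow(cap, source, sink):
    """Edmonds-Karp: augment along shortest (BFS) paths in the residual graph."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in cap.items():
        for v, c in edges.items():
            residual[u][v] += c
            residual[v][u] += 0  # ensure the reverse (residual) edge exists
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push


def build_attribute_value_graph(value_tokens, attribute_tokens, value_matches,
                                db_value_attr, attr_token_matches):
    """Hypothetical construction of the attribute/value flow network."""
    cap = defaultdict(dict)
    for vt in value_tokens:
        cap["S"][("value_token", vt)] = 1
        for dbv in value_matches[vt]:
            cap[("value_token", vt)][("db_value", dbv)] = 1
    for dbv, attr in db_value_attr.items():
        cap[("db_value", dbv)][("db_attr_in", attr)] = 1
    for attr in set(db_value_attr.values()):
        # Unit edge between the two copies of a DB attribute: used at most once.
        cap[("db_attr_in", attr)][("db_attr_out", attr)] = 1
        cap[("db_attr_out", attr)]["I"] = 1  # attribute left implicit
        for at in attr_token_matches.get(attr, []):
            cap[("db_attr_out", attr)][("attr_token", at)] = 1  # explicit mention
    for at in attribute_tokens:
        cap[("attr_token", at)]["E"] = 1
    cap["E"]["T"] = len(attribute_tokens)
    cap["I"]["T"] = len(value_tokens) - len(attribute_tokens)
    return cap


value_tokens = ["what", "HP", "UNIX"]
attribute_tokens = ["platform"]
value_matches = {"what": ["Description (wh-value)"], "HP": ["Company=HP"],
                 "UNIX": ["Platform=UNIX"]}
db_value_attr = {"Description (wh-value)": "Description", "Company=HP": "Company",
                 "Platform=UNIX": "Platform"}
attr_token_matches = {"Platform": ["platform"]}

cap = build_attribute_value_graph(value_tokens, attribute_tokens, value_matches,
                                  db_value_attr, attr_token_matches)
flow = max_flow(cap, "S", "T")
print(flow, flow == len(value_tokens))  # 3 True
```

With one attribute token and three value tokens, the E-to-T capacity is 1 and the I-to-T capacity is 2, so the numbers line up with the example in the text: a maximum flow of 3 means every value token takes part in the match.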

RESULTS

In this example, there is a one-to-one match between the sentence tokens and the database elements that satisfies the semantic constraints in the set of conditions for semantically tractable sentences. Formally, a question q is semantically tractable relative to a given lexicon L and an attachment function AF if and only if q has at least one complete tokenization T such that:

1) All tokens in T are distinct.

2) T contains at least one wh-token.

3) There exists a valid mapping (respecting AF and L) from T to some set of database elements E.
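The three conditions translate directly into a test over candidate tokenizations. In this sketch the tokenizer and matcher are assumed to exist elsewhere: `tokenizations` is a list of complete tokenizations of q, `wh_tokens` is a hypothetical set of wh-words (English glosses here), and `has_valid_mapping` stands in for a matcher such as the max-flow check described in the previous section, replaced below by a stub.

```python
from typing import Callable, Iterable, Sequence, Set


def is_semantically_tractable(
    tokenizations: Iterable[Sequence[str]],
    wh_tokens: Set[str],
    has_valid_mapping: Callable[[Sequence[str]], bool],
) -> bool:
    """True if at least one complete tokenization satisfies conditions 1-3."""
    for tokens in tokenizations:
        if len(set(tokens)) != len(tokens):
            continue  # condition 1: all tokens must be distinct
        if not any(t in wh_tokens for t in tokens):
            continue  # condition 2: at least one wh-token
        if has_valid_mapping(tokens):
            return True  # condition 3: a valid mapping exists
    return False


def accept_all(tokens):
    """Stub matcher for illustration only: treats every mapping as valid."""
    return True


print(is_semantically_tractable([["what", "HP", "UNIX"]], {"what"}, accept_all))  # True
print(is_semantically_tractable([["HP", "UNIX"]], {"what"}, accept_all))          # False: no wh-token
print(is_semantically_tractable([["what", "HP", "HP"]], {"what"}, accept_all))    # False: repeated token
```

Because the definition only asks for at least one qualifying tokenization, the function returns as soon as one passes all three checks.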

Parsing the Hindi sentence allows the system to understand it completely, which helps in generating the final query.

CONCLUSION

The system accepts a query in Hindi and translates it into an SQL query by mapping each Hindi word to its corresponding database term with the help of a mapping stored in the database. The SQL query is then executed on the database to produce the output for the user.

References

[1] Akshar Bharati, Rajeev Sangal, and Dipti Misra Sharma, “Shakti Standard Format Guide”, Language Technologies Research Centre, International Institute of Information Technology, Hyderabad, India.
[2] ILMT Consortium, “ILMT System”, IIIT Hyderabad, Gachibowli, Hyderabad, Feb 2007.
[3] Ana-Maria Popescu, Oren Etzioni, and Henry Kautz, “Towards a Theory of Natural Language Interfaces to Databases”, Department of Computer Science, University of Washington, Seattle, WA 98195, USA.
[4] W.A. Woods, R.M. Kaplan, and B.N. Webber, “The Lunar Sciences Natural Language Information System: Final Report”, BBN Report 2378, Bolt Beranek and Newman Inc., Cambridge, Massachusetts, 1972.
[5] D. Warren and F. Pereira, “An Efficient Easily Adaptable System for Interpreting Natural Language Queries”, Computational Linguistics, no. 3-4, July-December 1982, pp. 110-122.
[6] B.H. Thompson and F.B. Thompson, “Introducing ASK, A Simple Knowledgeable System”, In Proceedings of the 1st Conference on Applied Natural Language Processing, Santa Monica, California, 1983, pp. 17-24.
[7] B.H. Thompson and F.B. Thompson, “ASK is Transportable in Half a Dozen Ways”, ACM Transactions on Office Information Systems, April 1985, pp. 185-203.