Ontology Based Question Answering System

Ankita Singh¹ and Nidhi Tyagi²

¹ Assistant Professor, Dept. of CSE, BIT, Meerut, India
² Associate Professor, Dept. of CSE, Shobhit University, Meerut, India

Visit for more related articles at International Journal of Innovative Research in Computer and Communication Engineering

Abstract

The demand for data and information grows with the volume of data held in repositories such as the World Wide Web. The question is how to find, within this enormous body of data, the specific information a user requires. Information retrieval techniques solve the problem to an extent, but they do not help when only the specific information pertaining to a question is required: an information retrieval engine returns documents containing phrases and paragraphs that may hold an answer to the user's query, not the answer itself. This paper addresses the problem by proposing a question answering system that satisfies the user's specific information need.

Keywords

QAS, PQC, Ontology, NE.

INTRODUCTION

Question answering systems are designed to satisfy a user's specific information need. In these systems a question is asked in natural language and analysed to identify its keywords, named entities and question type, which are then used to formulate a database query. An ontology is a conceptualization of knowledge [1]. Ontologies are written in OWL and have a hierarchical structure. This paper presents a way to store ontologies in a database so that they can be queried irrespective of their hierarchical nature. It also proposes an architecture for a question answering system that processes a natural language question and retrieves the answer from the knowledge base.

RELATED WORK

The question answering systems proposed so far are based entirely on document analysis. BASEBALL [6], one of the earliest question answering systems, is a program for answering questions about baseball games played in the American League over one season. It answered narrow-domain questions about the season's statistics by applying shallow parsing to the natural language query to identify the teams and statistics in question. LUNAR [7] was likewise a narrow-domain question answering system. The QA track was first introduced at the eighth TREC conference [8]; it required systems to answer factoid questions by returning a text snippet containing the answer. In [9], the authors proposed the first web-based question answering system, which differed from earlier systems in using the web as its corpus for extracting answers rather than a fixed-size corpus. Other web-based systems include START, AnswerBus and AskMSR. In [10], the authors proposed a Chinese question answering system; because it is language specific, morphological analysis and parsing are more difficult owing to the annotations and symbols of the language.

PROPOSED ARCHITECTURE

The proposed architecture of OBQAS comprises three functional components: the Question Processing Module, the Query Formulator and the Answer Selector.

A. Question Processing Module

The Question Processing Module consists of two components: the PQC and the lexical analyser. The lexical analyser in turn consists of two components, the question type identifier and the keyword identifier. A natural language question (NLQ) presented to this module is fed directly to the PQC.

The PQC is the Previous Question Cache, which records previously asked questions. Whenever a natural language question is presented to the system, it goes first to this cache. If the newly arrived question matches a previously asked question, the answer is retrieved directly from the cache; otherwise the question is stored at the top of the list. The cache is maintained as a linear-list stack whose stack pointer points to the latest question asked.
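The behaviour of the PQC can be illustrated with a minimal Python sketch. The paper does not state how two questions are matched, so the sketch assumes an exact match after normalizing case and whitespace; the class and method names are hypothetical and are not part of OBQAS.

```python
class PreviousQuestionCache:
    """Stack-like list of (question, answer) pairs; the last item is the latest."""

    def __init__(self):
        self._entries = []  # the top of the stack is the end of the list

    @staticmethod
    def _normalize(question):
        # Assumption: two questions match if equal ignoring case and extra spaces.
        return " ".join(question.lower().split())

    def lookup(self, question):
        """Return the cached answer, or None when the question is not cached."""
        key = self._normalize(question)
        # Scan from the top so that recently asked questions are found first.
        for cached_question, answer in reversed(self._entries):
            if cached_question == key:
                return answer
        return None

    def store(self, question, answer):
        """Push a newly answered question onto the top of the stack."""
        self._entries.append((self._normalize(question), answer))

    def latest(self):
        """Return the most recent (question, answer) pair, or None if empty."""
        return self._entries[-1] if self._entries else None
```

A miss returns None, in which case the question would go on to the lexical analyser and be stored once its answer is known.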

If the cache holds no match, the question is presented to the lexical analyser, which parses it to identify the question type, the keywords and the named entities. These are then presented to the query formulator.
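As a rough illustration of this step, the following Python sketch splits a question into a question type and a list of keywords. It is only a sketch under stated assumptions: the question-word list and the stop-word list are hypothetical, and named entity identification is not attempted.

```python
import re

# Hypothetical word lists; a real analyser would use fuller resources.
QUESTION_WORDS = ("what", "who", "when", "where", "which", "why", "how")
STOP_WORDS = {"is", "the", "of", "with", "a", "an", "in", "for", "to"}


def analyse(question):
    """Split a question into its question type and the remaining keywords."""
    # Keep letters, digits and hyphens so that "408-222" stays one token.
    tokens = re.findall(r"[A-Za-z0-9-]+", question.lower())
    question_type = next((t for t in tokens if t in QUESTION_WORDS), None)
    keywords = [t for t in tokens if t not in STOP_WORDS and t != question_type]
    return question_type, keywords
```

Calling `analyse` on a question returns a pair whose first element is the question word found (or None) and whose second element is the list of content words in their original order.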

B. Query Formulator

The query formulator consists of the query formulation engine and the query cluster. The named entities, question type and keywords extracted from the NLQ are passed to the query formulation engine. The query cluster holds the query syntax for each question type, and the engine formulates the query using the syntax that matches the type of the question. Table 2 [2] lists the question types for which query syntax is present in the query cluster.
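The idea of a query cluster keyed by question type can be sketched in Python as a table of query templates. The two templates below, the placeholder names and the function name are hypothetical; they stand in for the query syntax of the query cluster, which is not reproduced here.

```python
# Hypothetical query cluster: one query template per question type.
QUERY_CLUSTER = {
    "what": "SELECT {target} FROM {table} WHERE {field} = ?",
    "how many": "SELECT COUNT({target}) FROM {table} WHERE {field} = ?",
}


def formulate(question_type, target, table, field, value):
    """Return (sql, params) built from the template for the question type."""
    template = QUERY_CLUSTER.get(question_type)
    if template is None:
        raise ValueError(f"no query syntax for question type {question_type!r}")
    # Identifiers come from trusted metadata; the value stays a bound parameter.
    sql = template.format(target=target, table=table, field=field)
    return sql, (value,)
```

Keeping the searched value as a bound parameter rather than pasting it into the query text is a design choice of this sketch, made so that text taken from the user's question cannot alter the query.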

C. Answer Selector

The query formulated by the query formulator is presented to the relational database, which returns the answer. The answer is returned to the user and is also stored in the PQC, together with the question, at the top of the list.

D. Database Creation

The system is ontology based, and ontologies are written in OWL. The documents pertaining to a specific domain are in XML, and the objects in each document are related hierarchically. The XML documents are stored in a relational table as XML, so that this hierarchy is preserved.

SIMULATION

Step 1: The system is first presented with an XML document [4], from which an ontology is derived:

...
</employee>
<employee>
<name>Peter Pan</name>
...
</employee>
</dept>

Step 2: Hierarchical representation of the above ontology.
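The hierarchy implied by such a document can be made visible with a short Python sketch that prints one indented line per element. The sample document is a cut-down, assumed version containing only the dept, employee and name elements, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, showing the nesting."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (f": {text}" if text else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Assumed minimal document using only element names that appear in Step 1.
doc = "<dept><employee><name>Peter Pan</name></employee></dept>"
tree_lines = outline(ET.fromstring(doc))
```

Each level of indentation in `tree_lines` corresponds to one level of the ontology: the department at the root, its employees below it, and the properties of each employee at the leaves.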

Step 3: Database Creation for Department Ontology

• create table dept (deptID char(8), deptdoc xml);

This command creates a relational table with two columns, deptID and deptdoc. The deptdoc column stores the XML document with its hierarchical structure intact.
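The table definition above can be tried out without DB2 using Python's built-in sqlite3 module. This is only an approximation: SQLite has no XML column type, so a TEXT column stands in for it, and the department identifier and the phone element of the sample document are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no XML column type, so TEXT stands in for DB2's xml type here.
conn.execute("CREATE TABLE dept (deptID CHAR(8), deptdoc TEXT)")

# Hypothetical document; the phone element and its value are assumed.
doc = (
    "<dept><employee><name>Peter Pan</name>"
    "<phone>408-222</phone></employee></dept>"
)
conn.execute("INSERT INTO dept VALUES (?, ?)", ("D001", doc))

# The whole document comes back as one value, with its nesting intact.
row = conn.execute("SELECT deptID, deptdoc FROM dept").fetchone()
```

Because the document is stored whole in one column, the relational table needs no extra columns or tables to represent the parent-child relations between elements.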

Step 4: User query in natural language:

NLQ: What is the name of the employee with phone number 408-222?

This natural language query is parsed to identify:

i) Question Type: what

ii) Keyword: phone number 408-222

iii) Named entity: name, employee

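Query formulation is an internal process. Assuming that the phone number is held in a phone element under each employee, the query formed by the formulation engine is:

SELECT XMLQUERY('$DEPTDOC/dept/employee[phone = "408-222"]/name/text()')
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[phone = "408-222"]')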

The retrieved answer is Peter Pan.
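The lookup performed in this step can be imitated outside the database with Python's standard xml.etree.ElementTree module. The sketch below is not the system's implementation: the sample document, the phone element, the second employee and the function name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical department document; the phone element and the first
# employee are assumptions made for illustration.
DEPTDOC = """
<dept>
  <employee><name>Example Employee</name><phone>408-111</phone></employee>
  <employee><name>Peter Pan</name><phone>408-222</phone></employee>
</dept>
"""


def employee_names_by_phone(xml_text, phone):
    """Return the names of employees whose phone element equals `phone`."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports the [tag='text'] predicate.
    matches = root.findall(f"./employee[phone='{phone}']")
    return [employee.findtext("name") for employee in matches]
```

The predicate selects whole employee elements first and the name is read from each match afterwards, which mirrors the way the question asks for one property of an entity that is identified by another.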

CONCLUSION

The QAS proposed here not only serves the purpose of question answering; the simplicity of its architecture also makes it efficient in terms of answer retrieval. The system could be improved if the ontology were updated automatically, just as web repositories are updated through page refreshing techniques [5].

References

[1] B. Chandrasekaran and John R. Josephson, “What Are Ontologies, and Why Do We Need Them?”, IEEE Intelligent Systems, Jan/Feb issue, pp. 20-26, 1999.

[2] Mohammad Reza Kangavari, Samira Ghandchi and Manak Golpour, “A New Model for Question Answering Systems”, World Academy of Science, Engineering and Technology, 18, 2008.

[3] IBM DB2 9.7 pureXML, Information Management Cloud Computing Center of Competence, IBM Canada Lab.

[4] http://www.ibm.com/developerworks/data/library/techarticle/dm-1006queriespurexml/dm-1006queriespurexml-pdf.pdf

[5] Rosy Madaan et al., (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 03, pp. 753-758, 2010.

[6] B. Green, A. Wolf, C. Chomsky and K. Laughery, “BASEBALL: an automatic question answerer”, in Readings in Natural Language Processing, Morgan Kaufmann Publishers Inc., pp. 545-549, 1986.

[7] W. A. Woods, “Progress in Natural Language Understanding - an application to lunar geology”, American Federation of Information Processing Societies, pp. 441-450, 1973.

[8] E. Voorhees, “The TREC-8 Question Answering Track Report”, in NIST Special Publication 500-246: The Eighth Text Retrieval Conference (TREC-8), pp. 77-82, 1999.

[9] J. Lin and B. Katz, “Question answering from the web using knowledge annotation and knowledge mining techniques”, in CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, New York, NY, USA, pp. 116-123, 2003.

[10] Gai-Tai Huang and Hsiu-Hsen Yao, “Chinese question answering system”, Journal of Computer Science and Technology, Vol. 19, No. 4, pp. 479-488, 2004.