Ontology Based Mining Techniques for
Systematic Allocation of Project Proposals to
External Reviewers

T.SahayaArthi Jeno¹, Mr.B.Lakshmipathi²

¹PG Scholar, Dept. of Computer Science and Engineering, Anna University, Regional Centre, Coimbatore, Tamil Nadu, India.
²Assistant Professor, Dept. of Computer Science and Engineering, Anna University, Regional Centre, Coimbatore, Tamil Nadu, India.

Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology

Abstract

Selecting research project proposals is an important and challenging task for an organization when large numbers of proposals are received. A common first step is to group proposals according to the similarity of their discipline and research areas. Current methods group proposals by manually matching similar research discipline areas. This paper presents an ontology-based classification and clustering approach that classifies research project proposals as well as external research reviewers and then groups them by research discipline area. The grouped proposals are assigned to groups of experts for peer review in a systematic way. The approach is intended to improve the efficiency and effectiveness of research project selection in government and private research funding agencies.

Keywords

Classification, Clustering, Concept-based Analysis, Ontology-based Text Mining, Peer Review, Research Project Selection

INTRODUCTION

Selecting research project proposals is an important and challenging task for government and private funding agencies when large numbers of proposals are received. Allocating proposals is a multi-step process that starts when a funding agency issues a call for proposals. The call is distributed to relevant communities such as universities or research institutions. The proposals submitted by institutes and organizations are grouped by similarity and assigned to experts for peer review. The review results are examined, and the proposals are then ranked based on the aggregation of the experts’ review results.

Fig. 1 shows the research project selection process. Calling for proposals, proposal submission, proposal grouping, assignment to experts, peer review, aggregation of review results, evaluation and ranking of proposals, and the funding decision are activities common to all funding agencies [1]. In research funding agencies, the number of proposals received has more than doubled in the past few years. To ensure accurate and reliable opinions, four to five reviewers are assigned to each proposal. When a very large number of proposals is received, the agency needs to group the proposals for peer review [2].

Government and private research funding agencies are made up of several scientific departments, bureaus, a general office, and associated units.

Decision-making units such as scientific departments are responsible for funding recommendations and for the management of funded projects. Scientific departments are classified by scientific research area, such as mathematical and physical sciences, material sciences, engineering and earth sciences, information sciences, chemical sciences, life sciences, and management sciences. Departments are further divided into divisions that focus on more specific research areas. For example, the Department of Management Science is divided into the following divisions: Management Science and Engineering, Policy and Macro Management, and Business Administration. The discipline areas of the divisions are called programs.

The unit responsible for the selection process (i.e., division managers or program directors) assigns the grouped proposals to external reviewers for evaluation and ranks the proposals based on the aggregated reviews. In manual grouping, this unit is also responsible for the grouping, but its staff may not have adequate knowledge of all the issues and areas covered by the proposals, and the contents of many proposals may not be fully understood.

Several text-mining methods are used to classify and cluster documents, but they were developed with a focus on English text and are not effective for Chinese text. Chinese text consists of strings of Chinese characters, whereas English text uses words; Chinese text also has no delimiters to mark word boundaries, whereas English text uses a space as a word delimiter. Several methods have been proposed to deal with Chinese text, but they are not sufficiently robust to process research proposals. There is therefore an urgent need for an effective and feasible computer-supported approach that groups submitted research proposals by discipline area by analyzing their full text.

An ontology-based text-mining technique combining classification and clustering is proposed to solve this problem. It has the following advantages. The proposed strategy is fundamentally different from existing text-mining methods (TMMs) and is expected to outperform them. Because of this change, it overcomes drawbacks of existing practice, such as manual grouping and assignment of research proposals. It provides a way to easily classify and group research proposals and reviewers, and it can be used in funding agencies that face information overload.

This work presents a framework for ontology-based classification and clustering that groups research proposals and systematically assigns the grouped proposals to reviewers. The approach is user friendly and time saving.

The remainder of this paper is organized as follows. Section II reviews the research background and objectives. The existing method is described in Section III and the proposed method in Section IV. Finally, Section V presents the conclusion and future work.

RESEARCH BACKGROUND AND OBJECTIVES

Research and development (R&D) project selection is a complicated, knowledge-intensive decision-making process in which decision models and knowledge rules play an important role. The following research work led to the proposed technique.

A R&D Project Selection Approaches

Q. Tian et al. [3] proposed a hybrid knowledge and model system for R&D project selection that integrates mathematical models with knowledge rules. The system is designed to support the whole decision process of R&D project selection and has been used to select R&D projects at the NSFC. K. Chen et al. [4] proposed a fuzzy-logic-based model as a decision tool for project selection, which helps decision makers deal with uncertain or incomplete information without losing existing quantitative information. A. D. Henriksen et al. [5] presented an improved scoring tool for R&D project evaluation and selection that ranks project alternatives on the criteria of relevance, reasonableness, risk, and return. The scoring algorithm explicitly incorporates tradeoffs among the evaluation criteria and calculates a relative measure of project value, taking into account that value is a function of both merit and cost. W. D. Cook et al. [6] noted that peer review of research proposals and articles is an essential element of R&D processes worldwide. Their integer-programming set-covering model and heuristic procedure solve the assignment problem by maximizing the number of proposal pairs evaluated by one or more reviewers, which facilitates meaningful aggregation of partial rankings of subsets of proposals by multiple reviewers into a consensus ranking. S. Hettich et al. [7] described a prototype application deployed at the U.S. National Science Foundation (NSF) to assist program directors in identifying reviewers for proposals. The application helps program directors sort proposals into panels and find reviewers for proposals. It extracts information from the full text of proposals to learn about both the topics of proposals and the expertise of reviewers. The authors describe the solution that was implemented and the experience of using it within the NSF workflow. Y. H. Sun et al. [10] developed a decision support system to evaluate reviewers for research project selection. Girotra et al. [11] offered an empirical study that values projects in a portfolio.

B Ontology-based Framework

Jian Ma et al. [1] proposed an ontology-based text-mining method (OTMM) for grouping research proposals. A research proposal ontology is constructed to categorize the concept terms in different discipline areas and to form relationships among them. It enables text-mining and optimization techniques to cluster research proposals by similarity and then balance the clusters according to the applicants’ characteristics. The OTMM improved the similarity within proposal groups, took the applicants’ characteristics into account (e.g., distributing proposals evenly according to the applicants’ affiliations), and made the proposal grouping process more efficient. Preet Kaur et al. [2] developed an ontology-based classification and clustering approach for grouping research proposals and research reviewers. It combines data-mining techniques with the help of an ontology and provides a way to easily classify and group the research proposals and the reviewers. S. Bechhofer et al. [13] developed the OWL Web Ontology Language, which is used for storing the keywords. Yildiz et al. [14] designed ontoX, a method for ontology-driven information extraction. The paper by V. M. Navaneethakumar and C. Chandrasekar (2012), “A Consistent Web Documents Based Text Clustering Using Concept Based Mining Model” [15], is the basis for clustering the proposals by concept. N. Arunachalam et al. [16] appended a knowledge-based agent to their system for efficient retrieval of data.

C Text Mining Approaches

Huan-Chao Keh et al. [12] used a filtering measure for feature selection in a Chinese text categorization system. Term Frequency-Inverse Document Frequency (TF-IDF) is used to strengthen the weights of important keywords and weaken the weights of unimportant keywords. Category priority represents knowledge about the categories, i.e., the relationship between two different categories.

The objectives of this work are: systematic assignment of groups of research project proposals to groups of reviewers, reducing the time required by analyzing the full text of the proposals; relief of the information overload faced by government and private funding agencies; and efficient and effective classification of research project proposals as well as reviewers.
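To make the TF-IDF weighting mentioned in this section concrete, the following is a minimal sketch in Python. It uses one common variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency); the cited system may use a different variant. The function name `tf_idf` and the toy documents are illustrative assumptions, and the documents are assumed to be non-empty lists of already segmented tokens.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus of three tokenized proposals.
docs = [
    ["ontology", "text", "mining"],
    ["text", "clustering"],
    ["text", "ontology", "ontology"],
]
weights = tf_idf(docs)
```

In this toy corpus the term "text" occurs in every document, so its weight is zero, while the rarer term "mining" receives a higher weight than "ontology".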

EXISTING SYSTEM

The existing system is an ontology-based text-mining method (OTMM) that clusters research proposals based on their similarity in research areas. An ontology is a knowledge repository in which concepts and terms are defined along with the relationships between these concepts. It consists of a set of concepts, axioms, and relationships that describe a domain of interest, and it represents an agreed-upon conceptualization of that domain in a real-world setting. With an ontology, knowledge that is implicit for humans is made explicit for computers. An ontology can thus automate information processing and facilitate text mining in a specific domain (such as research project selection). An ontology-based text-mining framework has been built for clustering research proposals according to their discipline areas. Text mining generally refers to the process of extracting interesting information and knowledge from unstructured text. The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts.

D Constructing a Research Ontology

A research project proposal ontology containing the projects funded in the latest five years is assembled according to keywords, and it is updated annually. As a domain ontology, the research ontology is a public concept set of the research project management domain. It is used to express the research topics of different disciplines.

E Classifying New Research Proposals

New research proposals are classified into a number of classes according to the keywords stored in the ontology.
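A minimal sketch of keyword-based classification follows. It assumes the ontology can be reduced to a mapping from discipline to a set of keywords and that a proposal is assigned to the discipline with the largest keyword overlap; the source does not specify the exact rule, so the `classify` function, the scoring by overlap count, and the sample ontology are all illustrative assumptions.

```python
def classify(proposal_keywords, ontology):
    """Return the discipline sharing the most keywords with the proposal, or None."""
    proposal = set(proposal_keywords)
    scores = {d: len(proposal & keywords) for d, keywords in ontology.items()}
    best = max(scores, key=scores.get)
    # A proposal with no matching keyword is left unclassified.
    return best if scores[best] > 0 else None

# Hypothetical two-discipline ontology.
ontology = {
    "Information Sciences": {"data mining", "neural network", "clustering"},
    "Management Sciences": {"supply chain", "project selection", "decision support"},
}
label = classify(["clustering", "data mining", "survey"], ontology)
```

Ties are resolved in favor of the discipline listed first in the mapping; a real system would need an explicit policy for cross-discipline proposals.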

F Clustering Research Proposals Based on Similarities Using Text Mining

Proposals in each discipline are clustered using a text-mining technique. The clustering process consists of five steps: text document collection, preprocessing, encoding, vector dimension reduction, and text document vector clustering. The newly submitted proposals in each discipline are clustered using a self-organized mapping (SOM) algorithm.

G Balancing Research Proposals and Regrouping Them by Considering Applicants’ Characteristics

If the number of proposals in a cluster is still very large, the cluster is further broken up into subgroups, taking into account applicants’ characteristics such as affiliated universities. Reviewers may feel confused and uncomfortable when evaluating a proposal group with a poor composition, so the applicants’ characteristics in each proposal group should be as diverse as possible.
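One simple way to spread affiliations across subgroups is sketched below. It is a round-robin heuristic, not the optimization model used in the existing system: proposals are sorted by affiliation and dealt out in turn, so proposals from the same university tend to land in different subgroups. The function name `split_balanced` and the sample data are hypothetical.

```python
def split_balanced(proposals, n_groups):
    """Split (proposal_id, affiliation) pairs into n_groups subgroups.

    Proposals are sorted by affiliation and dealt out round-robin, so that
    each subgroup receives a mix of affiliations where possible.
    """
    ordered = sorted(proposals, key=lambda p: p[1])
    groups = [[] for _ in range(n_groups)]
    for i, proposal in enumerate(ordered):
        groups[i % n_groups].append(proposal)
    return groups

# Hypothetical cluster of four proposals from two universities.
proposals = [("P1", "U-A"), ("P2", "U-A"), ("P3", "U-B"), ("P4", "U-B")]
groups = split_balanced(proposals, 2)
```

With this input each subgroup contains one proposal from each university.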

ONTOLOGY BASED CLASSIFICATION AND CLUSTERING FRAMEWORK

The proposed system is a framework for ontology-based classification and clustering that groups research project proposals and external reviewers by research area and systematically assigns each group of proposals to the corresponding reviewers. In R&D project selection, once proposals are submitted, the next challenging task is to group them and assign them to reviewers. The proposals in each group should have similar traits. If the proposals in a group fall into the same primary research discipline (e.g., data mining) and the number of proposals is small, they can be grouped manually using the keywords listed in the proposals and assigned to reviewers manually. If the number of proposals is large, however, manual grouping and assignment become very difficult. The proposals and external reviewers are therefore classified using an ontology and clustered using text-mining techniques, and finally the proposals are assigned to reviewers systematically. Fig. 3 shows the architecture of the entire proposed system.

A Research Ontology as well as Reviewers Ontology construction

i) Creating Research Topics:

The keywords of the research projects supported each year are collected and their frequencies are counted. The keyword frequency is the total number of times the same keyword appeared in the discipline during the most recent five years.
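The five-year keyword count described above can be sketched as follows. The data layout (a mapping from year to the list of keywords of projects funded in one discipline that year) and the function name `keyword_frequency` are assumptions made for illustration.

```python
from collections import Counter

def keyword_frequency(keywords_by_year, latest_year, window=5):
    """Sum keyword counts over the `window` most recent years, ending at latest_year."""
    total = Counter()
    for year in range(latest_year - window + 1, latest_year + 1):
        total.update(keywords_by_year.get(year, []))
    return total

# Hypothetical keyword lists for one discipline.
keywords_by_year = {
    2008: ["data mining"],
    2010: ["data mining", "ontology"],
    2013: ["data mining"],
}
freq = keyword_frequency(keywords_by_year, latest_year=2013)
```

Here the 2008 entry falls outside the window 2009 to 2013 and is ignored.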

ii) Constructing Research Ontology:

The research ontology is constructed according to the scientific research areas and the departments involved in the selection process. It is first built from several specific research areas, which are then divided into narrower discipline areas and finally into research topics that form the feature set of each discipline. Two issues arise. First, some research areas are cross-disciplinary (e.g., “data mining” can be placed under “Information Management” in “Management Sciences” or under “Artificial Intelligence” in “Information Sciences”). Second, project applicants use synonyms: the same concept may have different names in different proposals.

iii) Constructing Reviewers Ontology:

Reviewers ontology is designed on the basis of all the domain areas of the reviewers.

iv) Update Ontologies:

Once the project funding is completed each year, the ontologies are updated according to the agency’s policy.

b. Text Document Preprocessing

The contents of proposals are usually unstructured. Research project proposals may consist of Chinese characters, which are difficult to segment. The research ontology is used to analyze and identify the keywords in the full text of the proposals. Finally, the vocabulary size can be reduced further by removing all words that appear only a few times (say, fewer than five times) across all proposals.
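The vocabulary reduction step can be sketched as below, assuming each proposal has already been segmented into a list of tokens. The threshold of five comes from the text; the function name `prune_vocabulary` and the sample tokens are illustrative.

```python
from collections import Counter

def prune_vocabulary(docs, min_count=5):
    """Drop tokens that occur fewer than min_count times across all documents."""
    counts = Counter(token for doc in docs for token in doc)
    keep = {token for token, count in counts.items() if count >= min_count}
    return [[token for token in doc if token in keep] for doc in docs]

# Hypothetical tokenized proposals: "ontology" occurs five times in total.
docs = [["ontology"] * 3 + ["rare"], ["ontology"] * 2 + ["noise"]]
pruned = prune_vocabulary(docs)
```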

c. Concept based Analysis

(1) Sentence-Based Concept Analysis

To analyze each concept at the sentence level, a concept-based frequency measure called the conceptual term frequency (ctf) is used.

(2) Document-Based Concept Analysis

The term frequency is a local measure at the document level. To analyze each concept at the document level, the concept-based term frequency (tf), i.e., the number of occurrences of a concept (word or phrase) c in the document, is calculated.

(3) Corpus-Based Concept Analysis

The document frequency is a global measure at the corpus level. To identify concepts that can distinguish between documents, the concept-based document frequency (df), i.e., the number of documents containing concept c, is calculated.
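The document-level and corpus-level measures defined above can be sketched as follows, assuming concept extraction has already produced a list of concepts per document. The sentence-level ctf is omitted because its exact definition is not given here. The function names and the toy corpus are illustrative.

```python
from collections import Counter

def concept_tf(doc_concepts):
    """tf: number of occurrences of each concept in one document."""
    return Counter(doc_concepts)

def concept_df(corpus):
    """df: number of documents in the corpus that contain each concept."""
    return Counter(concept for doc in corpus for concept in set(doc))

# Hypothetical corpus of three documents, each a list of extracted concepts.
corpus = [
    ["data mining", "ontology", "data mining"],
    ["ontology"],
    ["peer review"],
]
tf = concept_tf(corpus[0])
df = concept_df(corpus)
```

A concept such as "data mining", with a high tf in one document and a low df in the corpus, is a good candidate for distinguishing that document from the others.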

d. Concept Based Similarity

A concept-based similarity measure depends on matching concepts at the sentence, document, and corpus levels instead of matching individual terms. First, the semantic structure of each sentence is captured. Second, the concept frequency is used to measure the contribution of a concept at the sentence level as well as the document level. Finally, the concepts are measured by the number of documents in which they appear.

e. Clustering Techniques

With the help of existing text clustering techniques, we can determine which cluster has the highest priority.

f. Output Cluster

After applying the clustering techniques, we obtain the clustered documents, which help to identify the main concepts in the text documents.

ii) Reviewers Clustering

With the help of the reviewers ontology, research reviewers are clustered based on their similarities in each discipline area or domain. A simple K-means clustering algorithm is used for this purpose.

K-means Algorithm

K-means is a simple method for quickly sorting data into clusters; the only requirement is to define the number of clusters. K denotes the number of clusters into which the data is divided. The algorithm works as follows:

1. Randomly select K-points as the initial cluster centroids.

2. Assign each object in the dataset to the closest cluster by computing the Euclidean distance of the object from each centroid.

3. When all objects have been assigned recalculate the position of the K centroids.

4. Repeat steps 2 and 3 until the centroids no longer move. At this point the objects are separated into clusters.
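The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this work: the reviewers are assumed to be already encoded as numeric vectors (tuples), and the `seed` and `max_iter` parameters, the handling of empty clusters, and the sample points are assumptions added so the example is reproducible.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster a list of numeric tuples into k groups; return (centroids, clusters)."""
    rng = random.Random(seed)
    # Step 1: randomly select K points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each point to the centroid at the smallest Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On well-separated data such as this, the two groups are recovered; in general the result depends on the initial centroids, so several restarts are commonly used.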

D Information Retrieval

For information retrieval, a knowledge-based agent is used, which comprises a knowledge base and an inference system. The knowledge-based agent retrieves the grouped proposals and systematically allocates them to the grouped external reviewers.

E Assigning Proposals to Reviewers

The final step of this approach is to systematically assign each group of research proposals to a group of external research reviewers. The proposals of a particular discipline area are assigned to the reviewers who have the same research area or domain, so that they can examine the proposals efficiently during peer review.
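The matching rule described above can be sketched as a simple lookup by discipline. This is an illustrative assumption about the data layout (both groupings keyed by discipline name), not the knowledge-based agent itself; the function name `assign_groups` and the sample groups are hypothetical.

```python
def assign_groups(proposal_groups, reviewer_groups):
    """Pair each proposal group with the reviewer group of the same discipline.

    Returns (assignments, unassigned), where unassigned lists the disciplines
    for which no reviewer group exists.
    """
    assignments, unassigned = {}, []
    for discipline, proposals in proposal_groups.items():
        reviewers = reviewer_groups.get(discipline)
        if reviewers:
            assignments[discipline] = {"proposals": proposals, "reviewers": reviewers}
        else:
            unassigned.append(discipline)
    return assignments, unassigned

# Hypothetical grouped proposals and reviewers.
proposal_groups = {"Data Mining": ["P1", "P2"], "Life Sciences": ["P3"]}
reviewer_groups = {"Data Mining": ["R1", "R2", "R3"]}
assignments, unassigned = assign_groups(proposal_groups, reviewer_groups)
```

Disciplines without a matching reviewer group are returned separately so that they can be handled manually.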

CONCLUSION AND FUTURE WORK

This paper has presented a framework for ontology-based classification and clustering that groups research proposals as well as reviewers, for use by research funding agencies in grouping proposals and systematically assigning each group to a group of reviewers. A research ontology and a reviewers ontology are constructed to categorize the concept terms in different discipline areas and to form relationships among them. The approach is user friendly and time saving. It combines data-mining techniques with the help of an ontology and provides a way to easily classify and group the research proposals and the reviewers by research area. Future work should refine the assignment step, for example by taking reviewers’ experience into account when assigning proposals. There is also a need to empirically compare the results of manual classification with those of text-mining classification.

References

[1] Jian Ma, Wei Xu, Hong Sun, Efraim Turban, Shouyang Wang, and Ou Liu, “An Ontology-Based Text-Mining Method to Cluster Proposals for Research Project Selection,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 42, no. 3, May 2012.
[2] Preet Kaur and Richa Sapra, “Ontology based classification and clustering of research proposals and external research reviewers,” J. Inf. Sci., vol. 5, no. 1, May-June 2013.
[3] Q. Tian, J. Ma, and O. Liu, “A hybrid knowledge and model system for R&D project selection,” Expert Syst. Appl., vol. 23, no. 3, pp. 265–271, Oct. 2002.
[4] K. Chen and N. Gorla, “Information system project selection using fuzzy logic,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 28, no. 6, pp. 849–855, Nov. 1998.
[5] A. D. Henriksen and A. J. Traynor, “A practical R&D project selection scoring tool,” IEEE Trans. Eng. Manag., vol. 46, no. 2, pp. 158–170, May 1999.
[6] W. D. Cook, B. Golany, M. Kress, M. Penn, and T. Raviv, “Optimal allocation of proposals to reviewers to facilitate effective ranking,” Manage. Sci., vol. 51, no. 4, pp. 655–661, Apr. 2005.
[7] S. Hettich and M. Pazzani, “Mining for proposal reviewers: Lessons learned at the National Science Foundation,” in Proc. 12th Int. Conf. Knowl. Discov. Data Mining, 2006, pp. 862–871.
[8] C. Choi and Y. Park, “R&D proposal screening system based on text mining approach,” Int. J. Technol. Intell. Plan., vol. 2, no. 1, pp. 61–72, 2006.
[9] R. Feldman and J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. New York: Cambridge Univ. Press, 2007.
[10] Y. H. Sun, J. Ma, Z. P. Fan, and J. Wang, “A group decision support approach to evaluate experts for R&D project selection,” IEEE Trans. Eng. Manag., vol. 55, no. 1, pp. 158–170, Feb. 2008.
[11] K. Girotra, C. Terwiesch, and K. T. Ulrich, “Valuing R&D projects in a portfolio: Evidence from the pharmaceutical industry,” Manage. Sci., vol. 53, no. 9, pp. 1452–1466, Sep. 2007.
[12] D. A. Chiang, H. C. Keh, H. H. Huang, and D. Chyr, “The Chinese text categorization system with association rule and category priority,” Expert Syst. Appl., vol. 35, no. 1/2, pp. 102–110, Jul./Aug. 2008.
[13] S. Bechhofer et al., OWL Web Ontology Language Reference, W3C recommendation, vol. 10, p. 2006-01, 2004.
[14] B. Yildiz and S. Miksch, “ontoX - A method for ontology-driven information extraction,” in Proc. ICCSA (3), vol. 4707, Lecture Notes in Computer Science, O. Gervasi and M. L. Gavrilova, Eds., 2007, pp. 660–673, Berlin, Germany: Springer-Verlag.
[15] V. M. Navaneethakumar and C. Chandrasekar, “A Consistent Web Documents Based Text Clustering Using Concept Based Mining Model,” 2012.
[16] N. Arunachalam, E. Sathya, S. Hismath Begum, and M. Uma Makeswari, “An Ontology based Framework for R&D Project Selection,” J. Inf. Sci. and Technology, vol. 5, no. 1, February 2013.