The aim of study presents a graduate employability model that uses Bayesian methods to search the most important factor of graduate employability, and to compare the accuracy of each algorithm under Bayesian methods including Naïve Bayesian Simple, Naïve Bayesian, Averaged One-Dependence Estimators, Averaged One- Dependence Estimators with subsumption resolution, Bayesian networks, and Naïve Bayesian Updateable. The results show that 3 factors with a direct effect on employability are the work province, occupation type, and times find work.
Keywords |
Bayesian methods, Classification techniques, Data-mining, Graduate employability |
INTRODUCTION |
According to the report by the Impact of Economic Crisis on Higher Education in 2012 indicated that The Thai
higher education system is facing serious problems of graduate unemployment crisis. There were 320,815 graduates
with bachelor’s degrees and above in 2006. In 2007, the number of graduates increased to 371,982. About 75.02 per
cent of graduates without those from open universities were employed in 2006 and 18 percent of graduates were
unemployed. After that, the proportion of employed graduates decreased to 68.65 percent in 2008 and unemployment
increased to 28.98 per cent. Effecting to graduates employment, graduates are likely to have more difficulties in finding
jobs. This crisis is explicitly affecting to community, social and the nation. For these reason, it is necessary to require
in-depth studies that enable identify the factors underlying graduates employability to be accurately identified so that
more effective measures and standards can be implemented [1]. |
Classification approach is one of the most important data-mining especially in the area of the predicting. The
capability of classification approach not only handles with huge volume of data to discover hidden patterns and
relationships helpful in decision making, but also reduce their flexibility the data-generation structure, irrespective of
complexity, great predictive and in some case, interpretative potential. |
Therefore, the aim of this study is to compare the accuracy of classification model under Bayesian methods for
identifying the factors underlying graduate employability. This useful information may provide in-depth data valuable
for the ministry of education to monitor and improve various aspects of an administrative system in the institutions of
higher education. |
RELATED WORK |
The Economist Intelligence Unit (2012) reveals that Thai universities are lacking of manufacturing graduates that
possess good language skills, technical and information technology skills. Moreover, both of employer and employees
reveals that the gaps in generic behavior skills consisting communication, leadership, social skills, time management,
teamwork, and adaptability [2]. |
Yilmaz (2010) reveals six major considerations for hiring firms in Thailand, which are interpersonal skills, technical
skills, Educational level, ethnic quotas, loyalty, and experience, whereas more than one fifth of all survey participants
Thai firms noted technical skills as a major consideration in their hiring decisions [3]. Komintarachat (2012) reveals four criteria in term of BE concerned by BE graduates of Assumption University batch thirty-eight, graduate users of
BE graduates in Thai employment market, and the Commission of Higher Education’s five domains of learning form
the Thailand Qualifications Framework for Thailand’s higher education, these are degree of learning, usefulness of
skills, continuance and relevance respectively [4]. |
The research by the Council for Industry and Higher Education of United Kingdom reveals that employers observe
six competencies in people who can change the organization and add value to their careers [5]. These six competencies
are personal capabilities, cognitive skills or brainpower, personal capabilities, generic competencies, technical ability,
business or organization awareness, and practical elements. These competencies cover a set of achievements that
include the skills, understandings, and personal attributes that make graduates more likely to gain employment and
become successful in their selected occupations. This advantages the graduates, the community, and also the economy. |
Moreover, data mining techniques have been used in the education domain for predicting and classifying. Research
by Minaei-Bidgoli et al have classified students to predict their final grade using six common classifiers: 1-nearest
neighbor (1-NN), the Quadratic Bayesian classifier, the Quadratic Bayesian classifier, k-nearest neighbor (k-NN),
Multilayer Perceptron, Parzen-window, and Decision tree methods [6]. |
Guruler et al [7] identified that individual student characteristics are associated with their success according to the
Grade Point Average (GPA) by using a Microsoft Decision Tree classification technique [8]. These studies have
revealed some applications of classification in data mining technique of educational domain that extracts useful
information from huge data sets. Data mining and analytical tools can help users’ access current information for the
decision-making process. |
This study we are therefore applied classification techniques to construct graduate employability for identifying the
factors underlying graduate employability from historical database and compare the accuracy of each models under
Bayesian methods, in order to search the real factor and relationship are accrued. |
METHODOLOGY |
This study developed the research methodology based on three phases of the data-mining techniques including data
preprocessing, classification task, and interpretation and evaluation. |
A. Data Preprocessing |
We collected raw data from the database of graduate historical in Khon Kaen University, Thailand for the academic
year of 2009. The data set consists of 3,090 instances and 17 attributes. We created the classifiers by using the Waikato
Environment for Knowledge Analysis (WEKA) program. This software was developed at the University of Waikato,
New Zealand, and it can be easily applied to the data set. The default file type for data analysis in WEKA is the
Attribute-Relation File Format file type, moreover, the data can also be imported in various formats such as CSV:
Comma-Separated Values (text file), CSV file, and so on. |
The data preprocessing phase includes two steps to prepare the data set for the classification task. The first step
cleans and eliminates data with missing values in significant attributes, removes duplicate data, and identifies outliers.
Then, we discrete the values of attributes into intervals with categorical or nominal attributes to prepare the data set for
the classification task. These discretized values can be described as follows: |
GENDER is transformed into a nominal value from its previous value as a code (1 or 2). |
GPA is transformed into a grade range from its previous value as a continuous number. |
MATCH_EDU is transformed into a nominal value from a previous value as a code (1 or 2). |
ADDPROGRAM 1, 2, 3, 4, and 5 values are transformed into nominal values from their previous values as
a code (1 or 2). |
WORK STATUS is transformed into a nominal value from its previous value as a code (1, 2, and 3). |
B. Classification Task |
In this part, we used classification technique which called Classification task. The classification task in this paper is
to construct graduate employability model and predict the employment status (working, not working, or other) for
graduate profiles. There are two stages in classification task consisting training and testing. The testing data set is used to estimate the predictive accuracy. The classify phase in WEKA includes four test modes as testing options: the
training set, supplied test set, cross-validation, and percentage split [9]. |
Training set: If we use this option as test, the test data will be sourced from training data, therefore, this
option will decrease reliable measuring of the true error rate. |
Supplied test set: this option, we can use test data which been prepared separately from the training data. |
Cross-validation: this option is appropriate for limited dataset wherewith the number of fold can be
determined by user. 10-fold class validation is widely used to get the best for measuring error. It has been
widely assay on numerous datasets with different learning techniques. |
Percentage split: this option is evaluated on how well it predicts a certain percentage of the data that are
held for testing. The amount of data held depends on the value entered in the % field. |
We selected hold-out validation method with 70-30 percentage split in order to avoid over fitting of data, whereby
70% out of the 2,327 instances is used for training while the remaining instances are prepared for testing. |
1) Bayesian Methods |
The classification task in these methods comprises classifying a class variable based on a set of attribute variables.
This is a type of statistical analysis in which the prior distribution is estimated from the data before any new data are
observed; thus, every parameter is assigned a prior probability distribution [10]. The Naïve Bayesian algorithm works
as follows: Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an
n-dimensional attribute vector, X = (x1, x2, …, xn), depicting n measurements made on the tuple from n attributes,
respectively, A1, A2, … , An. Suppose that there are m classes, C1, C2, …, Cm. Given a tuple, X, the classifier predicts
that X belongs to the class with the highest posterior probability, conditioned on X; that is, the Naïve Bayesian
algorithm predicts that tuple X belongs to the class Ci if and only if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m; j ≠ I . Thus, we
maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. |
We performed this experiment with six algorithms under the Bayesian methods in WEKA: the Averaged One-
Dependence Estimators (AODE), Averaged One-Dependence Estimators with subsumption resolution (AODEsr),
Bayesian Network, Naïve Bayesian, Naïve Bayesian Simple, and Naïve Bayesian Updateable algorithms. |
The AODE and Naïve Bayesian algorithms were also used by Affendey et al. [11], and the remaining algorithms
were selected to compare the results of the Bayesian algorithm experiment using the same data set. The AODE
algorithm achieved the highest accuracy percentage in averaging all smaller searching-space in alternative Naïve
Bayes-like models, which have weaker, and hence, less detrimental independence assumptions than the Naïve Bayesian
algorithm. The resulting algorithm is computationally efficient and achieves highly accurate classification in many
learning tasks. The AODEsr algorithm complements the AODE algorithm with Subsumption Resolution, which is
capable of detecting specializations between two attribute values at classification time, and deletes the generalization
attribute value. Bayesian Network learning uses various search algorithms and quality measures. In the Naïve Bayesian
algorithm, numeric estimator precision values are chosen based on an analysis of the training data. The Naïve Bayesian
Updateable algorithm uses a default precision of 0.10 for numeric attributes when build classifier is called with zero
training instances. The Naïve Bayesian Simple modeled numeric attributes by a normal distribution. |
C. Interpretation and Evaluation |
In this part, we compared the performance under Bayesian methods. The AODEsr algorithm achieved the highest
accuracy of 98.3% using the graduate data set. The second highest accuracy was achieved using AODE algorithm with
an accuracy of 96.1%. |
RESULTS |
Table 1 shows the classification accuracies for various algorithms under the Bayesian method. This table provides
comparative results for the kappa statistics mean absolute error, root mean squared error, relative absolute error, and
root relative squared error of the 699 testing instances. The OADEer algorithm achieved a higher accuracy percentage
than other algorithms. |
In Fig. 1, it shows a comparison of the accuracy and root relative squared error from each algorithm under Bayesian
methods. This knowledge can be used to gain insights into the employment trend of graduates from local institutions of
higher learning. A chart display of the accuracy and root relative squared error all algorithms under Bayesian methods
reveal the accuracy of these approaches. The highest accuracy mean that the results in a better forecast. |
CONCLUSION |
In this study, we compare six algorithms under Bayesian methods on graduate dataset with some parameter. In
graduate dataset have a simple and class attribute. The results show that the AODEsr algorithm, achieved the highest
accuracy of 98.3%. The second highest accuracy was achieved using AODE algorithm with an accuracy of 96.1%. In
addition, the experiment show that 3 factors with a direct effect on employability are the work province, occupation
type, and times find work. |
ACKNOWLEDGMENTS |
B. Jantawan would like to express thanks Dr. Cheng-Fa Tsai, professor in the Management Information Systems
Department, National Pingtung University of Science and Technology, Taiwan for supporting the outstanding
scholarship, and highly appreciates to the Khon Kaen University, Thailand for giving the information of this study. |
Tables at a glance |
|
Table 1 |
|
Figures at a glance |
|
Figure 1 |
|
References
- United Nations Educational Scientific and Cultural Organization, âÃâ¬ÃÅThe Impact of Economic Crisis on Higher EducationâÃâ¬ÃÂ, UNESCO Bangkok, Asia and Pacific Regional Bureau, Bangkok: Thailand, 2012.
- Economist Intelligence Unit, âÃâ¬ÃÅSkilled Labor Shortfalls in Indonesia, the Philippines, Thailand, and Vietnam: A custom research report forthe British CouncilâÃâ¬ÃÂ.
- Yilmaz, Y., âÃâ¬ÃÅHigher Education Institutions in Thailand and MalaysiaâÃâ¬Ãâcan they deliver?,
- Komintarachat, H., âÃâ¬ÃÅDevelopment of a model for effective business English curriculum in an international university in ThailandâÃâ¬ÃÂ, Journalof Assumption University Thailand, Vol. 4, pp. 84-92, 2012.
- Rees, C., Forbes, P., and Kubler, B., âÃâ¬ÃÅIntroduction. Student Employability Profiles: A Guide for Higher Education PractitionersâÃâ¬Ã (2nd ed.,pp. 3), United Kingdom: The Higher Education Academy, 2006.
- Minaei-Bidgoli, B., Kashy, D. A., Kortemeyer, G., and Punch, W. F., âÃâ¬ÃÅPredicting student performance: An application of data mining methods with an educational Web-based systemâÃâ¬ÃÂ, 33rd Frontiers in Education Conference, pp. 13-18, 2003.
- Guruler , H., Istanbullu, A., and Karahasan, M., âÃâ¬ÃÅA new student performance analysing system using knowledge discovery in higher educational databasesâÃâ¬ÃÂ, Computers & Education, Vol. 55, pp. 247-254, 2010.
- Kumar, V. and Chadha, A., âÃâ¬ÃÅAn Empirical Study of the Applications of Data Mining Techniques in Higher EducationâÃâ¬ÃÂ, International Journal of Advanced Computer Science and Applications, Vol. 2, pp. 80-84, 2011.
- Kirkby, R. and Frank, E., âÃâ¬ÃÅWEKA Explorer User Guide for Version 3-4-3âÃâ¬ÃÂ, University of Waikato, 2004.
- Jaynes, E. T., âÃâ¬ÃÅProbability Theory: The Logic of ScienceâÃâ¬ÃÂ, United Kingdom: Cambridge University Press, 2003.
- Affendey, L. S., Paris, I. H. M., Mustapha, N., Sulaiman, M. N., and Muda, Z, âÃâ¬ÃÅRanking of influencing factors in predicting student academic performanceâÃâ¬ÃÂ, Information Technology Journal, Vol. 9, No. 4, pp. 832-837, 2010.
|