ISSN ONLINE(2319-8753)PRINT(2347-6710)


Data Mining Techniques with a Case Analysis Using Clementine

Sathish.S.N
Project Manager, Infosys Limited, Mysore, India

Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology

Abstract

This paper explains data mining features by working through a case study analyzed with Clementine.

Keywords

Data Mining; Data Warehouse; Clementine; Association Rule; SPSS

INTRODUCTION

Introduction to Case:
This case concerns the analysis of crimes that were recorded and stored in a data file. The objectives of the case analysis are the following:
Explore data and explain patterns in “volume” crimes
Find related crimes which seem to have been committed by the same offender
Find related crimes even if they are widely distributed in time and geography
Introduction to Data:
The fictional crime reports used for the demo consist of 662 crimes (rows) that took place in a city. Each record contains 46 fields. The data are available in a .csv file. The fields used in the analysis are grouped as follows.
image
image
Crime Report Information: reference number, date, time, day of week, grid reference of location
Modus Operandi: method of entry, point of entry, security features, method of dealing with alarm, what the offender posed as, etc.
Property stolen or damaged: audio equipment, video equipment, computer, purse, etc.
Other: Home Office code, short description
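As a rough illustration of this layout, the sketch below reads a crime file with Python's standard csv module and reports the row and field counts. The sample rows and the column names (ref_no, day, month, time, x_grid, y_grid, crime) are hypothetical stand-ins; the real file has 662 rows and 46 fields, whose exact names are not given here.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real 662-row .csv file.
SAMPLE_CSV = """ref_no,day,month,time,x_grid,y_grid,crime
1001,2,6,530,12,40,Burglary Dwelling
1002,4,6,1415,18,22,Theft from Pick Pocketing
1003,2,3,645,12,41,Burglary Dwelling
"""


def load_crimes(handle):
    """Read crime reports into a list of dicts, one dict per row."""
    return list(csv.DictReader(handle))


records = load_crimes(io.StringIO(SAMPLE_CSV))
print(len(records), "rows,", len(records[0]), "fields")
```

With a real file, the same function could be called on an opened file handle instead of the in-memory sample.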

STREAM

Streams are created by drawing diagrams of data operations relevant to your business on the main canvas in the interface. Each operation is represented by an icon or node, and the nodes are linked together in a stream representing the flow of data through each operation.
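The node-and-link idea can be mimicked outside Clementine as a chain of small functions, each receiving the output of the previous one. The following Python sketch is only an analogy under that assumption; the helper names (select, derive, stream) and the sample records are hypothetical and are not part of Clementine.

```python
from functools import reduce


def select(predicate):
    """Node that keeps only the records matching the predicate."""
    return lambda records: [r for r in records if predicate(r)]


def derive(field, func):
    """Node that adds a computed field to every record."""
    return lambda records: [{**r, field: func(r)} for r in records]


def stream(*nodes):
    """Link nodes so that each one receives the previous node's output."""
    return lambda records: reduce(lambda data, node: node(data), nodes, records)


# Hypothetical records and fields, for illustration only.
crimes = [
    {"crime": "Burglary Dwelling", "time": 530},
    {"crime": "Theft from Pick Pocketing", "time": 1415},
]

early_morning = stream(
    derive("hour", lambda r: r["time"] // 100),  # 530 -> hour 5
    select(lambda r: 5 <= r["hour"] < 7),        # keep 05:00 to 06:59
)
result = early_morning(crimes)
print(result)
```

Each function plays the role of one node, and the order of the arguments to stream plays the role of the links between nodes.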
Based on the case, we designed the following stream:
image
Let us carry out the analysis based on the above stream to derive information from the data provided. We first look at the nodes used in the stream and their purpose.
image
Analysis:
1. Figures 1 and 2 below show the percentage (%) and the count of the various types of crime in the dataset.
image
From the above diagram we can conclude that crimes such as “Burglary Dwelling” occur most often. The diagram shows the percentage of occurrence and the number of occurrences of each crime in the dataset. The figure below (Fig-2) presents the same information.
image
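The count-and-percentage view of crime types shown in these two figures can be approximated with a simple frequency table. The sketch below is an illustration rather than the tool's own computation; the field name crime and the four sample records are hypothetical.

```python
from collections import Counter


def distribution(records, field):
    """Return (value, count, percent) tuples, most frequent value first."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return [(value, n, round(100.0 * n / total, 1))
            for value, n in counts.most_common()]


# Hypothetical sample; the real dataset has 662 records.
crimes = [
    {"crime": "Burglary Dwelling"},
    {"crime": "Burglary Dwelling"},
    {"crime": "Theft from Pick Pocketing"},
    {"crime": "Burglary Dwelling"},
]

table = distribution(crimes, "crime")
for value, n, pct in table:
    print(f"{value:30s} {n:4d} {pct:5.1f}%")
```

The same function can be applied to any categorical field, for example the day or the month.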
Based on the per-day output, the largest number of crimes occurs on Day2 and Day4, although the difference from the other days is not very significant. Next, let us find which types of crime occur on those days and which crime is most prominent on each of them.
3. The figure below (Fig-4) shows the percentage of crimes occurring on each day. We observe that on day maximum number of crimes.
image
From the above, we find that “Theft from Pick Pocketing” is the crime occurring on Day2 and Day4. This raises a further question: what events take place on Day2 and Day4 that lead to these crimes?
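The day-by-crime breakdown discussed above amounts to a cross-tabulation of two fields. A minimal Python sketch follows, assuming hypothetical field names (day, crime) and made-up sample records; it counts each crime type per day and picks the most frequent one for each day.

```python
from collections import Counter, defaultdict


def crosstab(records, row_field, col_field):
    """Count occurrences of each col_field value within each row_field value."""
    table = defaultdict(Counter)
    for r in records:
        table[r[row_field]][r[col_field]] += 1
    return table


def top_per_row(table):
    """Return the most frequent column value and its count for every row."""
    return {row: counts.most_common(1)[0] for row, counts in table.items()}


# Hypothetical sample records, for illustration only.
crimes = [
    {"day": 2, "crime": "Theft from Pick Pocketing"},
    {"day": 2, "crime": "Theft from Pick Pocketing"},
    {"day": 2, "crime": "Burglary Dwelling"},
    {"day": 4, "crime": "Theft from Pick Pocketing"},
]

by_day = crosstab(crimes, "day", "crime")
dominant = top_per_row(by_day)
print(dominant)
```

Replacing day with month gives the month-wise breakdown by crime type.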
4. The figure below (Fig-5) shows the occurrence of crimes by month. We observe that the 6th month has more crimes than the other months.
image
5. The figure below (Fig-6) shows the occurrence of each type of crime by month. Again, we observe more crimes in the 6th month.
image
6. The figure below (Fig-7) shows the occurrence of the various crimes by location. In the data file, location is given as coordinate values, i.e. X Grid and Y Grid.
image
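One simple way to summarize location data of this kind is to group the X Grid and Y Grid coordinates into square cells and count the crimes per cell. The sketch below assumes integer coordinates, hypothetical field names (x_grid, y_grid), and an arbitrary cell size of 10 grid units; none of these details come from the data file itself.

```python
from collections import Counter


def grid_hotspots(records, cell_size=10, top=3):
    """Count crimes per square grid cell and return the busiest cells."""
    cells = Counter(
        (r["x_grid"] // cell_size, r["y_grid"] // cell_size) for r in records
    )
    return cells.most_common(top)


# Hypothetical coordinates, for illustration only.
crimes = [
    {"x_grid": 12, "y_grid": 40},
    {"x_grid": 18, "y_grid": 41},
    {"x_grid": 55, "y_grid": 7},
]

hotspots = grid_hotspots(crimes)
print(hotspots)
```

A smaller cell size gives finer spatial detail at the cost of fewer crimes per cell.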
7. The figure below (Fig-8) shows the occurrence of the various crimes by time of day, with values ranging from 1 to 2399 (representing the 24 hours of the day).
image
8. The figure below (Fig-9) shows the occurrence of the various crimes for times between 500 and 700. This slice was chosen based on the Fig-8 analysis, and the “Select” node was used to filter the specific records.
image
9. The figure below (Fig-10) shows the various crimes that occurred during month number 6. The “Select” node was used to filter the records for month 6.
image
10. The figure below (Fig-11) shows the various crimes that occurred during month number 6 on the 5th day.
image
11. The figure below (Fig-12) shows how the filtering criteria are specified so that only records with a time field value between 500 and 700 are picked up. This filter is used in analysis number 8.
image
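The record filtering used in analyses 8 to 10 can be sketched in Python as two small helpers. This is an illustration of the filtering logic only, not the Select node's own expression syntax; the field names, the sample records, and the choice to treat both bounds (500 and 700) as inclusive are assumptions.

```python
def select_between(records, field, low, high):
    """Keep records whose field value lies in [low, high] (inclusive, assumed)."""
    return [r for r in records if low <= r[field] <= high]


def select_equal(records, **conditions):
    """Keep records that match every field=value condition."""
    return [r for r in records
            if all(r[key] == value for key, value in conditions.items())]


# Hypothetical sample records, for illustration only.
crimes = [
    {"ref": 1, "time": 530, "month": 6, "day": 5},
    {"ref": 2, "time": 1415, "month": 6, "day": 2},
    {"ref": 3, "time": 700, "month": 3, "day": 5},
]

early = select_between(crimes, "time", 500, 700)   # analysis 8
june = select_equal(crimes, month=6)               # analysis 9
june_day5 = select_equal(crimes, month=6, day=5)   # analysis 10
print([r["ref"] for r in early],
      [r["ref"] for r in june],
      [r["ref"] for r in june_day5])
```

Combining conditions in one call corresponds to narrowing the selection step by step, as in the move from month 6 to month 6 on the 5th day.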

CONCLUSION

As stated at the beginning, the aim was to show how quickly information that supports better planning can be extracted from a large dataset. The tool has very good features for carrying out various kinds of analysis. Here we have considered only the nodes and functionality required for this case.
