Safety Platform of Traffic in Cloud Computing Environment

Chetan.R; Ranjith .J; Umesh.M; Usha N

Assistant Professor, Dept. of ISE, SJBIT, Bangalore, Karnataka, India


Visit for more related articles at International Journal of Innovative Research in Computer and Communication Engineering

Abstract

With road traffic collisions ever increasing, better gathering, analysis and handling of useful traffic safety information has become one of the most vital concerns for local traffic authorities. The ultimate goal is to guarantee sound traffic management and to give city travelers rapid access to up-to-date road traffic information. This paper therefore attempts to devise a traffic safety information platform based on cloud computing technology, which serves as an effective channel for communication between traffic authorities, staff and the public. The platform relies on gathering and storing data using big data methodologies and on analyzing the data using data warehousing and a data mining model, tailored to the present traffic environment and built on trustworthy data from traffic accidents. The platform is fully integrated with related variables such as people, vehicles, environment, weather and road conditions.

Keywords

data mining, cloud computing, data warehousing, clustering

INTRODUCTION

Overview of traffic safety

According to statistics published by the World Health Organization (WHO), annual road traffic deaths and injuries around the world have reached 1.2 million and 50 million respectively, and these injuries often result in permanent harm. With the development of information technology and the ever-growing threat to life from road traffic collisions, it has become one of the most vital concerns for local traffic authorities to better gather, analyze and handle useful traffic safety information, both to ensure sound traffic management and to give city travelers speedy access to up-to-date road traffic information that can guide their travel behavior [2]. This paper therefore attempts to design a traffic safety information management platform based on cloud computing, data warehousing and data mining, which serves as an effective channel for communication between traffic authorities, staff and the public. On the one hand, it aims to take full advantage of a road safety information database in order to simplify, assess and predict safety information such as traffic accident black spots, the vehicle types responsible for serious accidents, and weather effects, based on accidents that have already occurred. On the other hand, it aims to share detailed traffic conditions without hindrance, such as road condition, potential risks, disasters and weather [21].

Data Warehouse

A data warehouse is a database used for reporting and data analysis. It is a central repository of data created by integrating data from one or more disparate sources. It stores current and historical data and is used to create trend reports for senior management, such as yearly and quarterly comparisons. The data stored in the warehouse is uploaded from the operational systems. The typical extract, transform, load (ETL) based data warehouse uses staging, data integration and access layers to house its key functions. The staging layer, or staging database, stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data is then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The access layer helps users retrieve data. A data warehouse constructed from an integrated data source system does not require ETL, staging databases or operational data store databases. This integrated data warehouse architecture supports drill-down from the aggregate data of the data warehouse to the transactional data of the integrated source data systems.

The data warehouse focuses on data storage. Data from the main sources is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered important components of a data warehousing system. These include business intelligence tools, tools to extract data into the repository, and tools to manage and retrieve metadata.
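The staging, integration and warehouse layers described above can be sketched in a few lines of plain Java. This is only an illustrative sketch: the class name EtlSketch, the comma-separated record layout (road, year, injuries) and the sample values are assumptions made for the example, not part of the platform described in this paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical three-layer ETL sketch: staging -> integration -> warehouse facts.
final class EtlSketch {

    // Integration-layer record: one cleaned accident report (field names are assumptions).
    static final class Accident {
        final String road;
        final int year;
        final int injuries;

        Accident(String road, int year, int injuries) {
            this.road = road;
            this.year = year;
            this.injuries = injuries;
        }
    }

    // Integration layer: parse raw staged lines and drop malformed rows.
    static List<Accident> integrate(List<String> stagedLines) {
        List<Accident> cleaned = new ArrayList<>();
        for (String line : stagedLines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // wrong number of fields
            }
            try {
                cleaned.add(new Accident(parts[0].trim(),
                        Integer.parseInt(parts[1].trim()),
                        Integer.parseInt(parts[2].trim())));
            } catch (NumberFormatException e) {
                // non-numeric year or injury count: skip the row
            }
        }
        return cleaned;
    }

    // Warehouse layer: aggregate fact (total injuries) per road/year dimension pair.
    static Map<String, Integer> aggregate(List<Accident> accidents) {
        Map<String, Integer> facts = new TreeMap<>();
        for (Accident a : accidents) {
            facts.merge(a.road + "/" + a.year, a.injuries, Integer::sum);
        }
        return facts;
    }

    public static void main(String[] args) {
        // Staging layer: raw lines as extracted from a source system (made-up data).
        List<String> staged = Arrays.asList("NH4,2012,3", "NH4,2012,2", "bad row", "NH7,2013,1");
        System.out.println(aggregate(integrate(staged)));
    }
}
```

Keeping the raw lines, the cleaned records and the aggregated facts as separate steps mirrors the layer separation in the text; a real warehouse would persist each layer in its own database rather than in memory.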

Data Mining

Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The general objective of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Apart from the raw analysis stage, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization and online updating. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). These patterns can then be seen as a kind of summary of the input data and may be used in further analysis.

Data mining involves six common classes of tasks:

• Anomaly detection (outlier/change/deviation detection) – the detection of unusual data records that might be interesting, or of data errors that require further investigation.

• Association rule learning (dependency modeling) – searching for relationships between variables.

• Clustering – the task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.

• Classification – the task of generalizing known structure to apply to new data.

• Regression – finding a function that models the data with the least error.

• Summarization – providing a more compact representation of the data set, including visualization and report generation.
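To make one of these task classes concrete, the following sketch illustrates regression by fitting a straight line with ordinary least squares. It is a minimal example under stated assumptions: the class name LeastSquaresSketch and the sample numbers are hypothetical, and a straight line is only one of many possible model functions.

```java
// Minimal regression sketch: fit y = intercept + slope * x by ordinary least squares.
final class LeastSquaresSketch {

    // Returns {intercept, slope}. Assumes x and y have equal length
    // and that x contains at least two distinct values.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        // Made-up data lying exactly on the line y = 1 + 2x.
        double[] line = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("intercept=" + line[0] + ", slope=" + line[1]);
    }
}
```

The closed-form solution minimizes the sum of squared errors, which is the usual reading of "least error" for this task.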

LITERATURE SURVEY

Hadoop [22] was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine that was itself part of the Lucene project. Mike Cafarella and Doug Cutting estimated that a system supporting a one-billion-page index would cost around half a million dollars in hardware, with a monthly running cost of $30,000.

Jana and Bandyopadhyay [3] analyzed several security threats and their countermeasures, and suggested additional recommendations on top of the best practices adopted for Identity and Access Management (IAM) for mobile cloud users. Pirker et al. [4] presented privacy-preserving cloud resource payment that enables mobile clients to anonymously consume the resources of a cloud service provider, such that the provider is not able to track users' activity patterns.

Satyanarayanan [5] discussed a wide range of issues in areas such as privacy, software licensing and business models that arise with the emergence of a cloudlet-based hardware/software ecosystem supporting, for example, cognitive assistance for attention-challenged mobile users, scalable crowdsourcing of first-person video, and ubiquitous mobile access to one's legacy world.

Abuelela and Olariu [6] envision VANETs as cloud computing networks as well. Vehicles will share computing power, Internet access and storage to form conventional clouds.

Santos et al. [7] proposed a new platform to achieve trust in conventional clouds. Krautheim [8] also proposed a third party to share the responsibility of security in cloud computing between the service provider and client, decreasing the risk exposure to both. The bilinear aggregate signature has been extended to simultaneously audit multiple users.

Observing the fact that many devices (computing, sensing and storing devices) on vehicles are idle for a long while, Olariu and his colleagues [9], [10], [11] proposed to share these devices as the computational engine of the cloud. Ristenpart et al. [12] presented experiments of locating co-residence of other users in cloud virtual machines.

The mobile cloud application models are based on the standard cloud service model, which includes Infrastructure as a Service (IaaS) [14], Platform as a Service (PaaS) [15], and Software as a Service (SaaS) [16], [17]. Depending on how an application model works, any of these service layers can therefore be utilized. Well-known services for mobile cloud computing include Amazon Elastic Compute Cloud (EC2) [18], Google App Engine [15], and Microsoft Azure [19].

THE TECHNOLOGIES AND CONCEPTS

Cloud Computing

Cloud computing can be regarded both as a platform and as a type of application, building on computing developments such as distributed computing, parallel computing and grid computing. It uses a computer network to offer computing resources such as data or software as a service to users who pay for them on demand. It thus breaks free of the computing power and storage limits of the traditional local computing model. Users' computers, mobile phones and other personal devices may contain only an operating system and a web browser, and users do not need to know where the data is stored or who provides the software. Users submit computing tasks that cannot be completed on their local devices to the cloud, which performs the computation and returns the results to them.

Cloud computing benefits from several key characteristics:

• Reliability. The data is stored and the applications run on servers in the cloud, so users do not have to worry about lost or corrupt data.

• Agility. The cloud can allocate computing resources according to the user's needs or preferences to provide flexible management.

• Utility. Users do not have to buy expensive computing devices. They only need to pay for the computing services provided by the cloud.

• Application programming interface (API) accessibility to software enables machines to interact with cloud software in the same way that a traditional user interface facilitates interaction between humans and computers. Cloud computing systems typically use Representational State Transfer (REST) based APIs.

• Device and location independence enables users to access systems using a web browser regardless of their location or the device they use. As infrastructure is off site and accessed via the Internet, users can connect from anywhere.

• Multitenancy enables sharing of resources and costs across a large pool of users, thus allowing for centralization of infrastructure in locations with lower costs, increased peak-load capacity, and utilisation and efficiency improvements for systems that are often only 10–20% utilised.

• Performance is monitored, and consistent, loosely coupled architectures are constructed using web services as the system interface.

• Security can improve due to centralization of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data and the lack of security for stored kernels.

Analyzing the Data with Hadoop

Hadoop

As an infrastructure for distributed systems, Hadoop lets users develop distributed programs and make full use of a cluster's high-speed processing and storage without knowing the details of the underlying distributed architecture [22]. Hadoop is a collection of related subprojects that fall under the umbrella of infrastructure for distributed computing. These projects are hosted by the Apache Software Foundation, which provides support for a community of open source software projects. To take advantage of the parallel processing that Hadoop provides, a query must be expressed as a MapReduce job. After some local, small-scale testing, the job can be run on a cluster of machines.

Map and Reduce

MapReduce is a programming model for data processing. The model is simple, yet not too simple to express useful programs in. MapReduce works by breaking the processing into two phases: the map phase and the reduce phase. Each phase has key-value pairs as input and output, the types of which may be chosen by the programmer. The programmer also specifies two functions: the map function and the reduce function. The input to our map phase is the raw data, for which we choose a text input format that presents each line in the dataset as a text value. The key is the offset of the start of the line from the beginning of the file. The map function is just a data preparation phase, setting up the data in such a way that the reduce function can do its work on it. The map function is also a good place to drop bad records. As can be seen from Fig. 1, Hadoop contains many components.
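The two phases can be illustrated without a cluster by simulating them in plain Java. The sketch below is not Hadoop code and uses no Hadoop classes; the class name MapReduceSketch, the record layout (road identifier followed by a date) and the sample lines are assumptions made for the example. It counts accident records per road, with the map step dropping bad records and the driver standing in for Hadoop's shuffle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map and reduce phases (no Hadoop dependency).
final class MapReduceSketch {

    // Map: (line offset, line text) -> zero or one (road, 1) pair.
    // Records without a road id and at least one more field are dropped as bad.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = line.split(",");
        if (fields.length >= 2 && !fields[0].trim().isEmpty()) {
            out.add(new SimpleEntry<>(fields[0].trim(), 1));
        }
        return out;
    }

    // Reduce: (road, list of counts) -> total count for that road.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: feeds lines to map, groups pairs by key (the "shuffle"), then reduces.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            offset += line.length() + 1; // assumes single-byte characters and '\n' line ends
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "NH4,2012-01-03", "garbage", "NH7,2012-02-11", "NH4,2012-03-09");
        System.out.println(run(lines));
    }
}
```

In a real Hadoop job the grouping step is performed by the framework across machines, so only the map and reduce functions would be written by the programmer.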

Avro

A data serialization system for efficient, cross-language RPC and persistent data storage. (At the time of this writing, Avro had been created only as a new subproject, and no other Hadoop subprojects were using it yet.)

HDFS

Hadoop includes a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem. HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on large clusters of commodity hardware.

Pig

A data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.

HBase

HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. HBase uses HDFS for its underlying storage and supports both batch-style computations using MapReduce and point queries (random reads). It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS, providing BigTable-like capabilities for Hadoop.

ZooKeeper

A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications.

Hive and Mahout

Hive is a data warehouse platform based on Hadoop. With Hive, data extraction, transformation and loading can be carried out easily. Furthermore, queries written in HQL (Hive Query Language) can be converted into the corresponding MapReduce jobs and run on Hadoop. Mahout is a distributed framework for machine learning and data mining that provides scalable implementations of a number of classic machine learning algorithms.

PROBLEMS AND CONSTRUCTION MODE

Problems

Public traffic information services have attracted interest worldwide, but there are still a number of weaknesses in this area:

• Limited information service ability.

At present, most information services rely on government, and information gathering, processing and release are to some extent carried out independently of one another. Moreover, the information content tends to be simplistic and of limited practical use.

• Shortcomings in service content, approach, quality and range.

The present service content chiefly covers travel information, and the service approach is considered obsolete, which leaves much room for development.

• Lack of frequently updated traffic safety information.

At present, the relevant traffic safety information available to information service providers is incomplete, and gathering, storing and processing such information at scale is difficult.

To address the problems above, this work attempts to create a traffic safety information platform based on cloud computing for local traffic authorities and the public by establishing a traffic safety information system. With the platform, any user can connect to the cloud service platform from their own terminal device and stay well informed of any traffic safety information they are concerned about.

The main research content

This work attempts to design a traffic safety information management platform based on cloud computing, data warehousing and data mining, which serves as an effective channel for communication between traffic authorities, staff and the public [23]. On the one hand, it aims to take full advantage of a road safety information database in order to simplify, assess and predict safety information such as traffic accident black spots, vehicle types liable to severe accidents, and weather effects, based on accidents that have already occurred. On the other hand, it aims to collect, process and share complete traffic conditions without delay, such as road condition, potential risks, accidents, disasters and weather. The key problems to be tackled are as follows.

There are three key technologies:

• Cloud storage of huge traffic data

• Precise analysis and proposal of massive traffic information

• Cloud terminal technology

The platform supports different mobile terminals, including mobile phones, iPads and personalized cloud terminals, in various interactive forms so that users can use cloud services and share traffic safety information anytime and anywhere.

Architecture design of intelligent traffic safety information platform based on cloud computing

Based on our understanding of the problems above, we attempt to design the system structure shown in Figure 2, briefly explained as follows. The batches of data gathered by each subsystem support the segmentation of users in the field of traffic safety. In most cases, drivers' casual behaviors result in irregular patterns, which makes the data well suited to cluster analysis.

Owing to the large scale of user data, we adopt Hadoop for parallel computing and distributed storage. Moreover, to improve resource utilization, we virtualize the server layer, build traffic information security awareness on various data mining algorithms provided by Mahout, and design mobile terminal access in the application layer to handle the different kinds of cloud terminal access.

Algorithm usage case

Because the available statistical data are incomplete, there is so far no widely accepted, effective algorithm. We attempt to treat the factors influencing road safety as a grey system by adopting a clustering algorithm [24], and to estimate the overall level of road traffic safety based on information screening, processing, supplementation and extension.

The procedure is as follows:

• Format conversion: The data must be transformed into an input format that the clustering algorithm can handle. In Mahout, the format that the clustering algorithms can process directly is the sequence file. We therefore write a class, InputMapper, that converts the data into a sequence file, together with a map function that implements it.

The map function is declared as follows:

public void map(LongWritable key,
                Text value,
                OutputCollector<Text, VectorWritable> output,
                Reporter reporter) throws IOException

• Invoke Apache Mahout: Invoke one of Mahout's parallel clustering algorithms. Mahout provides several clustering algorithms, of which k-means is a case in point.

• Obtain the clustering results from HDFS: Analyze the clustering results, that is, query the results directly in HDFS with Hive and extract them to a local client for analysis.
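The steps above can be illustrated on a small scale without Hadoop or Mahout. The following sketch is a sequential, single-machine stand-in: it converts comma-separated text records into numeric vectors (the role played by the format conversion step) and then runs a basic k-means loop. It does not use Mahout's API and is not the parallel implementation or the grey clustering method referred to in this paper; the class name KMeansSketch, the two-feature records, and the choice of the first k points as initial centroids are all assumptions made to keep the example deterministic.

```java
import java.util.Arrays;

// Sequential k-means sketch (plain Java, not Mahout's parallel implementation).
final class KMeansSketch {

    // Format conversion: one comma-separated text record -> numeric feature vector.
    static double[] toVector(String line) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    // Index of the centroid closest to point p.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Returns the cluster index of every point. Assumes points.length >= k >= 1.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // assumption: first k points seed the centroids
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.length; i++) { // assignment step
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            double[][] sums = new double[k][dim]; // update step
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                for (int d = 0; d < dim && counts[c] > 0; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        String[] records = {"1,1", "8,8", "1.2,0.8", "8.2,7.9", "0.9,1.1"}; // made-up data
        double[][] points = new double[records.length][];
        for (int i = 0; i < records.length; i++) {
            points[i] = toVector(records[i]);
        }
        System.out.println(Arrays.toString(cluster(points, 2, 20)));
    }
}
```

Seeding with the first k points keeps the sketch reproducible; practical implementations usually choose initial centroids randomly or with a dedicated seeding step, and the result of k-means depends on that choice.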

CONCLUSION

Building such a platform from scratch is not simple and calls for additional analysis and exploration. Its application in the traffic field also depends on the development of related areas such as IT and logistics. Owing to its potential theoretical and practical value in the field of traffic management, cloud computing has attracted wide attention among researchers. The development and application of cloud computing in other vital industries can also facilitate its progress in the traffic field.


References

Shi J., Li, X. (2011). Research on Traffic Information Cloud Computing and Its Application. Journal of Transportation Systems Engineering and Information Technology,(01):179-184.

Zeng, K., Yan, J. (2011). Cloud Computing and Its Application in Intelligent Transportation. Modern Science & Technology of Telecommunication,(05):45-51.

Debasish Jana and Debasis Bandyopadhyay, "Management of Identity and Credentials in Mobile Cloud Environment", Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS 2013), September 28-29, 2013, Bali, Indonesia

Martin Pirker, Daniel Slamanig, and Johannes Winter. 2012. Practical privacy preserving cloud resource-payment for constrained clients. In Proceedings of the 12th international conference on Privacy Enhancing Technologies (PETS'12), Simone Fischer-Hübner and Matthew Wright (Eds.). Springer-Verlag, Berlin, Heidelberg, 201-220.

Mahadev Satyanarayanan. 2013. Cloudlets: at the leading edge of cloud-mobile convergence. In Proceedings of the 9th international ACM SIGSOFT conference on Quality of software architectures (QoSA '13). ACM, New York, NY, USA, 1-2.

M. Abuelela and S. Olariu, “Taking vanet to the clouds,” Proceedings of The 8th International Conference on Advances in Mobile Computing and Multimedia MoMM 2010, pp. 8–10, 2010.

N. Santos, K. P. Gummadi, and R. Rodrigues, “Towards trusted cloud computing,” in Proceedings of HotCloud, June 2009.

F. J. Krautheim, “Private virtual infrastructure for cloud computing,” 2009 conference on Hot topics in cloud computing, pp. 1–5, 2009.

M. Abuelela and S. Olariu, “Taking vanet to the clouds,” Proceedings of The 8th International Conference on Advances in Mobile Computing and Multimedia MoMM 2010, pp. 8–10, 2010.

M. Eltoweissy, S. Olariu, and M. Younis, “Towards autonomous vehicular clouds,” in Proceedings of AdHocNets’2010, Victoria, BC, Canada, August 2010.

S. Olariu, I. Khalil, and M. Abuelela, “Taking vanet to the clouds,” International Journal of Pervasive Computing and Communication, vol. 7, no. 1, pp. 7–21, 2011.

T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds,” in Proceedings of the 16th ACM conference on Computer and communications security, ser. CCS ’09, 2009, pp. 199–212.

Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. 2009. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM conference on Computer and communications security (CCS '09). ACM, New York, NY, USA, 199-212.

Rackspacecloud. http://www.rackspace.com/cloud Accessed April 10th, 2012.

Google app engine. https://cloud.google.com/products/app-engineappengine-search Accessed November 15th, 2011.

Google apps for business. https://www.google.com/enterprise/apps/business/products.html Accessed April 10th, 2012.

Salesforce. http://www.zdnet.com/why-salesforce-integrator-rework-is-recasting-itself-as-a-cloud-broker-7000027448/ Accessed April 10th, 2012.

Amazon elastic compute cloud (ec2), https://www.google.co.in/url. Accessed December 10th, 2011.

Microsoft azure. http://www.localmoxie.com/web.php Accessed April 10th, 2012. [Online].

Basics About Cloud Computing, Grace Lewis, September 2010.

Zheng, X. (2012). Application of Cloud Computing in the Intelligent Transportation System in the Future. Guangxi Journal of Light Industry,(03):88-89.

White, T. (2011). Hadoop: The Definitive Guide, pp. 30-32. About Mahout: https://cwiki.apache.org/MAHOUT/quickstart.html

Gray Evaluation Method Based on the Relative Merits of the Clustering of Road Traffic Safety. Road Traffic & Safety. 2006(11)

Jaworski, P., Edwards, T., Moore, J., Burnham, K. (2011). Cloud Computing Concept for Intelligent Transportation Systems. 2011 14th International IEEE Conference on Intelligent Transportation Systems. 2011: 931-936