Protecting Streaming Data Using Provenance
Spread Spectrum Watermarking

Ramya K P; Revathi M K

Information Technology, Anna University, Dr.Sivanthi Aditanar College of Engineering, Tuticorin-628215, Tamilnadu, India

Visit for more related articles at International Journal of Innovative Research in Computer and Communication Engineering

Abstract

Many application areas, such as location-based services, transaction logs, and sensor networks, are characterized by continuous data streams from many sources. Tracking data provenance in such highly dynamic environments is a crucial requirement, because provenance is a key factor in assessing data trustworthiness, which is important for many applications. Provenance management for streaming data must address several challenges, including storage efficiency, processing throughput, bandwidth consumption, and secure transmission. This paper addresses these challenges by providing secure and efficient transmission of provenance along with sensor data by embedding it over the inter-packet delays (IPDs). Embedding provenance within a host medium makes this technique reminiscent of watermarking. A spread-spectrum-based watermarking technique is proposed that avoids the data degradation caused by traditional watermarking. Provenance is extracted using an optimal threshold mechanism that minimizes the probability of provenance decoding error. The experimental results indicate that the system is scalable and highly resilient in provenance recovery against several attacks up to a certain level.

KEYWORDS

Streaming Data, Watermarking, Provenance Security, Sensor Network, Malicious Attack, Spread Spectrum Watermarking.

I. INTRODUCTION

Many applications process high volumes of streaming data. Examples include Internet traffic analysis, sensor networks, Web server and error log mining, financial tickers and on-line trading, real-time mining of telephone call records or credit card transactions, tracking the GPS coordinates of moving objects, and analysing the results of scientific experiments. In general, a data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins. Of course, completely static data are not practical, and even traditional databases may be updated over time.

A large network contains thousands of routers and links, and its core links may carry many thousands of packets per second; in fact, optical links in the Internet backbone can reach speeds of over 100 million packets per second. The traffic flowing through the network is itself a high-speed data stream, with each data packet containing fields such as a timestamp, the source and destination IP addresses, and ports. Other network monitoring data streams include real-time system and alert logs produced by routers, routing and configuration updates, and periodic performance measurements. Examples of performance measurements are the average router CPU utilization over the last five minutes and the number of inbound and outbound packets of various types over the last five minutes. Understanding these data streams is crucial for managing and troubleshooting a large network. However, it is not feasible to perform complex operations on high-speed streams or to keep transmitting terabytes of raw data to a data management system. Instead, we need scalable and flexible end-to-end data stream management solutions, ranging from real-time low-latency alerting and monitoring, through ad-hoc analysis and early data reduction on raw streaming data, to long-term analysis of processed data.

A digital watermark is a digital signal or pattern inserted into a digital image. Since this signal or pattern is present in each unaltered copy of the original image, the digital watermark may also serve as a digital signature for the copies. A given watermark may be unique to each copy (e.g. to identify the intended recipient), or be common to multiple copies (e.g. to identify the document source). In either case, watermarking the document involves transforming the original into another form. This distinguishes digital watermarking from digital fingerprinting, where the original file remains intact and a newly created file 'describes' the original file's content.

Digital watermarking should also be contrasted with public-key encryption, which also transforms original files into another form. It is common practice nowadays to encrypt digital documents so that they cannot be viewed without the decryption key. Unlike encryption, however, digital watermarking leaves the original image (or file) basically intact and recognizable. In addition, digital watermarks, as signatures, may not be validated without special software. Further, decrypted documents are free of any residual effects of encryption, whereas digital watermarks are designed to persist through viewing, printing, and subsequent re-transmission or dissemination.

II. SPREAD SPECTRUM WATERMARKING

Spread spectrum is a transmission technique by which a narrowband data signal is spread over a much larger bandwidth so that the signal energy present in any single frequency is undetectable. In our context, the sequence of inter-packet delays is the communication channel and the provenance is the signal transmitted through it. Provenance is spread over many IPDs such that the information present in one IPD (i.e., one container of information) is small. Consequently, an attacker needs to add high-amplitude noise to all of the containers in order to destroy the provenance. Thus, the use of the spread spectrum technique for watermarking provides strong security against different attacks. We have adopted the direct sequence spread spectrum (DSSS) technique, which is widely used for enabling multiple users to transmit simultaneously on the same frequency range by utilizing distinct pseudo-noise (PN) sequences [9]. The intended receiver can extract the desired user's signal by regarding the other signals as noise-like interference. The components of a DSSS system are as follows:

Input:
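As a rough illustration of the direct-sequence idea described in this section, the following Python sketch spreads the bits of two users with distinct PN sequences, adds them on a shared channel, and recovers each user's bits by correlation. It is a generic DSSS toy, not the scheme of this paper: the function names, the sequence length of 64 chips, and the use of a seeded random generator in place of a shared secret are all assumptions made for the example.

```python
import random


def make_pn(length, seed):
    """Return a pseudo-noise sequence of +1/-1 chips.

    The seed stands in for a shared secret (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, pn):
    """Spread every data bit (+1 or -1) over len(pn) chips."""
    return [bit * chip for bit in bits for chip in pn]


def despread(signal, pn):
    """Recover bits by correlating each chip block with the PN sequence."""
    n = len(pn)
    bits = []
    for start in range(0, len(signal), n):
        block = signal[start:start + n]
        correlation = sum(s * chip for s, chip in zip(block, pn))
        bits.append(1 if correlation >= 0 else -1)
    return bits


# Two users share the channel; each uses its own PN sequence.
pn_a = make_pn(64, seed=1)
pn_b = make_pn(64, seed=2)
bits_a = [1, -1, 1, 1]
bits_b = [-1, -1, 1, -1]

# The channel carries the chip-wise sum of both spread signals.
channel = [x + y for x, y in zip(spread(bits_a, pn_a), spread(bits_b, pn_b))]

# Each receiver treats the other user's signal as noise-like interference.
recovered_a = despread(channel, pn_a)
recovered_b = despread(channel, pn_b)
```

The correlation step works because a user's own PN sequence contributes its full length to the sum, while an unrelated PN sequence contributes only a small cross-correlation term.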

III. PROVENANCE WATERMARKING

There are two main steps in our algorithm, which are described as follows. Provenance Encoding: This step works in three phases: Generation of Delay Perturbations, Selection of a Delay Perturbation and Provenance Embedding. Provenance Decoding: This step works in two phases: Reordering IPDs, Threshold-Based Decoding.

3.1 Provenance Encoding

Fig. 1 presents an overview of our approach for provenance encoding at a sensor node in the data path and decoding at the BS. The process that a node ni follows to encode a bit of the PN sequence over an IPD is summarized below:

3.1.1 Generation of Delay Perturbation

As the first step in embedding provenance, a node ni generates a delay sequence that is used for watermarking. The PN sequence and the impact factor are used for this purpose. The PN sequence, consisting of a sequence of +1 and -
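A minimal sketch of how a node might turn a PN sequence and an impact factor into delay perturbations is given below. The exact generation rule of the scheme is not reproduced in this section, so the sketch assumes the simplest form: each perturbation is the impact factor times one PN chip, and each node on the path adds its perturbations to the IPDs it forwards. The names `alpha`, `make_pn`, `delay_perturbations`, and `embed`, the node labels, and all numeric values are hypothetical.

```python
import random


def make_pn(length, seed):
    """PN sequence of +1/-1 chips; the seed stands in for a node's secret."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def delay_perturbations(pn, alpha):
    """Assumed rule: scale every chip by the impact factor alpha (seconds)."""
    return [alpha * chip for chip in pn]


def embed(ipds, perturbations):
    """Add one perturbation to each inter-packet delay."""
    if len(ipds) != len(perturbations):
        raise ValueError("need exactly one perturbation per IPD")
    marked = [delay + v for delay, v in zip(ipds, perturbations)]
    if min(marked) <= 0:
        raise ValueError("perturbation too large for the base IPDs")
    return marked


# Toy values: 16 packets gaps of 50 ms, impact factor of 2 ms.
base_ipds = [0.050] * 16
alpha = 0.002

# Hypothetical two-node path; each node has its own PN seed.
path = {"n1": 101, "n2": 202}

ipds = base_ipds
for node, seed in path.items():
    pn = make_pn(len(ipds), seed)
    ipds = embed(ipds, delay_perturbations(pn, alpha))
```

Keeping `alpha` small relative to the base IPDs is what keeps each individual delay change slight, at the cost of needing more IPDs for reliable detection.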

3.1.2 Selection of Delay Perturbation

3.2 Provenance Decoding

An overview of our approach for provenance decoding at the receiver is shown in Fig.2. The process a node ni follows to decode a bit of PN sequence over an IPD is summarized below:

3.2.1 Reordering the IPDs

The optimal threshold T* is calculated first and is then used for provenance retrieval. Because the fingerprint image is treated as the sensor data, the matrix values computed from the image are reordered. The reordered data is then converted back into the fingerprint image.
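The two decoding phases named earlier, reordering the IPDs and threshold-based decoding, can be sketched roughly as follows. This is an illustration under stated assumptions, not the decoder of this paper: the keyed permutation, the mean-removed correlation, and the fixed threshold of `alpha / 2` are stand-ins chosen for the example, and in particular the threshold is not the optimal T*. All function names, node labels, and values are hypothetical.

```python
import random


def make_pn(length, seed):
    """PN sequence of +1/-1 chips; the seed stands in for a node's secret."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def keyed_order(length, key):
    """Permutation of IPD positions derived from a shared key (assumed)."""
    order = list(range(length))
    random.Random(key).shuffle(order)
    return order


def reorder(ipds, order):
    """Undo the sender-side permutation: chip j was carried by IPD order[j]."""
    return [ipds[position] for position in order]


def node_present(ipds, pn, threshold):
    """Correlate mean-removed IPDs with a PN sequence and apply a threshold."""
    mean = sum(ipds) / len(ipds)
    correlation = sum((d - mean) * chip for d, chip in zip(ipds, pn)) / len(pn)
    return correlation > threshold


# Toy setup: three candidate nodes, only n1 and n2 are on the data path.
num_ipds = 512
alpha = 0.002
pns = {
    "n1": make_pn(num_ipds, 11),
    "n2": make_pn(num_ipds, 22),
    "n3": make_pn(num_ipds, 33),
}
order = keyed_order(num_ipds, key=7)

# Simulated received IPDs: base delay plus the perturbations of n1 and n2.
received = [0.050] * num_ipds
for name in ("n1", "n2"):
    for j, chip in enumerate(pns[name]):
        received[order[j]] += alpha * chip

aligned = reorder(received, order)
decoded = {name: node_present(aligned, pn, alpha / 2) for name, pn in pns.items()}
```

For a node on the path the correlation is close to `alpha`, and for an absent node it is close to zero, so any threshold between the two separates them in this noise-free toy; a real deployment would have to account for network jitter when choosing it.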

IV. EXPERIMENTAL RESULTS

All experiments were performed on a desktop PC with an Intel Duo Core 1.7 GHz CPU, 2 GB RAM, and the Windows XP operating system. The programs were implemented in VB.Net. The sensor data was gathered from the sensor device and passed on for further processing. Here the fingerprint device is treated as the sensor device and the captured fingerprint image is treated as the sensor data. After the fingerprint image was captured, it was converted into matrix format and stored in a database. The nodes that participated in data transmission were connected in a network. The delays were generated and assigned to the sensor data at random. The sensor data was sent from one node to another according to the assigned delays. Provenance embedding at the receiver is shown in Fig. 3. On the receiver side, the data was received and stored along with its arrival time. It was then decoded to obtain the original sensor image. Fig. 4 shows provenance decoding at the receiver.
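Since the receiver stores each packet together with its arrival time, the inter-packet delays can be derived directly from consecutive timestamps. The short Python sketch below shows that step only; the experiments themselves were implemented in VB.Net, and the function name and the sample timestamps here are made up for illustration.

```python
def inter_packet_delays(arrival_times):
    """Return the gaps between consecutive arrival times (in seconds)."""
    delays = []
    for earlier, later in zip(arrival_times, arrival_times[1:]):
        if later < earlier:
            raise ValueError("arrival times must be non-decreasing")
        delays.append(later - earlier)
    return delays


# Hypothetical arrival times of four packets, in seconds.
arrivals = [0.000, 0.052, 0.100, 0.152]
ipds = inter_packet_delays(arrivals)
```

Four packets yield three IPDs, so a watermark spread over N delays needs at least N + 1 packets.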

V. CONCLUSION

Interpacket-timing-based network flow watermarking has been widely used to identify correlated traffic flows and to detect the source of an attack behind stepping stone(s). Our approach addresses the novel problem of securely transmitting provenance for data streams. We propose a spread-spectrum watermarking-based solution that embeds provenance over the interpacket delays. The spread spectrum technique is used because it keeps the watermark delays much smaller. The decoding process does not require the IPDs to be stored in a database. The security features of the scheme allow it to survive various sensor network and flow watermarking attacks. With the capability of capturing data packets and interpacket timing characteristics, an outside attacker may try to disrupt provenance security in different ways. In a provenance detection and retrieval attack, an attacker might want to identify and extract the provenance embedded by a node. Several attacks have been devised to detect and corrupt active timing-based watermarks in network flows. In our scheme, the watermarked IPDs do not follow any regular pattern. Thus, our watermarking scheme is robust and makes the embedded provenance invisible to most of these attacks.

References

Chong S, Skalka C, and Vaughan J A, “Self-Identifying Sensor Data,” Proc. Information Processing in Sensor Networks (IPSN), pp. 82-93, 2010.
Cox I and Miller M, “Electronic Watermarking: The First 50 Years,” Proc. IEEE Workshop Multimedia Signal Processing, pp. 225-230, 2001.
Dixon R C, Spread Spectrum Systems: With Commercial Applications, third ed. John Wiley and Sons, Inc., 1994.
Hasan R, Sion R, and Winslett M, “The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance,” Proc. Conf. File and Storage Technologies (FAST), pp. 1-14, 2009.
Houmansadr A, Kiyavash N, and Borisov N, “Multi-Flow Attack Resistant Watermarks for Network Flows,” Proc. IEEE Int’l Conf. Acoustics, Speech and Signal Processing, pp. 1497-1500, 2009.
Kiyavash N, Houmansadr A, and Borisov N, “Multi-Flow Attacks against Network Flow Watermarking Schemes,” Proc. USENIX Conf. Security Symp., pp. 307-320, 2008.
Cabuk S, “IP Covert Timing Channels: Design and Detection,” Proc. ACM Conf. Computer and Comm. Security (CCS), pp. 178-187, 2004.
Lim H, Moon Y, and Bertino E, “Provenance-Based Trustworthiness Assessment in Sensor Networks,” Proc. Workshop Data Management for Sensor Networks, pp. 2-7, 2010.
National Cyber Security Research and Development Challenges, Related to Economics, Physical Infrastructure and Human Behavior, 2009.
Peng P, Ning P, and Reeves D S, “On the Secrecy of Timing-Based Active Watermarking Trace-Back Techniques,” Proc. IEEE Symp. Security and Privacy (SP), pp. 334-349, 2006.
Simmhan Y L, Plale B, and Gannon D, “A Survey of Data Provenance in E-Science,” SIGMOD Record, vol. 34, pp. 31-36, 2005.
Syalim A, Nishide T, and Sakurai K, “Preserving Integrity and Confidentiality of a Directed Acyclic Graph Model of Provenance,” Proc. Working Conf. Data and Applications Security and Privacy, pp. 311-318, 2010.
Vijayakumar N and Plale B, “Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering,” Provenance and Annotation of Data, vol. 4145, pp. 46-54, 2006.
Wang X and Reeves D S, “Robust Correlation of Encrypted Attack Traffic Through Stepping Stones by Manipulation of Interpacket Delays,” Proc. ACM Conf. Computer and Comm. Security (CCS), pp. 20-29, 2003.
Wang X, Chen S, and Jajodia S, “Network Flow Watermarking Attack on Low-Latency Anonymous Communication Systems,” Proc. IEEE Symp. Security and Privacy (SP), pp. 116-130, 2007.