ISSN ONLINE(2320-9801) PRINT (2320-9798)

All submissions of the EM system will be redirected to Online Manuscript Submission System. Authors are requested to submit articles directly to Online Manuscript Submission System of respective journal.

Data Compression in Wireless Sensor Network: A Survey

Chetna Bharat Mudgule1, Prof. Uma Nagaraj2 and Prof. Pramod D. Ganjewar3
  1. PG Student, Dept. of Computer Engg. , Savitribai Phule Pune University, Pune, Maharashtra, India
  2. Professor, Dept. of Computer Engg. , Savitribai Phule Pune University, Pune, Maharashtra, India
  3. Assistant Professor, Dept. of Computer Engg. , Savitribai Phule Pune University, Pune, Maharashtra, India
Related article at Pubmed, Scholar Google

Visit for more related articles at International Journal of Innovative Research in Computer and Communication Engineering


With increasing popularity and applicability of wireless sensor networks, the limitations which restrict the lifetime of network are also arising. It is observed that the energy utilized in transmission is more than processing the data in wsn. So, many approaches are proposed to tackle the problem of power consumption, used while transmission. One of the approaches include the compression of data to be transmitted and recovery of it at sink node in wsn. Under this, different data compression algorithms which are compatible for wireless sensor network are described which helps to save energy in wireless sensor network.


Key words

wireless sensor networks, power consumption, data compression, save energy.


Wireless sensor networks (WSNs) have gained worldwide attention in recent years, particularly with the proliferation in Micro-Electro-Mechanical Systems (MEMS) Technology which has facilitated the development of smart sensors [1].
WSN have ability to serve many major applications in situations which includes military Target tracking and investigation, natural disaster relief, biomedical health monitoring and hazardous environment exploration and seismic sensing.A wireless sensor network contains many sensor nodes which are organized over a geographical area for observing physical occurrences like temperature, humidity, vibrations, seismic events, and so on. A sensor node is tiny device containing the three important constituents: a sensing subsystem for data acquirement i.e. data acquisition from the physical environment, a processing subsystem for processing and storage of data, and communication subsystem for transmission of data. With these subsystems it also contains a power source which supplies energy to the device for working and a battery with inadequate energy budget. Thus, it is quite inconvenient to recharge the battery, as they are organized in an unrealizable surroundings. Sensor nodes are small, with limited processing and computing resources and thus they should have the long lifetime for serving the requirements of application [2].
Sensor nodes in the WSN and its components such as power source, processing and data storing memory are of small size to accomplish their requirement. As a large number of sensor nodes are situated at particular location which become hard to access and not feasible to perform maintenance operations such as changing batteries on them. This makes the sensor nodes to be resource constrained. They have limitations on power supply, bandwidth for communication, processing speed, and memory space. Thus it has given rise to many research works and studies which specifically concentrating on the maximizing the utilization of limited sensor resources.
A sensor node comprises of four main components : i) a sensing subsystem includes one or more sensors for data acquisition, ii) a processing subsystem includes a micro-controller and memory for local data processing, iii) a radio subsystem for wireless data communication, and iv) a power supply unit [3].In addition, depending on the purpose of specific application it may also include the components like location tracing system for locating the position, mobilizer for different alignments etc.
The communication subsystem has high energy consumption than computation subsystem. Even for transmitting the single bit may consume more energy than processing it. And also depending on the application requirement, the sensing subsystem can be used as another source of energy consumption and require to reduce its power consumption. Thus it’s important to adopt several approaches to reduce the power consumption in wireless sensor networks based on its architecture. And these approaches mainly include three important techniques i.e. duty cycling, data-driven approaches, and mobility. Here we are focusing on data-driven approaches.
Thus it’s important to adopt several approaches to reduce the power consumption in wireless sensor networks based on its architecture. And these approaches mainly include three important techniques i.e. duty cycling, data-driven approaches, and mobility. Here we are focusing on data-driven approaches.

A. Data-Driven Approaches

Data-driven describes the approach of reducing the number of sampled data while keeping the sensing accuracy within an acceptable range [4]. Depending on the problem data-driven approaches can be further divided into data reduction schemes and energy-efficient data acquisition schemes as shown in Fig. 2.
Data reduction schemes focus in the case of unneeded samples, whereas energy-efficient data acquisition mainly concentrates on reducing the energy utilized by sensing subsystem. Data reduction can be divided into different techniques namely a) in-network processing, b) data compression and c) data prediction. All these techniques have a goal of reducing the amount of data to be transmitted to the sink node [4].These are as follows:
a) In-network processing:
In-network processing includes in-processing data aggregation at intermediate nodes between the source and sink. Thus the amount of data is reduced while traversing through the network to the sink. Also depending on the specific application the most suitable in-network processing technique can be used.
b) Data Compression:
Data compression is used to reduce the amount of information or data transmitted by source nodes. It includes the encoding information at data generating nodes and decoding it at sink node. There are many data compression algorithm which are applicable for wireless sensor networks [6].
c) Data Prediction:
Data prediction includes the abstraction of sensed data which is considered as a model relating data evolution. The model has ability to predict the values sensed by sensor nodes within certain error limits, and exist at both sensors and sink. If the accuracy needs are satisfied then queries raises by users can be solved at sink through the model without the use of exact data of nodes. Thus, data prediction helps in reducing the number of information transmitted by source nodes and also the energy required for communication.
B. Data Compression in WSN
Data compression is process of representing the information in compact form. It include the encoding of data and decoding of it at sink for recovery by compacting the data. Thus the design of compression algorithm involves the grasping about the type of redundancy arising in data and to overcome it the applicability of most suitable strategy to represent it, in most appropriate compact form.
Data compression is considered as the field of resource utilization studies for sensor networks. Many researchers are looking for best way to compress the sensing data. So that it will helps in reducing the power consumption due to processing and transmitting data in each node, which in returns will extend the lifetime of sensor network and also less bandwidth will be required for sending and receiving data. Data compression is one of the effective method of utilizing limited resources of WSNs by using the compression algorithm to compress data before transmission, which in turn can reduce the overall power consumption. But in many cases it is observed that it may increase the power consumption due to the access of memory for execution of compression algorithm. Thus it is preferred to apply the data compression before transmitting data and selection of data compression algorithm which will take less memory access for execution, to effectively reduce the amount of energy consumption.
C. Types of Compression
The key objective of compression is to reduce energy consumption in WSNs. As mentioned earlier the three operations which are mainly responsible for energy consumption in WSNs are sensing/sampling, computation , and communication .Thus if specify any technique which will directly or indirectly reduce one of these operations or more than one operations along with the application requirements can be considered as compression. In WSNs compression can be classified as follows:
Sampling Compression (SC): This process reduces the number of sensing/sampling operations. While reducing the sensing /sampling operations network coverage and distortion loss is kept within the acceptable range.
Data Compression (DC): It is a process of changing or converting an input data stream into a fewer bits data stream. According to it the repeated structure of data is identified and eliminated by applying the encoding for it.
Communication Compression (CC): In this, the number of packets transmissions and receptions are reduced i.e. it reduces the radio on-time of transreceiver in WSN.
D. Features of Data Compression
Lossless Vs. Lossy:
In lossless compression algorithm the recovery of original data is reconstructed by applying decompression on it, whereas in lossy the reconstructed data is not exactly like the original data. Lossy algorithm leads to loss of information, but ensures high compression ratios.
Data Aggregation:
For many applications only the summary of sensor data is required, but to reconstruct the original sample values summarized representation is inadequate. Thus aggregation needs in-network processing of sensor data but can reduce communication overhead at large extend.
Data Correlation:
Sensor nodes are organized in closely to each other, correlations between sensed values at different nodes is high, referred as spatial correlation. Since sensor events are observed in continuous manner, these successive discrete signal samples shows high correlation, called as temporal correlation. Data compression algorithms proposed for such correlations helps in improving the compression ratio
Symmetric Vs. Asymmetric:
In symmetric algorithms, the computational complexity of compressing and decompressing the data is same. Whereas, in case of asymmetric, complexity of compressing and decompressing data is different. In WSNs the low complexity is preferred at compression at nodes and high complexity for decompression performed at sink.
Adaptive Vs. Non-adaptive:
In case of non-adaptive compression, the compression operations and parameters are fixed and it is applicable for stationary data in which the statistics of data do not change with time. In adaptive or dynamic compression methods involves the monitoring of raw data statistics and modification of operation/parameter to improve performance, this is suitable for non-stationary data but is more complex


A. Text-based Compression
Many compression algorithms are designed for text data which support lossless lossless compression there is no loss of information and data transmitted is acquired in the same form as that of the original. One of the algorithm which support dictionary-based lossless compression is S-LZW (Sensor-Lempel Ziv Welch).It is designed by taking into account the popular compression algorithm Lempel-Ziv-Welch (LZW) algorithm. This algorithm construct a dictionary for encoding new string on the basis of previously encountered string dynamically [5]. In Lempel-Ziv-Welch algorithm, compression of data is done by finding the common substrings then fewer bits are used to represent them. Uncompressed input stream gets splits into fixed-size blocks and then compressed separately. For each new block, dictionary is used is re-initialized with the help of some codes to represent standard character set. For S-LZW two sets of up to 256 eight –bit symbols are maintained i.e. original ASCII characters and set containing common strings. To recognize the set formed a bit is appended at the beginning of each encoded symbol and to tracking of string representing the eight-bit sequence a dictionary is maintained.Mini cache dictionary is used to store the recent strings with the maximum size of 2N entries where N<8.To recognize which string is from main dictionary or mini-cache, an additional bit is appended at the beginning of each symbol. For different optimal values for N, different data sets are created [6].
As S-LZW is most appropriate for sensor nodes, but it does not take into account advantage of sensor data characteristics i.e. spatial and temporal correlations existing in sensed data. Sensor data is considered as repetitive over sort intervals in nature. Due to the use of high sampling rates designed to allow accurate capture of sudden changes, the sensor data tends to be repetitive over consecutive samples. For these situations it is optimized by using Mini-cache(SLZW- MC).The most important decision to be taken into account while designing is the size of mini-cache. S-LZW and its variants are considered as good compression algorithm for WSNs which helps in achieving little or zero spatial and temporal data correlations [7].
B. Data Aggregation
Data aggregation [8] [9] is the simplest in-networking processing technique for data and communication compression in WSNs.By using data aggregation techniques, we need to select a subset of sensor nodes to collect and fuse the sensing data sent from their neighbouring nodes and then the small sized aggregated data is transmitted to the sink. The data aggregation technique is unrecoverable because the original data cannot be derived from aggregated data.
In data aggregation the data transmissions in WSN is reduced by fusing (or aggregating) the sensed data. The data aggregation techniques generally simple but aims at selecting the aggregation nodes which efficiently reduce the overall data transmissions in WSN [8].
Depending on the network structures, data aggregation schemes can be classified as follows:
a) Tree-based data aggregation schemes - It organize the network into a tree structure for data collection and aggregation
b) Cluster-based data aggregation schemes – clusters are formed by grouping sensor nodes and then each cluster head aggregates the sensing data within its cluster.
c) Chain-based data aggregation schemes – sensing data is transmitted by each sensor node to its nearest neighbour node and chain structure is formed by network to aggregate the sensing data in the network.
d) Sector-based data aggregation schemes – it uses a ring-sector division concept to cluster sensor nodes so that the sensor nodes in the same sector will be assembled into a cluster.

Tree-based data aggregation schemes:

By jointly optimizing data aggregation and routing tree formation [10] to maximize the network lifetime is the objective of tree-based data aggregation schemes. These schemes organize the sensor nodes into a tree structure, where the data aggregation operation is conducted at the intermediate nodes along the tree and the aggregated data of the whole network will be eventually sent to the root node (i.e. sink).
Here Fig. 3 depicts the tree-based data aggregation schemes, where each sensor node ‘i’ will generates its sensing data ‘si’ and each aggregation node ‘j’ will fuse the data sent from the child nodes with its own sensing data ‘sj’ by an aggregation function.

Cluster-based data aggregation schemes:

In cluster-based data aggregation schemes, we need to first group sensor nodes into clusters and then cluster head is selected for each cluster to aggregate the sensing data sent by other by other sensor nodes in that cluster. Aggregated data is transmitted by cluster head to the sink directly through long-range transmissions or indirectly by multi-hop communications through other cluster heads.

Low-Energy Adaptive Clustering Hierarchy (LEACH)[32]:

LEACH is a distributed cluster-based data aggregation scheme. It comprises of setup phase for organizing the network into clusters and selecting the cluster head, and steady-state phase for aggregating these cluster heads.

Hybrid Energy-Efficient Distributed Clustering (HEED)[33]:

This scheme concentrates on maximizing the network lifetime by constructing the efficient clusters. Here it is assumed that sensor nodes can adjust their transmission powers and selecting the cluster head by considering the combination of residual energy of sensor nodes and their link degrees.

Chain-based data aggregation schemes:

Here sensing data of each sensor node is transmitted to the nearest neighbour, thus network forms a long chain which connects the all sensor nodes, excluding the nodes at the end of chain and remaining all sensor nodes associated with the chain will become the aggregated nodes.

Power-Efficient Data-Gathering Protocol for Sensor Information Systems (PEGASIS)[30]:

In this data aggregation is done by organizing the sensor nodes into a linear chain. A chain can be formed by adopting a greedy algorithm in which it is assumed that all sensor nodes should have the global knowledge of the network

Sector-based data aggregation scheme:

The Semantic/Spatial Correlation-Aware tree (SCT) scheme is a sector-based data aggregation scheme which considers a circular WSN centred at the sink and with radius R as shown in Figure 6 [31].
The network is divided into ‘m’ concentric rings with each ring has same width of Rm for efficiently collecting and aggregating sensing data. Further ring is divided into sectors of same size in a manner that each sector contains the same number of sensor nodes. An aggregation node is selected for each sector to collect and aggregate the sensing data sent by other sensor nodes in the sector. By connecting each aggregation node in the ith ring to its upstream aggregation node in the (i-1)th ring following the shortest path, an aggregation tree is constructed and aggregated data is transmitted to the sink by aggregation node.

C. Predictive Coding

Compressing the data containing numerical values such as images using context-based approaches is quite problematic and thus predictive coding can be applied for such data [11].In predictive coding, future observations at the sink based on statistical model and recent measurements are predicted by using inherent temporal correlation between consecutive readings at an individual sensor [7].
Predictive coding can use parametric modelling or non-parametric modelling depending on the nature of sensor data. In case of parametric modelling it is important to have knowledge of the statistical or numerical parameters, such as mean, variance of sensor data. Whereas for non-parametric modelling very little prior knowledge about the sensor data is required and it uses regression to represent sensor data. Most of the existing predictive coding schemes [12][13][14][15][16] use parametric modelling in which the predictive model is generated for every sensor node during the training phase and the parameters of model are passed to sink. Whenever new data arrives or the difference between the model predicted value and sensed value exceeds a threshold, then only nodes transmit updates to the sink. By this way it provides communication-level compression reducing the number of communications between source nodes and sink. In case of image compression, there exists a strong correlations between pixels in neighbourhood. For this predictive coding is highly effective. The standard lossless image compression JPEG-LS based CALIC [17] uses the predictive coding for its implementation.

D. Distributed Coding

Distributed source coding (DSC) is a popular approach for data compression in WSNs. It follows the well-known theorem Slepian and Wolf which proved that for lossless compression separate encoding is as efficient as joint encoding [19].
Rather using the actual value of the sensor data, optimal centralized compression efficiency can be achieved by compressing each sensor’s data in a distributed manner only using statistical knowledge of the data. [7]
Gathering and tracking of correlation knowledge and code construction are two main operations involved in practical DSC schemes [19]. Correlation gathering and tracking can be done in a centralized or distributed manner. In case of centralized correlation gathering and tracking an individual node such as sink is responsible for collecting and tracking all the correlations within the network. While in case of distributed case, cluster heads performs the function of gathering and tracking correlation data for a subset of nodes and a summary is shared with the sink [20].In four different ways the encoding can be done: No-Slepian-Wolf Scheme (NOSW), Sequential Slepian-Wolf scheme (SEQ), Slepian-Wolf Clustered (CL), and Slepian-Wolf Master Slave (MS) [21]

E. Transform-based Coding

Transform-based compression approaches supports lossy compression and are common for image and video signals. To transform raw data into a set of coefficient of appropriate basis functions (for example wavelet function) can be used to reconstruct the signal at the receiver. In many cases to recover an approximation of original data with low distortion, a reduced number of quantized and non-zero coefficients are sufficient to use. Entropy coding can be further applied to coefficients to reduce data rate. The Discrete Cosine Transform (DCT) (e.g. DCT is used in JPEG) and Discrete Wavelet Transform (DWT) (e.g. DWT is used in JPEG2000) used widely in image and video compression applications.
Transform-driven approaches [22][23] emphasis on the deployment of specific transform. The routing and processing strategies are developed to allow computation of the transform in the network.

Karhunen-Loeve Transform (KTL):

Karhunen-Loeve Transform (KTL) [24] is main constituent of many signal-processing and communication systems and is commonly used for compression.
Discrete KLT (DKLT) is data compression scheme based on KLT reduces the data to be transmitted over WSNs and can be used in distributed manner [24]. Using DKLT requires the knowledge of global correlation statistics. It is nonunidirectional i.e. data sometimes may travel away from sink is considered to be expensive in terms of communication cost. As for practical WSNs applications a direct implementation of KLT is not suitable. And thus to overcome this problem a unidirectional tree-based KLT (T-KLT) has been proposed [25]. In this method the KLT is applied to the data collected at each node and its descendants. This fades or de-correlates the data. Then the coefficients of the transform are encoded and forwarded to parent nodes. At parent nodes, the inverse KLT is applied to recover the original data. To implement T-KLT each node must know the second-order statistics of its sub tree and thus this incurs learning costs related to these statistics. As the distributed KLT approach works at sink node, it is not power constraint and thus is potential in saving power consumption. It is used application which have real-time operation because it requires low processing time.

Distributed Wavelet Transform-based Lifting (DWT-lifting) [26] [27]:

This schemes assume that the distance between a sensor node and neighbour are shorter than the distance between the sensor node and the sink node. It is important to take this assumption into account because to compute coefficients sensor nodes needs to communicate with each other. But the communication between the sensor node and the sink node is lower.
The work of [26] suggests that three components of power consumption: computational cost (algorithm performance), communication cost between sensor nodes and communication cost between sensor node and sink node. The computation of wavelet coefficient using the lifting scheme takes two multiplications and four additions. This computational cost is very low as compared to other components of power consumption. And thus this scheme achieve a complete power saving. There exist a signal-noise ratio (SNR) and degree of data compression ratio depends on it. Also the degree of power saving depends on it.

Distributed Wavelet Transform-based Harr (DWT-Harr) [28]:

This scheme is same as the DWT-lifting [26] [27].In this, the degree of compression ratio and power saving depends on the SNR exiting in network similar to DWT-lifting. The only difference exist between DWT-lifting and DWT-Harr is that it gives better performance than DWT-lifting.
F. Compressed Sensing
In case of DSC prior knowledge of the precise correlations in data must be known, thus it is a disadvantage associated with DSC. And thus its performance depends on specific assumptions. Whereas in case of compressed sensing based compression techniques in WSNs do not rely on any specific prior knowledge or assumption on signals [29].
Compressed sensing comes into existence because of the three inherent in efficiencies of transform coding: First, compressing high-dimensional signal means processing a large number of samples n. Second, the encoder must compute all transform coefficients. Finally the encoder must encode the indices of large coefficients. All these increases the coding rate and thus the indices change with each signal.


Compression algorithms are developed for providing the improvement in utilization of limited resources of WSNs. As specified in comparative study in discussion, all proposed compression techniques are analyzed based on their characteristics and are found providing better compression ratios and thus energy savings. Aggregation is most easily applicable and deployable technique and there exists many variants for it. But it is found that it is unable to produce original sensor data at sink and thus has limited applications. Predictive coding is very helpful in reducing the data communication but needs to have a knowledge of statistics of data and thus in case of dynamic environment it can be highly expensive and complex. DSC techniques provides high compression ratios but again it requires the correlation knowledge of data, it is quite expensive in that terms. Considering the scenario of dynamic environments transformbased approaches and compressed sensing are effective. The scope of these techniques can be explored to achieve the greater performances mainly focusing on the QoS, scalability, reliability and security issues of WSNs. Thus it can give many opportunities in exploring the new compression techniques taking into account these issues to work effectively towards the energy saving in WSN which in turn prolongs the lifetime of network.

Tables at a glance

Table icon
Table 1

Figures at a glance

Figure 1 Figure 2 Figure 3 Figure 4
Figure 1 Figure 2 Figure 3 Figure 4
Figure 1 Figure 2 Figure 3
Figure 5 Figure 6 Figure 7