AN EFFICIENT FRAMEWORK FOR NETWORK BASED QUERY PROCESSING IN ROAD NETWORKS

Akula S R Lingeswara Rao; K. Rajani Devi

AN EFFICIENT FRAMEWORK FOR NETWORK BASED QUERY PROCESSING IN ROAD NETWORKS

Akula S R Lingeswara Rao¹ and K. Rajani Devi ²

Student, M.Tech (IT), Siddharth Nagar, Nalanda Institute of Engineering and Technology, A.P, India.
Head of the Department, (IT), Siddharth Nagar, Nalanda Institute of Engineering and Technology, A.P, India.

Corresponding Author: Akula S R Lingeswara Rao, E-mail: akula.sairam@yahoo.com

Related article at Pubmed, Scholar Google

Visit for more related articles at Journal of Global Research in Computer Sciences

Abstract

Advances in sensing and tracking technology enable location-based applications but they also create significant privacy risks. Anonymity can provide a high degree of privacy, save service users from dealing with service providers’ privacy policies, and reduce the service providers’ requirements for safeguarding private information. However, guaranteeing anonymous usage of location- based services requires that the precise location information transmitted by a user cannot be easily used to re-identify the subject. This paper presents middleware architecture and algorithms that can be used by a centralized location broker service. The adaptive algorithms adjust the resolution of location information along spatial or temporal dimensions to meet specified anonymity constraints based on the entities that may be using location services within a given area. Using a model based on automotive traffic counts and cartographic material, we estimate the realistically expected spatial resolution for different anonymity constraints. The median resolution generated by our algorithms is 125 meters. Thus, anonymous location-based requests for urban areas would have the same accuracy currently needed for E-911 services; this would provide sufficient resolution for way finding, automated bus routing services and similar location- dependent services.

Keywords

Anonymization, Query Processing.

INTRODUCTION

The low cost and small size of positioning equipment (e.g., GPS receivers) have allowed their embedding into PDAsand mobile phones. The wide availability of these location-aware portable devices has given rise to a flourishingindustry of location-based services (LBS). An LBS makesspatial data available to the users through one or more location servers (LS) that index and answer user queries on them. Examples of spatial queries could be “Where is the closest hospital to my current location?” or “Which pharmacies are open within a 1 km radius?” In order for the LS to be able to answer such questions, it needs to know the position of the querying user. When a user u wishes to pose a query, she sends her location to a trusted server, the anonymizer (AZ)[1], through a secure connection (e.g., SSL). The latter obfuscates her location, replacing it with an anonymizing spatial region (ASR) that encloses u. The ASR is then forwarded to the LS. Ignoring where exactly u is, the LS retrieves (and reports to the AZ) a candidate set (CS) that is guaranteed to contain the query results for any possible user location inside the ASR. The AZ receives the CS and reports to u the subset of candidates that corresponds to her original query. In order for the AZ to produce valid ASRs, the users send location updates whenever they move (through their secure connection). The described model is shown in Figure 1.

RELATEDWORK

Section 2.1 reviews related work on road network databasesand Section 2.2 surveys the literature on spatial anonymity.

Spatial Query processing in road networks:

In general, a road network can be modeled as a weightedgraph G = (N;E). N contains the network nodes, while E is the set of edges. Nodes n in N model road intersections, locations of road turns, or positions where traffic conditions change (e.g., a street gets narrower). On the other hand, every edge e connects two nodes and is associated with a non-negative weight w(e). Weight w(e) may represent, for instance, the traveling time from one node to the other. Figure 2 shows an example of a road network. Edge n1n2 has weight 3, and its endpoints are nodes n1 and n2. Let p be a point on an edge e with weight w(e). The partial weight from p to an end-node of e is proportional to their (Euclidean) distance, while the sum of the two partial weights is equal to w (e). For instance, object o1 (shown as a solid point) lies on edge n3n4 and has partial weights 1 and 3 from nodes n3 and n4, respectively. Similarly, user u (the hollow point) falls on edge n2n3 and both of its partial weights are 2.

The network distance dN (u; o) between a user u and an object o is defined as the sum of edge weights along the shortest path (in the network) from u too. In our example, the network distance dN (u; o1) between user u and object o1 equals to 2+1=3. Its derivation is strongly related to shortest path computation. In case of a small network, main memory shortest path algorithms (e.g., Dijkstra’s algorithm) can be applied to compute dN(u; o). Otherwise, disk-based data structures are utilized. Query Processing by Network Expansion. Users are often interested in location- based queries such as r-range and Knn queries [7], in the context of a road network. Given a distance threshold r and a user location u, the r-range query returns all objects within (network) distance r from u. On the other hand, the kNN query retrieves the k objects that are closest to u. In the rest of the paper, the term distance refers to the network distance, and the r-range and kNN queries refer to their network versions (unless otherwise specified). Developed efficient indexing and processing methods for the above queries. They proposed the following disk-based structures for indexing the road network and the data objects: (i) the adjacency index packs adjacency lists of network nodes into disk blocks, (ii) the edge R-treespatially indexes the network edges, and (iii) the object R- tree (ORT) organizes the locations of the data objects Network expansion is a well-known technique for evaluating r-range and kNN queries. Starting from the user location u, it discovers objects on encountered edges while traversing the network like Dijkstra’s algorithm, until the query results (i.e., data objects of interest) are found. Suppose that, in Figure 2, user u issues a range query with r= 9. First, we access the adjacency index to identify edges within the query range, following the steps in Table 1.

A min-heap H is employed for organizing entries of the form(ni; dN(u; ni))(for encountered nodes ni) in ascending orderof distance dN(u; ni). In our example, the edge n2n3 containing u is initially identified, and its end-nodes n2 and n3 (both with distance 2) are inserted into H. In each iteration, the node ni with the minimum distance is de- heaped from H, its incident edges ninj are recorded, and its adjacent unvisited nodes nj (having dN(u; nj) within the range) are inserted into H.The first three steps of Table 1, edges n2n3, n2n1, and n3n4 fall completely within the query range. However, at step 4 the de-heaped node n1 has distance 5 from u and only the partial edge n1n5 (4) lies within the range r = 9. The process continues until H becomes empty. Having discovered the relevant edges, we probe the ORT to retrieve the result objects, o1 and o2.

NCOG Location – Based Queries:

Recently, considerable research interest has focused onpreventing identity inference in location-based services. Studies in this area typically assume the model described in Section 1, proposing spatial cloaking (i.e., location obfuscation) techniques. In the following, we describe existing techniques for ASR computation (at the AZ) and query processing (at the LS) [8].At the end, we cover alternative location privacy approaches and discuss why they are inappropriate to our problem setting.Spatial cloaking at the AZ. In general, the AZ applies the concept of K-anonymity to hide the querying user location u. The idea is to compute an anonymizing spatial region (ASR),containing u and at least K - 1 other user locations. Thisoffers privacy protection in the sense that the actual user position u cannot be distinguished from others in the ASR, even when malicious LS is equipped/advanced enough to possess all user locations. This spatial K-anonymity model is most widely used in location privacy research/applications, even though alternative models are emerging. Casper [6] is the first work on efficient and scalable AZ implementation for ASR computation. A quad-tree is utilized for indexing user locations and deriving ASRs. Suppose that the AZ needs to compute a 2-anonymous region (i.e., K=2) for querying user u1 in Figure 3(a). The AZ first locates the leaf quad that contains u1 and traverses the tree upwards until it identifies a region covering at least K users (including u1).In this case, the AZ derivesrectangleR1;2;3 (containing three users)as the 2- anonymous region of u1.

OBFUSCATION

Obfuscation is the process of degrading the quality of information about a person’s location with the aim ofprotecting that person’s location privacy. The term“obfuscation” is introduced, but several closely related concepts have been proposed in previous work. The “need- to-know principle” aims to ensure that individuals release only enough information that a service provider needs to know inorder to provide the required service. The idea of a need-to- know principle is closely related both to obfuscation and the fundamental fair information practice principle of consent and use. Sleekness investigates a privacy policy- based approach to enforcing the need-to-know principle in location aware computing by adjusting precision of location informationthe work in develops and tests an algorithmic approach to obfuscating proximity queries based on imprecision.

A simplified version of the algorithm introduced in issummarized in.The algorithm assumes a graph-basedrepresentation of a geographic environment (for example, a road network). An individual protects his or her location privacy by only reporting a set O of locations (an obfuscation set), one of which is that individual’s actual location (figure 3.1a). For an obfuscation set O, the location-based service provider must compute the relation d (figure 3.1b), where od p means o, p < O are most proximal to the same point of interest (POI). The algorithm then proceeds according to three possibilities. First, all the locations in the obfuscation set may be most proximal to a single POI (O < O/d ), in which case that POI can be returned to the Second, the individual may agree to reveal a more precise representation of his or her location, in which case the algorithm can reiterate. Otherwise, the best estimate of the most proximal POI us returned .

The analysis in shows that efficient mechanisms for computing the relation d can ensure that the entire algorithm has the same computational (time) complexity as a conventional algorithm for proximity queries, and that the algorithm must terminate in a finite number of iterations. Obfuscation has severalimportant advantages that complement the other privacy protection strategies. Obfuscation and anonymity are similar, in that both strategies attempt to hide data in order to protect privacy.

The crucial difference betweenobfuscationandanonymityisth atwhileanonymityaimto hide a person’s identity, obfuscation is an explicitly spatial approach to location privacy that aims to allow a person’s identity to be revealed. Potentially, this combats one of the key limitations of anonymityapproaches: the needtoauthenticateusers. At the same time, degrading the quality of locationin format ionmakinferring identity from locationmore difficult.Obfuscation is flexible enough to be tailored to specific user requirements and contexts, unlike regulatory strategies; doesnot requirehighlevels of complexinfrastructure and islessvulnerable to inadvertentdisclosure of personal information,unlikeprivacy policies; andislight weight enoughto be used without the needfortrustedprivacy brokers, unlikemanyanonymity approaches. Obfuscationaimstoachieveabalancebetweenthelevelofprivacyof personal information andthe quality of service of a location-based service. Current research has indicated that thereexistmanysituations whereitis possibletoexpect high qualitylocation-based services basedonlow quality positionalinformation. Consequently, insituations where the userrequiresa higherquality ofservicethancanbe achievedata user’sminimum acceptablelevelofprivacy, then other privacy protection strategiesmustberelied upon instead.Further, obfuscation assumesthattheindividualis abletochoose what information about his or herlocationto revealtoaservice provider. Whilethismayberealistic when usingclient-basedor network-assistedpositioning systems and when sharing location in formationwitha third partylocation –based service provider, dealing with the entities that administer network-based positioning systems stillrequires privacyprotection basedonregulatory or privacy policy[2][3] approaches.

NETWORK BASED ANONYMIZATION

In this section, we present the cloaking algorithm [4] of ourNAP framework. Our primary objective is to guarantee reciprocity based anonymity. In NAP, the AZ Anonymizes u with a set of line segments/edges instead of a spatial region (ASR). The crux of our cloaking method is to utilize a global edge ordering; i.e., an ordered sequence that contains all network edges exactly once. The edge ordering is setting-sensitive, i.e., it specifies which end-node of the edge precedes the other. We refer to the position and setting of an edge in the ordering as the edge order and the edge setting, respectively. To avoid confusion, the setting of an edge depends solely on the ordered sequence, and has nothing to do with the direction (in the case of directed networks) of the road segment it models. Figure 5(a) showsaroadnetwork,and anorderingofitsedges. Thenumber next to each edge indicates its order and the arrow its setting.Theedge orderingdefinesanimplicitlinearorder among the usersthemselves. In particular, a user uprecedes another u0iftheedgeof uhassmallerorderthanthat ofu0.

Iftheyfallonthesameedgeninj(withsettingfromnitonj),upreced es u0ifitisclosertoni.Tiesamongcoinciding users are resolvedarbitrarily. This residencerelationship definesthe order ofeach useru.The position ofa user inthe defined sequenceis referredtoasthe user order. The examplein Figure5 (a) contains 10 users whose subscript indicatestheirorder (i.e.,useru3 hasorder 3, etc.). Reciprocity in NAP is achieved by conceptually partitioningtheuser orderinginto buckets of K userseach, and forwardingtothe LSthe edges correspondingtothe bucketofthe querying user u. Thissetofedgesiscalled the Anonymizing Edge List (AEL) ofu. Specifically, let Ube theset of user sregistered with the AZandassumethata queryinguser requiresanonymity of degree K. Set Uis partitionedinto B =bjUj=Kcbuckets, eachcontaining K users, except the last one which may containup to 2_K1; thei-thbucketbi (fori<B) consist so fusers with order from (i1) _K + 1 toi _ K, andthe B-this assigned the remainingones. Consider the network in Figure5 (a), where jUj=10, and assumethatu6 posesaquery with anonymity requirement K=3. Thisresultsi nto 3buckets, b1= fu1; u2; u3g, b2= fu4; u5; u6g,and b3= fu7; u8; u9; u10g. User u6 belong stob2 and isanonymizedtogether with the other user sinit. The boundary users (i.e.,first and last) of b2 are u4 and u6, whose edges have orders 5and7. The AEL is formedby collecting alledges with orders between 5and 7; i.e., it comprisesedges n6n8, n8n9, and n9n2, shown bold. We follow the above global edge/user ordering approach, because a local one could lead to privacy breachina waysimilarto Figure 3 (a) for Casper. Specifically, uponinterceptiono fan AEL generated by ourmethod, one cannot inferwho amongthe usersinthe correspondingbucket the querying one was. In other words, the AEL for any user in the same bucket is identical, and there for eanadversary cannot pinpo in the queryoriginator with aprobability higher than 1=K (recall that each bucket contains K or more users). Hence, our cloaking method satisfies reciprocity. Reciprocity, in turn, is a sufficient condition for anonymity and,thus, NAP guarantees K-anonymityto the querying users. In the rest of thissection, wedescribe edge or deringstrategies. Then, wepresent particular techniques for the anony mizationprocedure. Finally, we analyze the properties of the proposed edge orderings.

ANONYMIZATION PROCEDURE

Givenanedgeordering,thenext questionis howAEL computation can be implemented efficiently at the AZ. Parameter Kisnot known inadvance andvaries, since different users have different anonymity requirements, and evenqueries by the same user may specifydifferent K, dependingonthenature ofthequeried data. Asbucketsare defined according to K, they cannot be explicitly materialized.Instead,theAZ employsan index that keeps the users sort edontheirorder and allows efficient AEL computation forarbitrary K. Theindexisanaggregate B- tree (similartoanaggregate R-tree), whoseinternal nodes keep for each child the number of users in the correspondingsub-tree. Figure 6 showsthi stree in the exampleof Figure 5 (a). Foreachuser (e.g.,u6) westore the Idoftheedgeitfallson (n9n2), the edge’sorder (7), and itsdistancefrom theedge’s firstend-node (jn9u6j). The latter two valuesare used (primarily theedge orderand secondarilythedistancefrom thefirstend-node) as the sorting keyof the tree.

InFigure 6the numbersintheshaded boxescorrespondto the aggregate informationmaintained, i.e., the cardinalities ofthesub-treesrootedthereof.Notethat we useaB-tree instead ofa B+-tree (i.e., user informationisalsostored in internal nodes),becauseit is fasterfor in-memory indexing.

Anonymous Query Processing:

InthissectionwedescribeAELqueryprocessingattheLS; wepresentalgorithmsforminimalandinclusive CS computation for asinglequery, we proposeadditional optimizations for the case where multiple AEL queriesare processedina batch. Wedemonstratethe generality of NAP with respect to the network storage scheme usedat the LS. Single Query Processing: Processingis basedona directimplementation of the theorem uses (network-based) searchoperationsas off the- shelf buildingblocks. Thus, the NAP queryevaluation methodology is readily deployableonexisting systems, and canbeeasilyadaptedtodifferent network storag eschemes, aswediscuss in Section 5.3. Asacasestudy, inthissection wefocus on the storage scheme and the network expansion framework, in order to provideaconcrete NAPprototype.

Batch Query Processing:

The LS processes queriesindiscrete time stamps, and multiple AEL-based queries may bearriving in the same timestamp. In this case,the queries are evaluatedina batch.

Below we propose strategies aimingat maximizing computation sharing among different queries.

EXPERIMENTAL EVALUATION

We evaluatether obustnessand scalability of our proposed methods onareal road network [5]. Our algorithms were implementedin C++ and experiment swereexecutedona Pentium D2. 8GHzPC. We measured the average of the following performance values overa query work load of 100 queries: (i) anony mizationtime and refinementtime at the anonymizer AZ, (ii) I/O time and CPU time for query processing at the location server LS, and (iii) the communication cost (intermsoftransmittedpoints) for the anonymizingedge list AEL and the candidate set CS.

ScalabilityExperiments:

Inthissection, weinvestigate the scalability of NAP with respect to various factors. To provideanindication of the spacerequirements, wenote that for the largest tested data sizes (i.e., j Uj=200000 and jOj=1024000), the AZ uses only 12.5 M Bytes of main memory (including the network graph) and the LS needsa total of 23.5 M Byte shard disk storage. End-to-endtime. Befor ealower level study, we present anexperiment on the overall responselatency. Specifically, from theuser’s viewpoint, theend-to-end time captures the elapsed time between issuing a query and obtaining the results. It includes the processing timeat AZ, the computationtimeat LS, and the communication time between AZ and LS.

Figure 7 showstheend-to-endtimeasa function of the anonymity degree K, assumingacommunicationb and width of 10 Mbps. Clearly, the processing cost at LS dominates theend-to-end time, while the communication (between AZ and LS) and the AZ computations account only fora small percentage of the totaltime. It is worth mentioning that the processing (including anonymization and refinement)at AZ takes 0.000620 seconds for RE. HN and DF have similar costs.ThisimpliesthattheAZiscapable ofserving 1600 requestsper second.

CONCLUSION

Inthis paper, wepropose the network-based anonymization andprocessing (NAP) framework, the firstsystemfor K - anonymous queryprocessinginroad networks. NA Preliesona global user ordering and bucketization that satisfies reciprocity and guarantees K-anonymity. Weidentify the ordering characteristics that affect subsequent processing, and qualitativelycompar ealternatives. Then, we propose queryevaluation techniques that exploitthese characteristics. Inadditionto userprivacy, NAP achieves lowcomputational and communication costs, and quick responsesoverall. Itisreadily deployable, requiringonly basic network operations. In the traditional spatial anonymitymodel, thedata owner (e.g., alocation-based service) makes its data available usinga location server. It may, however, be the case that the owner is out sourcing its data base to a third-party (and, thus, untrusted) location server. A challengehere is how to encrypt the owner’s data so that they are hidden from the location server, while it can still process anonymous queries. Another interesting questionis how (anonymous) users could verify that the location server did not tamper with the original owner data.

References

http://www.anonymizer.com/.
N. Mishra, R. Motwani, U. Srivastava, D. Thomas, J. Widom, and Y.Xu. Vision Paper: Enabling Privacy for the Paranoids. In VLDB,2004.
R.Agrawal, J. Kiernan, R.Srikant, and Y. Xu. Hippocratic Databases. In VLDB, 2002.
A.R. Beresford. Location privacy in ubiquitous computing. PhDthesis,Computer Laboratory, University of Cambridge, 2005.
T.Brinkhoff.AFramework for Generating Network-based MovingObjects. GeoInformatics, 6(2):153–180, 2002.
R. Butz. Alternative Algorithm for Hilbert’s Space-Filling Curve. IEEE Trans. Comput.,C-20 (4):424–426, 1971.
C.-Y. Chow and M. F. Mokbel. Enabling private continuous queries for revealed user locations. In SSTD, 2007.
C.-Y. Chow, M. F. Mokbel, and X. Liu. A peer-to-peer spatial cloaking algorithm for anonymous location-based service. In GIS,2006.