Towards an Efficient Top-K Trajectory Similarity Query Processing Algorithm forBig Trajectory Data on GPGPUsEleazar Leal, Le GruenwaldSchool of Computer ScienceUniversity of OklahomaNorman, OK 73019, USAEmail: {eleal,ggruenwald}@ou.eduJianting Zhang, Simin YouDept. of Computer ScienceCity College of New YorkNew York City, NY 10016, USAEmail: [email protected], [email protected]—Through the use of location-sensing devices, ithas been possible to collect very large datasets of trajectories. These datasets make it possible to issue spatio-temporalqueries with which users can gather information about thecharacteristics of the movements of objects, derive patternsfrom that information, and understand the objects themselves.Among such spatio-temporal queries that can be issued isthe top-K trajectory similarity query. This query finds manyapplications, such as bird migration analysis in ecology andtrajectory sharing in social networks. However, the large sizeof the trajectory query sets and databases poses significantcomputational challenges. In this work, we propose a parallelGPGPU algorithm Top-KaBT that is specifically designed toreduce the size of the candidate set generated while processingthese queries, and in doing so strives to address these computational challenges. The experiments show that the state ofthe art top-K trajectory similarity query processing algorithmon GPGPUs, TKSimGPU, achieves a 6.44X speedup in queryprocessing time when combined with our algorithm and a 13Xspeedup over a GPGPU algorithm that uses exhaustive search.Keywords-Trajectory; Trajectory similarity; GPGPU; HighperformanceI. I NTRODUCTIONA trajectory is a polygonal line consisting of the pointsthat a moving object occupies in space as time goes by. Oneway of constructing these polygonal lines is by periodicallysampling the positions of the objects being tracked throughthe use of location sensors like GPS. Given the ubiquity oflocation sensors, it has been possible to collect large trajectory datasets like the Microsoft Geolife dataset [1], whichconsist of over 17,621 trajectories and over 23,667,828points. These datasets make it possible to obtain informationabout the past, present and future states of the movementsof the objects being tracked, and to extract patterns fromthese movements. Among the different types of queries thatcan be posed against these trajectory datasets is the topK trajectory similarity query. A top-K trajectory similarityquery receives as inputs a positive integer K 0, and twosets of trajectories, P (the query set) and Q (the database),and returns for every p P a set of K trajectories inQ that are most similar to p. An example of a top-Ktrajectory similarity query is “given the trajectories of mylast world trips, find 10 friends with the most similar traveltrajectories.”Top-K trajectory similarity queries have many applications. For example, in ecology, scientists are interested inunderstanding how diseases are transmitted between birds,and how bird species make use of space [2]. These queriesalso have applications in online social networking sites[1] that allow sharing of travel trajectories. For example,an individual might want to meet other people with themost similar travel trajectories to his own trajectories. Theseapplications involve big trajectory data where data are longtrajectories with many locations, and the number of trajectories is large due to the high number of moving objects. Inthis paper, we refer to these applications as Big TrajectoryData applications.Processing top-K trajectory similarity queries poses significant computational challenges stemming from three factors: the massive size of the datasets that are of interest forthe applications, the internal complexity of a trajectory, andthe computational complexity of the similarity measure. Onestrategy that can be used to tackle these computational challenges consists in exploiting parallel computer architectures,such as GPGPUs.Among the many advantages of GPGPUs are that they arepresent in many kinds of computers, from mobile devices tosupercomputers; on certain algorithms that exhibit lots ofparallelism they can achieve up to an order of magnitude ofhigher floating point instruction throughput than multicoreCPUs [3]; and they are very energy efficient. Anotheradvantage of GPGPUs is that there are works [4] that allowGPGPU processing from within the Spark parallel computing framework [5], so that the high instruction throughputof GPGPUs can be combined with the scalability, ease ofuse and fault-tolerance of the popular Spark framework.All these advantages of GPGPUs make them adequatetools for tackling the computational challenges associatedwith processing top-K trajectory similarity queries. GPGPUshave, however, a relatively small memory space, which canbe a limitation when processing Big Trajectory Data.While there are many serial algorithms that deal with

top-K trajectory similarity queries (e.g. [6], [7] [8], and[9]), very little research has been done in this area forGPGPUs. Recently, an algorithm called TKSimGPU [10]was proposed to solve this problem. While TKSimGPU hasbeen shown to work well with small data sets, it does notscale for Big Trajectory Data applications. This is because itgenerates many spurious candidate trajectory pairs that maynot fit into the GPGPU’s small memory space.A key issue when processing top-K trajectory similarityqueries on Big Trajectory Data is to avoid unnecessary computations of the similarity measure on trajectory pairs (p, q).This is because most similarity measures have quadratic timecomplexity on the number of points of p and q, so it is a veryexpensive operation when the numbers of the trajectories inthe query set (P ) and in the database (Q) are very large, as itis the case in Big Trajectory Data applications. Additionally,top-K trajectory similarity queries have result sets that havea fixed size K P P Q , so perfoming an exhaustivesearch to answer this query requires many unnecessarycalculations of the similarity measure on spurious pairs.Therefore, for scalably processing this type of query, it isdesirable to reduce the size of the candidate sets involved.In this paper we introduce Top-KaBT, a GPGPU techniqueto reduce the number of spurious candidate trajectory pairsgenerated by Top-K trajectory similarity query algorithmsfor Big Trajectory Data applications. This reduction isachieved by calculating the lower and upper bounds of theHausdorff distance between p and q for every candidate(p, q) pair, and then using the knowledge of these boundsto remove the candidates that for sure cannot form part ofthe query result set. Top-KaBT was also designed to exploitGPGPUs by ensuring load balancing across the threads, byensuring memory coalescing, and by using special pruningtechniques that reduce the size of the candidate set, thushelping to improve the time performance of the algorithm.We also present an experimental performance study comparing TKSimGPU when combined with Top-KaBT, to reducecandidate sets, against TKSimGPU and a naı̈ve algorithmon GPGPUs that requires an exhaustive search on the setP Q using a real trajectory dataset [1].The remainder of this paper is organized as follows.Section 2 presents related work. Section 3 formally introduces the concepts of trajectory and Hausdorff distance, andtop-K trajectory similarity queries. Section 4 contains thedescription of the Top-KaBT algorithm and its mathematicalfoundations. Section 5 describes the experiments used tostudy the performance improvements of Top-KaBT alongwith their results. Finally, Section 8 presents conclusionsand future research directions.previous works, our technique, Top-KaBT, reduces the sizeof the candidate sets generated by techniques that computetrajectory similarities using the Hausdorff distance, whichis a commonly used trajectory similarity measure [11].This distance has the advantage of being a metric. Thisproperty is shared with ERP and wDF, and allows the useof the triangular inequality as a means for reducing theamount of work. On the other hand, EDR and EDwP arenot metrics; so they need indexes specifically designed forthem. The Hausdorff distance also has the advantage thatit can be easily extended to mitigate the impact of noisymeasurements [11]. However, the pruning ideas behind TopKaBT are valid not only for the Hausdorff distance, butalso for any trajectory similarity measure that satisfies thetriangular inequality, like ERP and wDF. The work UTGrid[12] answers top-K trajectory similarity queries (KSQ) onuncertain trajectories with serial algorithms. However, allthese techniques are specifically designed for single-coremachines, and not for parallel architectures. In order toobtain a scalable parallel algorithm for big trajectory dataapplications, the algorithm must be designed to exploitthe underlying architecture by ensuring load balancing andadequate memory access patterns. This means that all thealgorithms discussed in this section would need substantialmodifications to run on parallel architectures.There are far fewer parallel techniques for trajectorysimilarity queries. U2STRA [13] uses the Hausdorff distanceas a similarity measure for near-join trajectory similarityqueries on urban trajectories, while the work in [14] uses theEuclidean distance for the same type of query when appliedto the study of moving galaxies. The near-join trajectorysimilarity query is different from the top-K trajectory similarity query because the former takes as input parametersa trajectory query set P , a database Q, and a real number 0, and seeks to obtain for every p P the set of allq Q whose similarity with p is at least [15].Another parallel technique for trajectory similarity queriesis TKSimGPU [10], which answers top-K trajectory similarity queries on GPGPUs using the Hausdorff distance. Adisadvantage of TKSimGPU is that, despite the fact thatit exploits the massive parallelism of GPGPUs and avoidsan exhaustive search, it may still potentially generate a largenumber of spurious candidates (as large as P Q). Such a setof candidates could be too big to fit in the GPGPU’s memory,making this technique not suitable for Big Trajectory Dataapplications. Our proposed technique, Top-KaBT, addressesthis issue with its pruning strategies that specifically reducethe size of the candidate sets.II. R ELATED W ORKIII. BACKGROUNDThe works ERP [6], EDR [7], wDF [8] and EDwP[9] propose new trajectory similarity metrics and serialalgorithms for trajectory similarity queries. Unlike theseIn this section we present the definitions of a trajectory,Hausdorff distance, and top-K trajectory similarity queries.

A. Definition of a TrajectoryGiven a set {(xi , yi , ti ) ti ti 1 , 1 i n} ofpoints in R3 sampled from the movement of an object with alocation sensor, a trajectory over S is a continuous functionτ : [1, n] R3 where τ (i) (xi , yi , ti ) for all integersi [1, . . . , n] and such that τ (x), with x [ti , ti 1 ), is theinterpolated value between τ (i) and τ (i 1) [16].B. Definition of Hausdorff DistanceGiven two finite sets of points P and Q, the Hausdorff distance hausd(P, Q) between P and Q is defined as the maximum between maxp P minq Q d(p, q) andmaxq Q minp P d(p, q) [11].C. Definition of Top-K Trajectory Similarity QueryGiven a positive integer K 0, two finite non-empty setsof trajectories P (the query set) and Q (the database), anda similarity measure σ : S S R, a top-K trajectorysimilarity query returns for every p P a set Rp satisfyingthat Rp K and for every q Rp and qother Q Rpit is the case that σ(qother , p) σ(q, p) [8].IV. T HE T OP -K A BT A LGORITHMA. OverviewTop-KaBT is a parallel GPGPU algorithm for reducingthe number of spurious candidate trajectory pairs (p, q)generated by top-K trajectory similarity query GPGPU algorithms that follow the filter-and-refine schema and use atrajectory similarity that satisfies the properties of a metric.To accomplish this, Top-KaBT calculates lower and upperbounds of the Hausdorff distance between p and q forevery candidate pair (p, q). These calculations are muchcheaper than the calculations of the Hausdorff distances, afact which will be proved in Section IV.B. After this, TopKaBT sorts the pairs according to their lower bounds ofthe Hausdorff distance, and uses these bounds to removespurious candidate pairs. By removing spurious candidatepairs, this technique lessens the negative impact of thesmall size of the GPGPU’s memory, and reduces the timewasted computing the similarity for these spurious pairs.Additionally, the technique addresses load balancing andmemory coalescing, by having threads within a thread blockperform the same amount of work, and by having threadswith consecutive indices access adjacent memory locations.B. Theoretical Foundations of Top-KaBT’s Pruning StrategyIn this section we present the definitions and theoremson which this pruning technique rests. The main result isTheorem 7, which states that if we have a trajectory p withnp candidate pairs Cp {(p, q0 ), (p, q1 ) . . . , (p, qnp 1 )}sorted by the lower bounds to their respective Hausdorffdistances, then if we find an integer 0 vK npmeeting certain conditions explained later, we’ll knowthat the K most similar trajectories to p will be among{(p, q0 ), (p, q1 ), . . . , (p, qvK )}, and we can prune the remaining elements {(p, qvK 1 ), (p, qvK 2 ), . . . , (p, qnp 1 )}.Notation: In the remainder of the paper, we denote byP the set of query trajectories, Q the trajectory database,and use mp,qi to refer to minx M BR(p),y M BR(qi ) d(x, y),where (p, qi ) P Q and d(x, y) is the Euclidean distancebetween points x and y. Similarly, we use the notation Mp,qito refer to maxx M BR(p),y M BR(qi ) d(x, y).Lemma 1. For any (p, q) P Q, it is the case thatmp,q hausd(p, q) Mp,q .Proof: Let a and b be points such that a M BR(P ), b M BR(Q), and hausd(p, q) d(a, b). By definition of mp,q we have that mp,q minx M BR(p),y M BR(q) d(x, y) d(a, b) hausd(p, q).The proof of hausd(p, q) Mp,q is analogous.Definition 2 (Cut point set). Given the candidate set Cp {(p, q0 ), (p, q1 ), · · · , (p, qnp 1 )}, with mp,qi mp,q(i 1) for0 i np 1, the cut-point set of Cp is defined as CPp {i Z Mp,qi mp,q(i 1) }. The elements of the cut-pointset are called cut-points.Example 3. If Cp {(p, q0 ), (p, q1 ), (p, q2 ), (p, q3 )} suchthat mp,q0 2.2, mp,q1 2.3, mp,q2 3.3, mp,q3 4.1,and Mp,q0 2.4, Mp,q1 2.7, Mp,q2 4.0, Mp,q3 4.2,then CPp {1, 2} because Mp,q1 2.7 3.3 mp,q2 ,and Mp,q2 4.0 4.1 mp,q3 .Definition 4. Given the candidate set Cp {(p, q0 ), (p, q1 ), · · · , (p, qnp 1 )}, with mp,qi mp,qi 1 for0 i np 1, with cut-point set CPp 6 , the min-cutpoint of Cp is defined to be min CPp .Definition 5 (min-K-Cut point). Given the candidate setCp {(p, q0 ), (p, q1 ), · · · , (p, qnp 1 )}, with mp,qi mp,qi 1 for 0 i np 1, with 1-cut-point set CPp 6 ,the min-K-cut point of Cp is defined to be the K-th smallestelement in CPp .Theorem 6. If v is a cut point of the candidate set Cp {(p, q0 ), (p, q1 ), · · · , (p, qnp 1 )}, with mp,qi mp,qi 1 for0 i np 1, then the 1-nearest neighbor to trajectory pis a qi with 0 i v.Proof: Assume that v is a cut point of Cp . Then,Mp,qv mp,qv 1 is true, and since mp,qv 1 is a lowerbound of hausd(p, qv 1 ), and Mp,qv 1 is an upper bound ofhausd(p, qv 1 ), then we have the inequality hausd(p, qv ) Mp,qv hausd(p, qv 1 ). By induction, we can easilyprove that hausd(p, qv ) hausd(p, qj ) for v j np .Therefore, the 1-nearest neighbor to p must be a qi with0 i v, which is what we wanted to prove.Theorem 7. If vK is a min-K-cut point of the candidateset Cp {(p, q0 ), (p, q2 ), · · · , (p, qnp 1 )}, with mp,qi mp,qi 1 for 0 i np 1, then the top-K nearest neighbors

of trajectory p lie among the qi with 0 i vK .Proof: We proceed by induction on K. The base casewith K 1 has already been proved in the previoustheorem. Assume k 1 and that the theorem holds forK k. Let’s verify that the theorem holds for K k 1.Let vk and vk 1 be the min-k-cut and the min-(k 1)cut points of Cp , respectively. By inductive hypothesis, weknow that the k nearest neighbors of p are contained inthe set {qi 1 i vk }. We also know that, by thedefinition of min-K cut point, vk vk 1 , and also thathausd(p, qk 1 ) hausd(p, qj ) for k 1 j np . Thisimplies that the k 1 nearest neighbors of p are in the set{qi 0 i vk 1 }, which is what we wanted to prove.Example 8. Continuing with Example 3 and using Theorem7, we know that the top-2 nearest neighbors of trajectory pare contained in the set Cp {(p, q0 ), (p, q1 ), (p, q2 )}. Thetheorem allows us to discard the candidate pair (p, q3 ).Observation 9. The minimum Euclidean distance betweentwo MBRs R with lower-left corner (rx , ry ) and upperleft corner (rx0 , ry0 ), and S with lower-left corner (sx , sy )and upper left corner (s0x , s0y ), can be computed in constant time complexityq using the mindist formula of [17]:mindist(R, S) d2x d2y , where di ri pi if pi ri ,di pi ri0 if ri0 pi , and di 0 otherwise, for i {x, y}.Similarly, the maximum Euclidean distanceqbetween R andS can be found using maxdist(R, S) c2x c2y , whereci ri0 pi if pi (ri ri0 )/2, and ci pi ri0 otherwise.Observation 10. Given a candidate set Cp {(p, q0 ), (p, q2 ), · · · , (p, qnp 1 )}, Observation 9 canbe used to efficiently compute mp,qi and Mp,qi becausethese two represent the minimum and maximum Euclideandistances between the MBRs of trajectories p and qi , forany (p, qi ) Cp .C. Description of Top-KaBT’s Pruning StrategyIn this subsection we describe our proposed parallelGPGPU technique to prune the candidate set of the top-Ktrajectory similarity query processing algorithm. The pseudocode algorithm for the pruning technique is in Algorithm1, while Fig. 1, 2 and 3 provide an illustrated example.The function S ORT P RUNING in Line 1 of Fig. 1 is incharge of further pruning the set of (p, q) candidate pairs,by removing pairs that cannot form part of the result set,as assured by Theorem 7. This function takes an integer Kand a list of (p, q) pairs candidates as input and returns asoutput a sub-list of candidates. In Line 3 we consider Qpthe set of all q trajectories that up to this point have beenidentified as possible candidates for being the most similarQ-trajectories to p. Then Line 4 calculates the lower andupper bounds (lowp and upp , respectively) of the trajectorysimilarity between p and q, using Observation 10. This isAlgorithm 1 Prunes top-K trajectory similarity candidate setInput: An integer K 0. A set candidates P Q.Output: A set Result candidates P Q that containsthe result of a top-K trajectory similarity query.1: function SORT PRUNING (K, candidates)2:for p ΠP (candidates) do in parallel3:Qp {q Q (p, q) candidates}4:(lowp , upp ) hausdorf f bounds(p, Qp )dp, ub p , low5:Qcpp Sort Qp , lowp , upp using lowpas keydp )dp lef t shif t array(low6:lowdp, u7:cut ptp f ind cut point(p, K, lowcpp )cp }\ p {(p, q) q Q8:candidates9:end for\ p Π (candidates) candidates\ p10:candidatesP11:cut pts p ΠP (candidates) cut ptp\cut pts)12:Result remove(candidates,13:return Result14: end function15: function HAUSDORFF BOUNDS (p, Qp )16:QM BRp {M BR(q) q Qp }17:for i {0, 1 . . . , Qp 1} do in parallel18:lowp [i] min d(M BR(p), QM BRp [i])19:upp [i] max d(M BR(p), QM BRp [i])20:end for21:return (lowp , upp )22: end function23: function FIND CUT POINT (p, K, lowp , upp )24:for i {0, 1, . . . , lowp 1}) do in parallel25:if upp [i] lowp [i] then26:cut ptp [i] 127:else28:cut ptp [i] 029:end if30:end for31:P f x cut ptp InclP f xSum(cut ptp )32:cut ptp min{i P f x cut ptp [i] K}33:return cut ptp34: end function35: return sort pruning(K, candidates)illustrated in Fig. 1 Step 1, where we can see that differentthread blocks are assigned different p query trajectories, andevery thread in a thread block is in charge of a different (p, q)trajectory pair. The first thread block is in charge of findingthe lower and upper bounds of the Hausdorff distance foreach of the pairs (p3 , q3 ), (p3 , q1 ) and (p3 , q7 ). Line 5 sortsthe arrays Qp , lowp , and upp , using the entries in lowp askeys; in this way we ensure that the premise of Theorem 7is satisfied. An example of this is shown in Fig. 1 Step 2,where we see that the pairs corresponding to the first threadblock have been sorted according to their lower bounds so

*#I)#.*#O4,4;45-#'4:6E64,-#F"J## I*#N2-7D#,74C-',&7D#F!J#.?#I #.?# .?# .?#IA# I))# I)B#!"# %&'(# #!"# %&'(#)#! 76"# #! 76"#*#! 76"#)#.?#[email protected]#! 76"# #!"# %&'%(#)%* ,)%" #%(-. #I*#!"# %&'(#*#! 76"#P#! 76"#@#! 76"#G#! 76"#?#! 76"#A# ! 76"#)B#0?#0*# J% E% &% &% E% E% E% J% J%N/"/O/*#%4/6:,:/"#% "B%% K&%KD%KI%KA%KH%KG% KIF% KII% KE%K&% E%(-. %%ACD% &CD% HCI% ECA% DCE% DCD% DCG% FCE% ECF% FCF% %AC8% ACG% &CJ% ECF% DCH% DCD% DCG% JCF% ICI% '4:6E64,-#F"J##I*#I)#IP#IA#I #[email protected]# I))# I)B#I?#I*#Q&9.##)"*# *"P# "P# P"?# @")# ?" # P"A# P"P# B"?# ?"B#R.# " # *"G# "A# P"P# ?"B# P"@# G"B# P"A# )")# ?"G# ,-.# /#KE, E:#-4' #0EH#5&7,#, -#,2.%-5#E:#E:'7-45E:L#&76-7#&M#0G#0*#0?#, -#%&9-7#;&2:65#.?#T 0:C%A%T 0:C%&%T 0:C%I%.*#.*#.*#.?#.?#.?#.?#.G#.G#I*#IP#I)#I #[email protected]#IA# I)B# I))#I?#I*#T 0:C%8%T 0:C%D%T 0:C%H%T 0:C%J%T 0:C%E%T 0:C%G% T 0:C%IF%!"# %8'%9,6:%" #%4;"% -,6"*%,6%#/4 % ,% *,67% #0#%[email protected]% &% E% J% E% &% &% &% E% E% E% J% J%N/"/O/*#%4/6:,:/"#% "B%% K&%KD%KI%KA%KH%KG% KIF% KII% KE%K&%L;#01%"0/M#4"-01% !B%! -#,74C-',&7D#.4E7#F.GHI*J# ,&7D#F!J# E% &% &%L;#01%"0/M#4"-01% !B% E%(-. %%ACD% &CD% HCI% ECA% DCE% DCD% DCG% FCE% ECF% FCF% %AC8% ACG% &CJ% ECF% DCH% DCD% DCG% JCF% ICI% ECJ%4;"P " %I%I%I%I%I%I%I%F%I%Q&9.##)"*# "P# *"P# @")#!?" #!P"?#!P"P# P"A# B"?#!?"B#Q2RP4;"P " % Q0#SR%*;3B%I%A%&%I%A%&%8%8%I%R.# " # "A# *"G# ?"B# P"@# P"P# P"A# G"B# )")# ?"G#4;"P "*%O4,4;45-#'4:6E64,-#F"J##(a) Steps 1 & 2Figure 1.I%I%F%A%I%(b) Steps 3 & 4Example run of the S ORT P RUNING step of Top-KaBT (Steps 1 to 4)Algorithm 1 Prunes top-K trajectory similarity candidate set36: function REMOVE (candidates, cut pts)37:n ΠP (candidates) 38:for i {0, 1, . . . , n 1} do in parallel39:of f set[i] {(pj , q) candidates pj pi } 40:B[2i] cut pts[i]41:B[2i 1] ΠP (candidates)[i] 42:Alter 1s0s[2i] 143:Alter 1s0s[2i 1] 044:end for45:Counts adjacentDif f erence(B)46:Stencil RLD(Counts, Alter 1s0s)47:P f xStencil ExclP f xSum(Stencil)48:for i {0, . . . , n 1} do in parallel49:if Stencil[i] 1 then50:P runed[P f xStencil[i]] candidates[i]51:end if52:end for53:return pruned54: end functionthat (p3 , q3 ) has smaller lower bound (whose value is 1.3)than (p3 , q7 ), which has 2.7 as a lower bound, and (p3 , q7 )in turn has a smaller lower bound than (p3 , q7 ), which hasa lower bound of 3.7. In Line 6 of Fig. 1, lowp is shifted1 entry to the left for memory coalescing in Line 23. Thereason for this is that, according to Theorem 7, we test ifMp,qi mp,qi 1 for every p and qi , so the value lowp [0]corresponding to mp,q0 is never used. Fig. 1 Step 3 showsthe left-shifting of Lowp . Notice how the first value (1.3) ofthe lower bounds array disappeared, and we added a 0.0 tothe right of the same array. Because of Theorem 7, this lastvalue we added to the right is never used. Line 7 finds thecut point associated with every p query trajectory using thelower and upper bounds of the trajectory similarity measure.This corresponds to Step 4 in Fig. 1 and Step 5 in Fig. 2.The function H AUSDORFF B OUNDS in Line 15, shown inStep 1 of Fig. 1, receives a trajectory p, and a list Qp withthe associated Q trajectory candidates, and finds lowp andupp that satisfy: lowp Hausd(p, q) upp . In Lines 17 to20, lowp and upp are computed in parallel for every q Qpusing Observation 10. This function does memory coalescingwhen writing the bounds of the MBRs back to the globalmemory because threads with consecutive identifiers writethe MBR bounds of trajectories with consecutive indexes.This function also achieves load balancing within threadblocks because the complexity of computing the MBRsdoes not depend on the trajectories themselves; therefore,all threads perform the same amount of work.Function F IND C UT P OINT in Line 23 receives as inputparameters a p P , an integer K 0, and the two arrayslowp and upp of the lower and upper bounds, respectively,and is in charge of finding the smallest K-cut point usingTheorem 7. After the parallel loop in Lines 24 through 30,an array cut ptp is obtained, which is shown in Fig. 1 Step4. There we see that the cut ptp boolean array has the value1 at position i if the corresponding pair has an index thatis a cut point, and 0 otherwise. For example, in the pairsassociated with the second thread block, the cut ptp entryassociated with the pair (p6 , q11 ) is 0 because 0.6 8.0.To find the smallest K-cut point for p, a parallel exclusiveprefix sum [18] over cut pt (which is the portion of the

!"# %&'%()* %,--,.%)* )/,0*1%23)/3%#4#5#*"6%"7% -#6#-8#%IA%9BC /:" %D9-#EC%6:5F%;%?%/:" "6%IJ%A%?%A%;%IK%G%G%;%?%?%?%?%L7 6#"D)FM%)6%"3#%"7",4%*:5N#-%7B%#4#5#*"6%)*%,44%I7 6#"%OP6%"3,"%/75#%N#B7-#%I)% K%A%@%/:" "6 %7 6#"%W%%A%&%[email protected]%@%;%Z%Z%I7:*"6%;%?%;%A%;%Z%Y4"#- [email protected]%?%@%?%@%?%@%!:N"-,/"%, O,/#*"%#4#5#*"6%)*% ,-,44#4%H%%[email protected]%%G%%&%?%%;%%/:" "6% %7 6#"% %?%K%Z%W%)6%:6# %"7%E* %372%5,*.%#4#5#*"6%"7%X## %)*%#,/3%I)%%9,-,44#4%-:*%4#*1"3% #/7 !F% A% A% A% J% J% J% J% J% K% K%V,",N,6#%/,* ) ,"#%D"F%% SA%ST%S?%S;%S&%SH% [email protected]% S?% SJ%SA%QB%,% ,)-%D # %Figure 2.Example run of S ORT P RUNING (Step 5) %& "-A"B&C9D&"4''&(& 756" 7)5 "E,"E-"[email protected]" %&'()*"!"!"#"!"!"#"#"#"!"!"34&56"%578&(%956":!;" ," ," ," -" -" -" -" -" @" @" 7%7 7 &"(7'?)?7%&":";"" .,"./".!".0".1".2" .!#" .!!" "-"GK5?H"2" GK5?H"!#"F)'7*"(7'?)?7%&" &%A" ," ," -" -" @" @" 7%7 7 &"(7'?)?7%&":";"" .,"./".0".1".-".,"34&56"%578&(%956":!;"Figure 3.Example run of S ORT P RUNING (Step 6)cut ptp array corresponding to p) is performed to obtain thearray P f x cut pt of Line 31; this is shown in Fig. 2 wherethe second thread block obtained the array [1, 2, 3, 4, 4].After this, every thread block finds the smallest index isuch that P f x cut ptp [i] K. In the case of the secondthread block, the first i that satisfies this condition is i 1because there is a 2 in the P f x cut pt portion of the secondthread block at position 1. This function performs memorycoalescing because threads with consecutive indexes accessadjacent memory locations in the cut ptp array. Also, allthreads perform the same amount of work.Function R EMOVE in Line 36 receives as input parametersthe array candidates with the candidate trajectory pairs(p, q), and an array cut pts of length Πp (candidates) (where Πp (candidates) is the projection on the left component (P ) of the tuples in candidates), one for every p candidate. This last array satisfies that cut pts[i] is the cut pointassociated with the i-th p trajectory in Πp (candidates).Lines 38 through 44 create an array B that contains theelements of cut pts of f set 1 in its even-indexed entries,and the elements {(pj , q) candidates pj pi } in itsodd-indexed entries i. In Fig. 2 we see in Step 5 that theelements of cut pts of f set 1 are B[1] 2, B[3] 5and B[5] 10. The idea behind creating B is to counthow many pairs (p, q) need to be preserved for every p.These same Lines 38 to 44, create another array Alter 1s0swith 1s in its even entries and 0s in its odd entries. Thisarray is used for run-length decoding [19]. Then Line 45calculates the adjacent difference array Counts satisfyingthat Counts[i] B[2i 1] B[2i]. Counts[2i 1] isthe number of Q candidates associated with pi that can bepruned away, while Counts[2i] indicates the number of Qcandidates associated with pi that cannot be pruned away.In Fig. 2 we see that in Step 5 Counts[0] B[1] B[0] 2 0 2, and Counts[1] B[2] B[1] 3 2 1.This means that 2 pairs associated with p3 (which is the0-th p candidate) cannot be pruned away, but 1 pair can bepruned away. Line 46 performs a run-length decoding overCounts (containing the counts of how many times eachelement occurs in the result of the run-length decoding)and Alter 1s0s (containing the elements that will be inthe result of the run-length decoding); this is to obtain thearray Stencil, which has a 1 at position i if and only ifcandidates[i] cannot be pruned, and a 0 at position i ifcandidates[i] can be safely pruned according to Theorem4.6. Lines 48 to 52 prune the spurious candidates by writinginto P runed only those elements of candidates located atpositions i such that Stencil[i] 1. In Fig. 3 we see thatthe candidates (p3 , q1 ), (p6 , q9 ), (p6 , q10 ) and (p6 , q11 ) hadassociated Stencil values of 0; therefore, they were pruned.V. E XPERIMENTS AND R ESULTSIn this section we describe the dataset, the hardware andsoftware environment, and the experiments used to compareTKSimGPU, when combined with Top-KaBT to reducecandidate sets against TKSimGPU itself and against a naı̈veexhaustive GPGPU search algorithm. The naı̈ve exhaustivesearch algorithm finds the Hausdorff distances between allpairs of (p, q) P Q, and then sorts those distances toselect the top K most similar trajectories in Q for everyp P.A. Datasets and experiment setupFor o

most similar travel trajectories to his own trajectories. These . similarity query is different from the top-K trajectory sim-ilarity query because the former takes as input parameters a trajectory query set P, a database Q, and a real number 0, and seeks to obtain for every p2P the