How to Catch when Proxies LieVerifying the Physical Locations of Network Proxies with Active GeolocationZachary WeinbergShinyoung ChoNicolas ChristinCarnegie Mellon [email protected] Brook [email protected] Mellon [email protected] SekarPhillipa GillCarnegie Mellon [email protected] of t users worldwide rely on commercial network proxies bothto conceal their true location and identity, and to control their apparent location. Their reasons range from mundane to security-critical.Proxy operators offer no proof that their advertised server locations are accurate. IP-to-location databases tend to agree with theadvertised locations, but there have been many reports of seriouserrors in such databases.In this study we estimate the locations of 2269 proxy servers fromping-time measurements to hosts in known locations, combinedwith AS and network information. These servers are operated byseven proxy services, and, according to the operators, spread over222 countries and territories. Our measurements show that onethird of them are definitely not located in the advertised countries,and another third might not be. Instead, they are concentrated incountries where server hosting is cheap and reliable (e.g. CzechRepublic, Germany, Netherlands, UK, USA).In the process, we address a number of technical challenges withapplying active geolocation to proxy servers, which may not bedirectly pingable, and may restrict the types of packets that can besent through them, e.g. forbidding traceroute. We also test threegeolocation algorithms from previous literature, plus two variationsof our own design, at the scale of the whole world.Commercial VPN services compete to offer the highest speed, thestrongest privacy assurances, and the broadest possible set of serverlocations. One of the services in this study advertises servers inall but seven of the world’s sovereign states, including implausiblelocations such as North Korea, Vatican City, and Pitcairn Island.They offer no proof of their claims. IP-to-location databases oftenagree with their claims, but these databases have been shown tobe full of errors [18, 38, 42]. Worse, they rely on information thatVPN providers may be able to manipulate, such as location codesin the names of routers [7].VPN services that consolidate their servers in a smaller numberof locations than they advertise can choose those locations forbetter performance, reliability, and reduced operational expenses.This gives them a competitive advantage over services that strivefor true location diversity. If they can manipulate IP-to-locationdatabases, they can still provide the appearance of location diversity.Many of a VPN service’s customers may well be satisfied withappearances. For instance, the IP-to-location database entry is moreimportant than the physical location for customers using VPNsto defeat geographic restrictions on online media streaming [2].However, for others the physical location can be essential. Westarted the investigation leading to this paper when we attemptedto use commercial VPN services for censorship monitoring, butcould not reproduce the observations reported by volunteers withina country known for censoring the Internet.In this paper, we apply active geolocation to check the advertisedlocations of VPN servers. Active geolocation estimates the locationof an Internet host by measuring packet round-trip times betweenit and other hosts in known locations. It has been demonstrated towork at the scale of a large country or small continent (e.g. China,Europe, and the USA), with varying levels of accuracy, dependingon how efficient the regional network is [8, 11, 16, 32]. However,it has not been thoroughly tested at the scale of the entire world,and, to our knowledge, it has only once before been applied tocommercial proxy servers [39].Using active geolocation, we can usually locate a VPN serverto within 1000 km2 , anywhere in the world. Our results are moreprecise in more densely connected regions and/or when landmarksare nearby, but even when we are uncertain about where a serveractually is, we can still disprove blatant inaccuracies in marketingclaims. For instance, if we know that a server is in Belgium, Netherlands, or Germany, but not which, that still proves it is not in NorthKorea. We tested 2269 servers operated by seven VPN services,CCS CONCEPTS Networks Network measurement; Network structure;KEYWORDSactive geolocation, virtual private networks, network proxiesACM Reference Format:Zachary Weinberg, Shinyoung Cho, Nicolas Christin, Vyas Sekar, and Phillipa Gill. 2018. How to Catch when Proxies Lie: Verifying the Physical Locations of Network Proxies with Active Geolocation. In 2018 Internet Measurement Conference (IMC ’18), October 31–November 2, 2018, Boston, MA, USA.ACM, New York, NY, USA, 15 pages. to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected] ’18, October 31–November 2, 2018, Boston, MA, USA 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.ACM ISBN 978-1-4503-5619-0/18/10. . . UCTION

IMC ’18, October 31–November 2, 2018, Boston, MA, USAFigure 1: The principle of multilateration. If something iswithin 500 km of Bourges, 500 km of Cromer, and 800 km ofRanders, then it is in Belgium (roughly).including five of the top 20 by number of claimed countries. At leasta third of all the servers we tested are not in their advertised country.2BACKGROUNDExisting methods for finding the physical location Internet hostscan be divided into two general classes. Passive methods collectlocation information from regional Internet registries, location information encoded in router hostnames, and private consultationwith individual ISPs [7], and produce a database mapping IP addresses to locations. These databases are widely used, but notoriousfor their errors [18, 38, 42], some of which are significant enoughthat they make the news [21].Active methods, on the other hand, rely on measurements ofpacket round-trip time between a target host, which is to be located,and a number of landmark hosts, which are in known locations.The simplest active method is to guess that the target is in thesame place as the landmark with the shortest round-trip time [8,35, 46]. This breaks down when the target is not near any of thelandmarks. The next step up in complexity is to estimate, for eachlandmark, the maximum distance that a packet could have traveledin the time measured, and draw disks on a map, bounded by thesedistances. The target must be in the region where the disks allintersect. This process is called multilateration. Figure 1 shows anexample: measurements taken from Bourges in France, Cromer inthe UK, and Randers in Denmark produce an intersection regionroughly covering Belgium.The central problem for network multilateration is that networkpackets do not travel in straight lines. Cables are laid on practicalpaths, not great circles. Network routes are optimized for bandwidthrather than latency, leading to “circuitous” detours that can addthousands of kilometers to the distance traveled [29, 31, 34]. Intermediate routers can add unbounded delays [32]. Distance and delaydo still correlate, but not in a straightforward way. Much researchon active methods focuses on increasingly sophisticated models ofthe delay-distance relationship [4, 12, 14, 20, 30, 31, 34, 35, 45]. Onecommon refinement is to assume a minimum travel distance forany given delay, as well as a maximum.Challenges of global geolocation. When both landmarks andtargets are in the same subcontinental region, sophisticated modelsimprove accuracy—if that region is Europe or the USA. On the otherZachary Weinberg et al.hand, for China, several papers report that simple models are moreaccurate [8, 11, 32]. They propose that simple models are morerobust in the face of severe congestion. Li et al. [32] specificallypoints out that a minimum travel distance assumption is invalid inthe face of large queueing delays at intermediate routers. A secondpossibility is that sophisticated models are more reliable when thereare more possible paths between landmarks and targets, as is thecase in Europe and North America, but not China [16]. A third is thatmodels tested on PlanetLab nodes [37] gain an unfair advantage dueto the generally better connectivity enjoyed by academic networks.In Section 5, we test four algorithms, covering a range of modelcomplexity, on hosts crowdsourced from all over the world. We alsofind that simple models are more effective, overall, and our data ismore consistent with the congestion explanation.Increasing the number of landmarks improves accuracy but alsoslows down the measurement process, since all of the landmarksmust send packets to the target and wait for replies (or vice versa).If they all do this simultaneously, they may create enough extranetwork congestion to invalidate the measurement [22]. Severalresearchers have observed that landmarks far away from the target are less useful, and proposed a two-stage process, in which asmall number of widely dispersed landmarks identify the subcontinental region where the target lies, and then a larger group oflandmarks within that region pin down the target’s position moreaccurately [11, 23, 26, 46].Challenges of geolocating proxies. Less than ten percent ofthe proxies we are interested in testing will respond to pings, andwe do not have the ability to run measurement programs on theproxies themselves. We can only send packets through the proxies,which means the apparent round-trip time to each landmark isthe sum of the round-trip time from the proxy to the landmark,and the round-trip time from our measurement client to the proxy.This is similar to the problem faced by Castelluccia et al. [5] whenattempting to geolocate botnet command-and-control servers, andwe adopt the same solution, as discussed further in Section 5.3.3ALGORITHM SELECTIONSince the proxies we are investigating could be spread all over theworld, we must find an active geolocation algorithm that will workat the scale of the whole world. We reimplemented four activegeolocation algorithms from earlier papers: CBG [20], Octant [45],Spotter [30], and an Octant/Spotter hybrid of our own invention.We did not have access to the original implementations, and we hadto fill in gaps in all their published descriptions. All the softwarewe developed for this project is open-source and available online.1Eriksson et al. [15] recommend considering external facts aboutwhere a server could plausibly be, such as “on land, and not inAntarctica.” We take this advice and exclude all terrain north of85 N and south of 60 S from the final prediction region for eachtarget and algorithm. Using the 2012 Natural Earth [36] map ofthe world, we also exclude oceans and lakes. We do not, however,exclude any islands, no matter how small or remote, because someof the proxy providers do claim to have servers on remote islands(e.g. Pitcairn).1

One-way travel time (ms)How to Catch when Proxies Lie350IMC ’18, October 31–November 2, 2018, Boston, MA, selin005 00010 00015 00020 00005 00010 00015 00020 00005 00010 00015 00020 000Distance from landmark (km)Figure 2: Example calibration scatterplots for CBG, (Quasi-)Octant, and Spotter.3.1Constraint-Based Geolocation3.3SpotterConstraint-Based Geolocation (CBG) is one of the oldest and simplest multilateration algorithms. It uses a linear model for the delaydistance relationship, limited by a “baseline” speed of 200 km/ms,or 23 c, which is approximately how fast signals propagate in fiberoptic cable. For each landmark, CBG computes a “bestline” fromthe calibration data, which is as close as possible to all of the datapoints on a scatterplot of delay as a function of distance, whileremaining below all of them, and above the baseline. This will bea speed slower than 200 km/ms, and will therefore give a smallerestimate of how far a packet could have gone in a given time. Eachlandmark’s bestline gives the maximum distance for a round-tripmeasurement to that landmark.The left panel of Figure 2 shows an example calibration for CBG.The blue dots are round-trip time measurements taken by oneRIPE anchor. The bestline (solid) is above the baseline (dotted); itpasses through exactly two data points, with all the others above. Itcorresponds to a speed of 93.5 km/ms—less than half the theoreticalmaximum. The “slowline” will be explained in Section 5.1.Spotter [30] uses an even more elaborate delay-distance model. Itcomputes the mean and standard deviation of landmark-landmarkdistance as a function of delay, and fits “a polynomial” to both.Unlike CBG and Octant, a single fit is used for all landmarks. Thepaper does not specify the degree of the polynomial, or the curvefitting procedure; we use cubic polynomials, fit by least squares,and constrain each curve to be increasing everywhere (anythingmore flexible led to severe overfitting in pilot tests).Spotter also uses a probabilistic multilateration method. It estimates the distance from each landmark to the target as a Gaussiandistribution, with mean µ and standard deviation σ given by thefitted curves. This produces a ring-shaped probability distributionover the surface of the Earth; the rings for each landmark are combined using Bayes’ Rule to form the final prediction region.The right panel of Figure 2 shows an example Spotter calibration.The solid line is the best cubic fit for the mean µ of the distancedelay relationship; dashed, dash-dot, and dotted lines are drawn atµ σ , µ 3σ , and µ 5σ respectively.3.2To separate the effect of Spotter’s probabilistic multilateration fromthe effect of its cubic-polynomial delay model, we also implementeda hybrid that uses Spotter’s delay model, but Quasi-Octant’s ringbased multilateration. The minimum and maximum radii of thering are set to µ 5σ and µ 5σ , respectively.3.4Quasi-OctantOctant elaborates on CBG in two ways. First, it estimates boththe maximum and the minimum distance to each landmark, anddraws rings on the map, not disks. Second, Octant uses piecewiselinear curves for both distance models. These are defined by theconvex hull of the scatterplot of delay as a function of distance, upto 50% and 75% of all round-trip times, respectively. Observationsbeyond those cutoffs are considered unreliable, so Octant uses fixedempirical speed estimates for longer round-trip times. The middlepanel of Figure 2 shows an example Octant calibration, with thesame data as the CBG calibration to its left. The convex hull isdrawn with solid lines and the fixed empirical speeds with dashedlines.Octant includes features that depend on route traces, such asa “height” factor to eliminate the effect of a slow first hop fromany given landmark. Since we cannot collect route traces (see Section 4.2), these have been omitted from our re-implementation, andwe call it “Quasi-Octant” to denote that change.4Quasi-Octant/Spotter HybridMEASUREMENT METHODFor all our experiments, we used the “anchor” hosts of RIPE Atlas [40] as landmarks. RIPE Atlas is a worldwide constellation ofhosts dedicated to Internet measurement, composed of “probes”and “anchors;” there are fewer anchors, but they are more convenient for use as landmarks. They are reliably available 24/7, theirdocumented locations are accurate, and they all continuously pingeach other and upload the round-trip times (RTT) to a publiclyaccessible database. At the time we began our experiments (July2016), there were 207 usable anchors; during the course of the experiment, 12 were decommissioned and another 61 were added.Figure 3 (left side) shows all the anchors’ locations. The majority

IMC ’18, October 31–November 2, 2018, Boston, MA, USAAnchorsZachary Weinberg et al.ProbesFigure 3: Locations of the RIPE Atlas anchors (left) and probes with stable IPv4 addresses, as of April 2018.are in Europe; North America is also well-represented. While thereare fewer anchors in Asia and South America, and only a few inAfrica, their geographic distribution is adequate—the most difficultcase for active geolocation is when all of the landmarks are faraway from the target, in the same direction [16, 34].4.1Two-phase measurementIt takes several minutes to ping all 250 of the anchors. Landmarksfar from the target do not contribute much useful information, aswe will discuss further in Section 5.2. We speed up the processwith a two-phase measurement, as proposed by Khan et al. [26]and others [11, 23, 46]. We first measure RTTs to three anchors percontinent, and use these measurements to deduce which continentthe target is on. We then randomly select and measure RTTs to 25more landmarks on that continent, from a list including all of theanchors, plus all the probes that have been online for the past 30days with a stable IPv4 address. These probes are shown in Figure 3(right side).Random selection of landmarks in the second phase spreads outthe load of our measurements, reducing their impact on concurrentexperiments [22]. Using stable probes as well as anchors spreadsthe load even in parts of the world where there are few anchors.We maintain a server that retrieves the list of anchors and probesfrom RIPE’s database every day, selects the probes to be used aslandmarks, and updates a delay-distance model for each landmark,based on the most recent two weeks of ping measurements availablefrom RIPE’s database. Our measurement tools retrieve the set oflandmarks to use for each phase from this server, and report theirmeasurements back to it. Some of the landmarks have both IPv4 andIPv6 addresses, but the commercial proxy servers we are studyingoffer only IPv4 connectivity, so the server resolves the landmarks’hostnames itself and sends only IPv4 addresses to the tools.4.2Measurement toolsCommercial proxy providers aggressively filter traffic through theirproxies. Of the VPN servers we tested, roughly 90% ignore ICMPping requests. Similarly, 90% of the default gateways for VPN tunnels (i.e. the first-hop routers for the VPN servers) ignore pingrequests and do not send time-exceeded packets, which means wecannot see them in a traceroute either. Roughly a third of theservers discard all time-exceeded packets, so it is not possible totraceroute through them at all. Some servers even drop UDP andTCP packets with unusual destination port numbers.In short, the only type of network message we can reliably useto measure round-trip time is a TCP connection on a commonlyused port, e.g. 80 (HTTP). We implemented two measurement toolsthat use this method to measure round-trip times to each landmark.Command-line. For measurements of VPN proxies’ locations(Section 6), we used a standalone program, written in Python andC. It can take measurements either directly or through a proxy, andit can process a list of proxies in one batch.This tool uses the POSIX sockets API to make TCP connections.It measures the time for the connect primitive to succeed or report “connection refused,” and then closes the connection withoutsending or receiving any data. We verified that connect consistently returns as soon as the second packet of the TCP three-wayhandshake arrives (i.e. after a single round-trip to a landmark) onboth Linux and NetBSD. (Linux was used as the client OS for allthe measurements of VPN proxies; some pilot tests involved clientsrunning NetBSD.) If a connection fails with an error code other than“connection refused,” the measurement is discarded. “Network unreachable” errors, for instance, originate from intermediate routers,so they do not reflect the full round-trip time.Web-based. For algorithm validation (Section 5) we crowdsourced hosts in known locations from around the world. We couldnot expect volunteers from the general public, or Mechanical Turkworkers, to download, compile, and run a command-line tool, sowe implemented a second measurement tool as a Web application.Anyone can access the website hosting this application,2 and itrequires no “plug-ins” or software installation. It presents a livedemonstration of active geolocation, displaying the measurementsas circles drawn on a map, much as in Figure 1. After this demonstration is complete, it offers an explanation of the process, andinvites the user to upload the measurements to our database, if theyare willing to report their physical location.The price of user-friendliness is technical limitations. Web applications are only allowed to send well-formed HTTP(S) messages; wecannot close connections immediately upon establishment, withoutsending or receiving any data, as the command-line tool does.In principle, web applications are not allowed to communicatewith arbitrary hosts, only with the server hosting their website [24].However, this rule has a loophole. When a web application attemptsto communicate with a server that isn’t hosting its website, thebrowser will send an HTTP request, but won’t return a successful2

How to Catch when Proxies LieIMC ’18, October 31–November 2, 2018, Boston, MA, USAMeasurement toolRound trips12250BrowserChrome 68Firefox 52Firefox 61Edge 17500One-way travel time (ms)500Browser750CLIChrome 68Firefox 52One-way travel time (ms)One-way travel time (ms)750Round trips12250020001000005 00010 000015 0000Distance (km)5 0000uesReqtlloSTor RACKentHeBadK-ACCli400SYN4.3clientconnectreturns5 00010 000Distance (km)Figure 6: High outliers removed fromFigure 5. The dashed line has the sameslope in both figures.application will measure the time for either one or two round-trips,and we can’t tell which.serverN15 000Figure 5: Comparison of four browsers running on Windows 10. 000Distance (km)Figure 4: Comparison of the CLItool with the web application in twobrowsers, all running on Linux.Chrome 68Firefox 52Firefox 61Edge 173000fetchreturnsFigure 7: Both measurement tools make TCP connections toport 80 on each landmark. The CLI tool can use the low-levelconnect API; the web application must use the higher-levelfetch API. We instruct fetch to send encrypted (HTTPS)traffic to the usual port for unencrypted HTTP, forcing a protocol error. The CLI tool reliably measures one round-triptime; the web application measures one or two round-trips,depending on whether the landmark is listening on port 80.response unless the server allows it, using special HTTP responseheaders. Errors are still reported. Since we only care about thetime to connect, we make a request that we know will fail, andmeasure the time to failure. Ideally, we would connect to a TCPport that was not blacklisted by any VPN provider, and was closed(not blackholed) on all the RIPE Atlas nodes we use, but there is nosuch port.Instead, the web application makes encrypted (HTTPS) connections to the usual TCP port for unencrypted HTTP (80). This willfail after one round-trip if the landmark isn’t listening on port 80.However, if it is listening on port 80, the browser will reply tothe SYN-ACK with a TLS ClientHello packet. This will trigger aprotocol error, and the browser will report failure, but only after asecond round-trip. Thus, depending on whether the landmark islistening on port 80 (which depends on the version of the RIPE Atlasnode software it is running; we cannot tell in advance) the webTool ValidationFigure 7 shows the abstract difference in the network traffic generated by the two tools. Figure 4 compares the round-trip timesmeasured by the command-line tool and the web application running under two different browsers, all three on a computer in aknown location running Linux, to a collection of landmarks as described in Section 4.1. We manually partitioned the measurementstaken by the web application into groups suspected to be one roundtrip and two round trips, and estimated the distance-delay relationship for each by linear regression, shown with black lines and gray95% confidence intervals. The slope of the two-round-trip line is1.96 times the slope of the one-round-trip line; adjusted R 2 (considering both lines as a single model) is 0.9942. After accounting forthe effects of distance and whether we measured one or two roundtrips, ANOVA finds no significant difference among the three tools(two additional degrees of freedom, F 0.8262, p 0.44) which isa testament to the efficiency of modern JavaScript interpreters.Figure 5 compares the round-trip times measured by the webapplication running under four additional browsers, on the samecomputer that was used for Figure 4, but running Windows 10instead. (The command-line tool has not been ported to Windows.)Measurements on Windows are much noisier than on Linux. Wecan still distinguish a group of one-round-trip data points and agroup of two-round-trip data points, but there is a third group, “highoutliers,” separately shown in Figure 6 so that Figures 4 and 5 canhave the same vertical scale. The diagonal dashed line on Figures 5and 6 has the same absolute slope. The high outlier measurementsare much slower than can be attributed to even two round-trips,and their values are primarily dependent on the browser they weremeasured with, rather than the distance.Excluding the high outliers, the remaining data points for Windows can also be modeled by a division into groups for one or tworound-trips, but not as cleanly as on Linux. The ratio of slopes is

IMC ’18, October 31–November 2, 2018, Boston, MA, USAVolunteersZachary Weinberg et al.MTurk workersA1.00Empirical CDFFigure 8: Locations of the crowdsourced hosts used for algorithm validation, with volunteers on the left and Mechanical Turkworkers on the 0005 00010 00015 00020 000Distance from edge to location (km)05 00010 00015 00020 000 0.00Distance from centroid to location (km)0.250.500.751.00Area of region / Earth land areaFigure 9: Precision of predicted regions for crowdsourced test hosts.2.29, adjusted R 2 0.8983, and ANOVA finds the model is significantly improved by considering the browser as well (three moredegrees of freedom, F 13.11, p 6.1 10 8 ). Equally concerning,if we combine the two models, we find that the operating systemhas a significant effect on the slopes of the lines (four additionaldegrees of freedom, F 693.56, p 2.2 10 16 ) and the regressionline for two round-trips measured on Linux (t 0.03375d 45.52,distance in km, time in ms) is about the same as the line for oneround-trip measured on Windows (t 0.03288d 49.92).In section 5, we will speak further of how these limitations affectour assessment of which algorithm is most suitable for estimatingthe location of a proxy that could be anywhere in the world.5ALGORITHM TESTINGIn order to test our geolocation algorithms on hosts they hadn’t beencalibrated with, we crowdsourced a second set of hosts in knownlocations.3 40 volunteers, recruited from a variety of mailing listsand online forums, and another 150 paid contributors, recruited viaMechanical Turk for 25 each, provided us with the approximatephysical location of their computers (rounded to two decimal placesin latitude and longitude, or roughly 10 km of position uncertainty)and a set of round-trip times to RIPE Atlas anchors and probes,using the Web-based measurement tool described in Section 4.2.Their self-reported locations are shown in Figure 8. Like the RIPE3 Thisstudy was approved by our university’s IRB.anchors, the majority are in Europe and North America, but wehave enough contributors elsewhere for statistics.Our priority was to find an algorithm that would always includethe true location of each host in its predicted region, even if thismeant the region was fairly imprecise. To put it another way, wheninvestigating the locations of commercial proxies, we want to becertain that the proxy is where we say it is, even if that means wecannot assure that it is not where the provider says it is.In figure 9, panel A, we plot an empirical CDF of how far outsidethe predicted region each true location is, for each of the fouralgorithms. This is a direct measure of each algorithm’s failure tolive up to the above requirement. None of the algorithms are perfect,but CBG does better than the other three, producing predictionsthat do include the true location for 90% of the test hosts, and are offby less than 5000 km for 97% of them. Hybrid and Quasi-Octant’spredictions miss the mark for roughly 50% of the test hosts, but theyare off by less than 5000 km for roughly 90%. Fully half of Spotter’spredictions are off by more than 10 000 km.In panels B and C of Figure 9, we look into why the predictionsmiss the true region. Panel B shows that the distances from thecentroid of each algorithm’s predictions, to the true locations, areabout the same for all four algorithms, and panel C shows that CBGproduces predictions that are much larger than the other three.We conclude that none of the algorithms can reliably center theirpredicted region on the true location, but CBG’s predictions are

How to Catch when Proxies LieIMC ’18, October 31–November 2, 2018, Boston, MA, USA20 00015 000Baseline5 000020 000Bestline10 0005 0000012345 Estimated/true distance ratioFigure 10: CBG bestline and baseline e

Commercial VPN services compete to offer the highest speed, the strongest privacy assurances, and the broadest possible set of server locations. One of the services in this study advertises servers in all but seven of the world's sovereign states, including implausible locations such as North Korea, Vatican City, and Pitcairn Island.