Detecting violations of application-level end-to-end connectivity on the Internet is of significant interest to researchers and end users; recent studies have revealed cases of HTTP ad injection and HTTPS man-in-the-middle attacks. Unfortunately, detecting such end-to-end violations at scale remains difficult, as it generally requires having the cooperation of many nodes spread across the globe. Most successful approaches have relied either on dedicated hardware, user-installed software, or privileged access to a popular web site. In this paper, we present an alternate approach for detecting end-to-end violations based on Luminati, an HTTP/S proxy service that routes traffic through millions of end hosts. We develop measurement techniques that allow Luminati to be used to detect end-to-end violations of DNS, HTTP, and HTTPS, and, in many cases, enable us to identify the culprit. We present results from over 1.2M nodes across 14K ASes in 172 countries, finding that up to 4.8% of nodes are subject to some type of end-to-end connectivity violation. Finally, we are able to use Luminati to identify and measure the incidence of content monitoring, where end-host software or ISP middleboxes record users' HTTP requests and later re-download the content to third-party servers.
Luminati is a paid HTTP/S proxy service that routes traffic via exit nodes. Clients of Luminati can use an API to automate requests, as well as express preferences over which Hola client will be selected to route their traffic. Luminati clients are charged on a per-GB basis, and all Luminati traffic is first routed via a Hola server before being forwarded to a Hola user's client.
We used Luminati to explore an alternative approach to detecting end-to-end connectivity violations in edge networks, one that lets us collect measurements from nearly one million end hosts simultaneously without requiring users to install our software or hardware. By using Luminati, we can route HTTP/S traffic via many of the Hola nodes and gain visibility into their networks.
Using Luminati, we demonstrate how a large-scale HTTP/S proxy service can be used to measure end-to-end connectivity violations in DNS, HTTP, and HTTPS. We develop techniques that allow us, in most cases, to identify the party responsible for the violations (e.g., the user’s DNS resolver, an ISP middlebox, or software on the user’s machine). This allows researchers to conduct measurements at the scale of approaches deployed by popular web sites, and avoids the overhead of having to convince users to install custom software or hardware. (For more details, visit https://luminati.io)
Below, we make our four datasets (NXDOMAIN Hijacking, HTTP Content Modification, SSL Certificate Modification, and Content Monitoring) public. For more details on our methodology and results, please see our paper.
To detect NXDOMAIN hijacking, we have the exit nodes issue DNS resolution queries that reach our DNS server, which deliberately returns an NXDOMAIN response. We then check whether each exit node receives that NXDOMAIN response or receives content instead.
Using this methodology, we measured a total of 753,111 unique exit nodes from 167 countries and 10,197 ASes. We found that these exit nodes are configured to use a total of 33,446 unique DNS servers. We observed that 717,311 of the exit nodes (95.2%) do not experience NXDOMAIN hijacking, but the other 35,800 exit nodes (4.8%) have their response intercepted.
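The classification step can be sketched as follows. This is a minimal illustration, not the measurement code used for the dataset: the `ProbeResult` record and its fields are hypothetical, and it assumes each probe has already been reduced to the list of addresses (or content sources) the exit node obtained for a name that does not exist.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProbeResult:
    """Hypothetical record of one probe for a name that does not exist."""
    exit_node_id: str
    # Addresses the exit node obtained; an empty list means it saw NXDOMAIN.
    resolved_addresses: List[str] = field(default_factory=list)


def is_hijacked(result: ProbeResult) -> bool:
    """The authoritative answer is NXDOMAIN, so any resolved address is suspicious."""
    return len(result.resolved_addresses) > 0


def hijack_percentage(results: List[ProbeResult]) -> float:
    """Share of probed exit nodes (in percent) whose NXDOMAIN response was replaced."""
    if not results:
        return 0.0
    hijacked = sum(1 for r in results if is_hijacked(r))
    return 100.0 * hijacked / len(results)
```

Applied to the counts above, 35,800 hijacked nodes out of 753,111 gives 100 × 35,800 / 753,111 ≈ 4.8%.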
Name | Type | Size | SHA-256 Hash (Uncompressed) |
---|---|---|---|
nonnx-domain-list.txt | txt | 65 MB | ca73419fadb2f483109baaad2c01cf143bc8caa3fadc793531b7f21323d6ed3b |
nx-domain-list.txt | txt | 4.7 MB | 36915856aacb670a7fab5090af3238df2c25d16c5ca823e9a207adab8ff3c717 |
dataset-description.txt | txt | 767 B |
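The published SHA-256 hashes are computed over the uncompressed files, so a download can be checked after decompression. A minimal sketch using Python's standard `hashlib` follows; the helper names are illustrative, and the file path passed in is whatever location you saved the file to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the hash listed in the table."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("nx-domain-list.txt", "36915856aacb670a7fab5090af3238df2c25d16c5ca823e9a207adab8ff3c717")` should return `True` for an intact copy of that file.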
We fetch content from our web server via an exit node and check whether the content we receive is identical to what we sent. For this experiment, we fetch four different pieces of content through each exit node: a 9 KB HTML page, a 39 KB JPEG image, a 258 KB unminified JavaScript library, and a 3 KB unminified CSS file.
Using Luminati, we measured 49,545 exit nodes in 12,658 ASes across 171 countries. We detected HTML content modification for 472 exit nodes (0.95%), image modification for 694 (1.4%), JavaScript modification for 45 (0.09%), and CSS modification for 11 (0.02%).
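The comparison of sent and received content can be illustrated with a short sketch. It assumes the bytes served and the bytes received through an exit node are both available as dictionaries keyed by object name; the function names and the digest-based comparison are illustrative choices, not a description of the exact tooling used.

```python
import hashlib
from typing import Dict, List


def digest(body: bytes) -> str:
    """Hex SHA-256 digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def modified_objects(sent: Dict[str, bytes], received: Dict[str, bytes]) -> List[str]:
    """Names of objects whose received bytes differ from what the server sent.

    Objects missing from `received` are treated as not measured, not as modified.
    """
    return sorted(
        name
        for name, body in sent.items()
        if name in received and digest(received[name]) != digest(body)
    )
```

Comparing digests rather than raw bodies keeps only a fixed-size value per object per exit node, which is convenient when many nodes are measured.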
Name | Type | Size | SHA-256 Hash (Uncompressed) |
---|---|---|---|
content_nonmodification_list.txt | txt | 11 MB | 9f68a3f88a7fb5a5898b6ee3f010c0e1579e27841110991235384a61146e7f59 |
content_modification_list.txt | txt | 403 KB | 1119110d12b0c5264dc4ca1b56cce1cddf5949891cbd04bab55caf083f7ea028 |
dataset_description.txt | txt | 744 B |
20160504,82.132.244.0/22,29180,gb,great britain,img,13236,444
20160505,82.132.224.0/22,29180,gb,great britain,img,13236,256
20160505,82.132.236.0/22,29180,gb,great britain,img,13236,472
20160505,82.132.224.0/22,29180,gb,great britain,img,13236,396
20160505,82.132.236.0/22,29180,gb,great britain,img,13236,472
We used the HTTP CONNECT method with the super proxy, which tunnels all TCP port 443 traffic between the exit node and our measurement client, including the TLS handshake. We completed a TLS handshake and recorded the SSL certificates presented; we then terminated the connection (we did not actually download any content). As certificate replacement may target individual web sites, we chose three different classes of sites to test.
Name | Type | Size | SHA-256 Hash (Uncompressed) |
---|---|---|---|
certs-test-fail.txt | txt | 15 MB | f8de86cf1835d47d5d8e9b176da106246e344e2dd11514819ad0b5c3ec7b24f1 |
certs-test-ok.txt | txt | 407 MB | 31999d88d7cad333213c38db60f665058a6f4b237f2cbb5fc883c1c34e6c2955 |
dataset-description.txt | txt | 622 B |
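One simple way to flag a replaced certificate is to compare the fingerprint of the certificate seen through the tunnel against fingerprints obtained over a trusted path. The sketch below is a simplification under that assumption and is not necessarily the check used to produce the files above; the function names and the `known_good` set are hypothetical, and sites that legitimately serve several certificates would need every valid fingerprint in the set.

```python
import hashlib
from typing import Set


def fingerprint(der_certificate: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_certificate).hexdigest()


def certificate_replaced(observed_der: bytes, known_good: Set[str]) -> bool:
    """True if the certificate seen via the exit node matches no trusted fingerprint."""
    return fingerprint(observed_der) not in known_good
```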
Another concerning form of end-to-end violation is content monitoring, or cases where middleboxes are silently observing content that users are downloading for the purpose of scanning content or otherwise controlling access. While content modification is easy to detect (e.g., via block pages), content monitoring is significantly more difficult to detect, as there is (by definition) no change to the content itself. However, we discovered we can detect certain types of content monitoring based on unexpected requests arriving at our measurement server.
In total, we measured 747,449 exit nodes and observed that 11,234 (1.5%) of them resulted in multiple, unexpected requests. These unexpected requests came from 424 unique IP addresses that differed from those of the exit nodes.
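The idea of spotting unexpected requests at the measurement server can be sketched as below. The sketch assumes each exit node is asked to fetch a URL containing a token unique to that node, so that any later request for the same token from a different address stands out; the token scheme, function name, and log format are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unexpected_requesters(
    exit_ip_by_token: Dict[str, str],
    access_log: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Map each token to the source IPs that requested it but are not its exit node.

    `access_log` yields (token, source_ip) pairs taken from the web server log.
    """
    unexpected: Dict[str, Set[str]] = defaultdict(set)
    for token, source_ip in access_log:
        expected_ip = exit_ip_by_token.get(token)
        # Ignore tokens we never issued; flag requests from any other address.
        if expected_ip is not None and source_ip != expected_ip:
            unexpected[token].add(source_ip)
    return dict(unexpected)
```

The number of keys in the result corresponds to the exit nodes that triggered unexpected requests, and the union of the values gives the distinct third-party addresses.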
Name | Type | Size | SHA-256 Hash (Uncompressed) |
---|---|---|---|
monitoring-webserver.txt | txt | 191 MB | 6aefe6ab1425ae0cbefb6542e59a264a08188520536139d2a046d5b7ee142d83 |
dataset-description.txt | txt | 342 B |
Do you have any questions, comments, or concerns? Feel free to send an email to Taejoong Chung.