What Traffic is on a TOR Relay?


I'd long heard about The Onion Router (Tor), but really had no reason to use it myself.¹ But then I was reading the Wired story Vanish, about Evan Ratliff's attempt to disappear in the digital age, and Tor was mentioned as a tool he used to keep his location secret.² Intrigued, I decided to finally take a look. In a nutshell, nothing says it better than Tor's website:

Tor protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: it prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location.

Joining Tor

So I installed the software and took a peek. At first, it's only set up to provide the user access to the Tor network. But within the configuration options is a setting that enables your computer as a Tor relay. Now instead of merely being a point of entry into Tor, your computer is part of the distributed network allowing other people to have anonymous access to the Internet through you. Further configuration allows the relay to restrict the type of resources it will offer - HTTP, HTTPS, POP3, IMAP, SMTP, IRC, chat applications and custom services.
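The article does not show the configuration itself. As a minimal sketch, one way such a restriction can be expressed is with ExitPolicy lines in Tor's torrc file, which are evaluated in order with the first match winning. The helper name and the port list below are illustrative assumptions, not the author's actual settings.

```python
# Sketch: build torrc ExitPolicy lines that allow only chosen destination ports.
# The function name and port list are hypothetical, chosen for illustration.

def exit_policy_lines(allowed_ports):
    """Return torrc lines accepting the given ports and rejecting all else."""
    lines = [f"ExitPolicy accept *:{port}" for port in sorted(set(allowed_ports))]
    # Rules are checked top to bottom, so the catch-all reject goes last.
    lines.append("ExitPolicy reject *:*")
    return lines


if __name__ == "__main__":
    # 25 = SMTP, 80 = HTTP, 110 = POP3 (standard port assignments)
    for line in exit_policy_lines([80, 110, 25]):
        print(line)
```

Putting the reject rule last matters: if it came first, it would match every destination and the relay would carry no exit traffic at all.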

Feeling like I was part of the greater good of the Internet, I let the relay run and didn't give it another thought. After about a week, the security side of my brain began to have some doubts. Technically, my computer was accessing the Internet on behalf of the anonymous user on the distant end. As mentioned before, I was content with the notion that my computer is allowing the politically oppressed, dissidents or whistleblowers a mechanism to get or pass information safely. But deviants would obviously use the system, too. What if their criminal traffic was construed as my own? I had to know what was going through Tor and specifically through me.

Watching Tor

First, I didn't want to end up capturing my normal network traffic, so I needed an isolated sandbox. I turned to VirtualBox for a free virtualization environment.³ The sandbox was set up simply, with just Tor and WireShark installed.⁴ There are plenty of options for monitoring the packets, but as may be obvious, I'm a fan of free, and WireShark is a great bit of software that can snarf up quite literally everything on the ether.

With all the software installed, it was time to get things running. I had no interest in decrypting SSL traffic or watching IRC and instant messages, so only HTTP, POP3 and SMTP were enabled on the Tor relay. I needed to throw a quick filter on the promiscuous monitoring interface, so I applied the rule host 192.168.1.20 and (port 25 or port 80 or port 110) to limit the captured packets and logged them to a file.
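To make the filter's logic concrete, here is a small sketch that mimics how a rule of the form "host H and (port A or port B ...)" selects packets: the host may be either endpoint, and the port may be either the source or the destination port. The Packet type, the matches function and the sample addresses are hypothetical, and the port set uses the standard assignments for SMTP (25), HTTP (80) and POP3 (110). This is not WireShark's implementation, only an illustration of the matching rule.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Just the fields the filter looks at (a hypothetical simplification)."""
    src: str
    dst: str
    sport: int
    dport: int


def matches(pkt, host, ports):
    """Mimic `host H and (port A or port B ...)` for a single packet."""
    host_ok = host in (pkt.src, pkt.dst)                # either endpoint counts
    port_ok = pkt.sport in ports or pkt.dport in ports  # either direction counts
    return host_ok and port_ok


if __name__ == "__main__":
    relay = "192.168.1.20"
    ports = {25, 80, 110}
    samples = [
        Packet(relay, "203.0.113.5", 49152, 80),          # outbound HTTP request
        Packet("203.0.113.5", relay, 80, 49152),          # the HTTP reply
        Packet(relay, "203.0.113.9", 49153, 443),         # HTTPS, not in the port set
        Packet("192.168.1.7", "203.0.113.5", 49154, 80),  # another machine's traffic
    ]
    for pkt in samples:
        print(pkt, matches(pkt, relay, ports))
```

Because both directions match, the capture holds requests and responses alike, which is why a dozen hours of relayed browsing can add up so quickly.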

Now obviously this quick view isn't a very scientific approach ... it's only a 12 hour slice of time. But in that short span, I collected just under 1GB of data in a collection of about 1.2 million packets. So what's on the wire?

Looking at Tor Packets

The most obvious security vulnerability to Tor is the distant user themselves. Tor successfully does its job of anonymizing the origin of the query. Every link through the Tor network from the originator to the final relay is appropriately encrypted separately by each relay. But there are two fail points: 1) the unencrypted packets from the final relay to the destination and 2) the scruples of the final relay operator.

Addressing the first point, no matter how many anonymizing tools a user employs, or how well they are put into play, that same user lets the cat out of the bag when their web posts, emails or chats leave traces back to themselves. Traces can include slipping up and including their real name, a common nickname or other allusions to themselves in plaintext. Additionally, many protocols give away information like usernames and passwords - all of which will be in the unencrypted traffic stream (and were).

The second point is touched on by the very existence and nature of this analysis. For example, although safe to assume they are there, there is no way to know definitively that plaintext username and password combinations were present without having seen them myself. So with that example, the simple act of performing this analysis violates the principle of trust on the Tor network. Obviously there is no guarantee beyond my word that my collected packets were destroyed with sDelete at the conclusion of the study.⁵ But trust in the final relay point is the weakest link of Tor. Any single operator of a Tor relay may in fact be monitoring and logging all of the node's outbound traffic - but perhaps doing so in a persistent fashion to target accounts or ferret out patterns to put names to the anonymous distant end.

I have no intention of posting the content of the packets collected during this exercise. But as I mentioned earlier, what I wanted to know before continuing to run a Tor relay was this: what sites would my computer be accessing on behalf of others?

Tor's Connections

It would seem a simple exercise to reverse lookup the IP addresses collected in the WireShark logs. However, an IP address is not a conclusive indicator of the ultimate website being accessed, given the ability to host a multitude of virtual servers in a shared hosting environment. Therefore, it was necessary to parse out the host field from the HTTP packets themselves. WireShark's Statistics and HTTP analysis feature actually roots out this information but does not make it exportable. For this task, I turned to a Lua script made available from the "Moose and Squirrel Files" which parses the logs and creates a simple IP address and hostname listing.⁶ A slight modification was necessary to remove the outer DO and END statements from the Lua script in order to make the WireShark parser compile it.
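To illustrate why the Host header matters, here is a minimal Python sketch (not the Lua script referenced above) that pulls the Host value out of a raw HTTP request. The function name and the sample requests are hypothetical. Two requests sent to the same IP address can name entirely different sites, which is exactly what a reverse lookup would miss.

```python
def extract_host(request_bytes):
    """Return the Host header value from a raw HTTP request, or None."""
    # Headers end at the first blank line; anything after that is the body.
    head = request_bytes.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":  # header names are case-insensitive
            return value.strip().decode("ascii", errors="replace").lower()
    return None


if __name__ == "__main__":
    # Two hypothetical requests that could arrive at the same shared-hosting IP.
    first = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\nAccept: */*\r\n\r\n"
    second = b"GET /forum HTTP/1.1\r\nhost: Forum.Example.org\r\n\r\n"
    print(extract_host(first))
    print(extract_host(second))
```

A real capture would also need TCP stream reassembly before the headers can be read reliably, which is the work WireShark's dissectors do on the script's behalf.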

In the short amount of time that I ran the Tor relay, the Lua script found and extracted 2561 distinct hosts to which users made HTTP GET requests (though they hit some of them a lot). Some of the requests on that list were likely ancillary image pulls from the same site, with pieces spread across load-balancing servers, which I guess is demonstrated by this small excerpt:


cs4733.vkontakte.ru
cs1577.vkontakte.ru
cs1433.vkontakte.ru
cs1772.vkontakte.ru
cs1533.vkontakte.ru
cs1929.vkontakte.ru
cs4562.vkontakte.ru
cs4570.vkontakte.ru
cs4575.vkontakte.ru
cs4579.vkontakte.ru
cs1261.vkontakte.ru
cs4460.vkontakte.ru
cs4475.vkontakte.ru
cs1063.vkontakte.ru
cs4109.vkontakte.ru
cs4230.vkontakte.ru
cs4275.vkontakte.ru
cs4280.vkontakte.ru
cs4286.vkontakte.ru
cs4131.vkontakte.ru
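The pattern in the excerpt above can be made explicit by collapsing each hostname to its parent domain and counting. This is a rough sketch with hypothetical function names. It naively keeps the last two labels, so it would mishandle domains registered under multi-label suffixes such as co.uk or co.cc; a public suffix list would be needed to handle those properly.

```python
from collections import Counter


def parent_domain(hostname, labels=2):
    """Naively reduce a hostname to its last `labels` dot-separated labels."""
    parts = hostname.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


def count_by_parent(hostnames):
    """Count how many distinct-looking hosts share each parent domain."""
    return Counter(parent_domain(h) for h in hostnames)


if __name__ == "__main__":
    # A few hostnames taken from the excerpt above.
    hosts = [
        "cs4733.vkontakte.ru",
        "cs1577.vkontakte.ru",
        "cs1433.vkontakte.ru",
        "cs1772.vkontakte.ru",
    ]
    for domain, n in count_by_parent(hosts).most_common():
        print(domain, n)
```

Grouped this way, many of the 2561 "distinct" hosts would fold into far fewer actual sites.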

And it stands to reason that users were not accessing ad servers directly for the "joy" of browsing ads. We all know the very nature of cross-linking within websites means material for a single site may be pulled from all over. So I engaged in another activity to see how the accesses broke out - I accessed every single web address that was collected (not the URL, just the host).⁷

This activity resulted in a highly generalized stratification of sites into a few broad categories:

Ads - this comprises ad servers, not pages offering ad services
Blog / Forum
Business - I roll up company pages, shopping and any sort of e-commerce here
Email - this also includes instant messaging or anything where I could collect personal communication data
File Downloading - sites here include file hosting, torrent trackers or (probably) hacked hosts that are serving files
Finance - any sites related to personal finances like on-line banking or investment
Gambling
Games - these sites include on-line games or embedded games (like Zynga for FaceBook)
General Info - pages that host news or non-blog types of articles
Jihad
Malware - pages flagged as malicious by either FireFox or embedded network security applications
Media - sites hosting on-line media like flash music or movies (YouTube)
Porn
Search
Security
Unknown
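Once each host carries one of the labels above, the breakdown reduces to a simple tally. The sketch below assumes a hypothetical list of labels, one per host, and reports each category's share as a percentage. The sample labels are made up for illustration and are not the study's data; note also that this counts hosts, whereas a share of traffic would need byte or packet counts per host.

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percentage of all labeled hosts}."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels, one per visited host.
    sample = ["File Downloading", "File Downloading", "Search", "Ads"]
    for category, share in sorted(category_shares(sample).items()):
        print(f"{category}: {share:.1f}%")
```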

Tor Traffic

HTTP

On a positive note, it was clearly evident that Tor was being used for dissident unrest where the oppressed had access to the outside. I saw approximately twelve different webpages with traffic for sites with such themes as "Free Tibet" along with a number of other anti-China themed sites which would otherwise be inaccessible behind the great red firewall.⁸ A large number of sites were promoting freedom of information for Persians and featured a mixture of both Farsi and English. I couldn't read the Russian pages or the many Southwest Asian pages but visually they carried the forum design motif so people were obviously using Tor in order to make postings without being able to be traced.

There's a flip side to that coin. Apparently during that time frame, my computer also helped possible terrorists communicate. My logs showed the host http://alm0sul.co.cc corresponded to the Al-Mosul Islamic Network and at the time of viewing, it was clearly militant in nature. For those that know me and my former life in the Army, I have no interest in supporting militants as they rally to kill soldiers.⁹ ¹⁰

One interesting type of site that was frequently browsed to was thematically a "what's my IP" type of page which tells the distant end who their final relay is. Perhaps just as I was curious about what they were up to, they were curious who was relaying for them. This can be both good and bad - (good for them) they can check whether or not the relay is someone who may be monitoring them like a Chinese censor or (bad for me) it allows them to know who I am for potential attack.

Also troubling were the 5% traffic loads to porn sites which were not simply softcore pages. There was seriously hardcore, kinky material and a lot of gay porn getting accessed anonymously through my relay. Just remember, if you run a Tor relay, YOUR computer is accessing these sites. My concern immediately went to whether someone was using my computer to access something illegal like child pornography. Somehow I doubt saying "my computer was just a Tor relay" is going to be a valid legal defense. Likewise, the RIAA or MPAA could easily come down on you for being the visible access point for file sharing and the graphs show at least 50% of the traffic is oriented around file sharing.

POP

I wasn't surprised that the POP3 traffic was small with only 17,036 bytes captured. These days, I would expect savvy people to be using IMAP and SSL connections instead of plaintext POP. But 20 different POP3 servers were accessed in plaintext and WireShark captured the credentials for each user. In the study's span of only twelve hours, 30 distinct e-mail accounts left their plaintext POP login information in the capture files.
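The reason those credentials are so easy to collect is that classic POP3 authentication sends the USER and PASS commands as ordinary text lines. The sketch below, with a hypothetical function name and a made-up session, pairs each USER command with the PASS that follows it, which is essentially all a relay operator's capture has to do.

```python
def pop3_logins(client_lines):
    """Pair each USER command with the PASS command that follows it."""
    logins = []
    user = None
    for raw in client_lines:
        verb, _, arg = raw.strip().partition(" ")
        verb = verb.upper()  # POP3 command keywords are case-insensitive
        if verb == "USER":
            user = arg
        elif verb == "PASS" and user is not None:
            logins.append((user, arg))
            user = None
    return logins


if __name__ == "__main__":
    # A made-up client side of a POP3 session; no real account is shown.
    session = [
        "USER alice@example.com",
        "PASS not-a-real-password",
        "STAT",
        "QUIT",
    ]
    for account, password in pop3_logins(session):
        print(account, "password length:", len(password))
```

Running POP3 over SSL/TLS keeps these commands out of plain view on the wire between the final relay and the mail server.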

SMTP

I expected more traffic from SMTP than the mere 363,977 bytes to a single server. With the promise of anonymous sourcing, I figured the SMTP traffic would show a staggering amount of spam. I did expect to see more outbound servers than just 85.222.111.211. When I googled this IP address, I found numerous references to downloadable blacklists that included this address with a description field indicating it was a spam relay. Perhaps the large number of webmail servers found in the HTTP traffic is indicative that most people are not using SMTP for regular email. It's also likely that most relay operators just don't elect to relay SMTP, and those that do have highly restrictive filters in place to limit spam, so the malcontents simply don't use Tor for spamming.

Conclusion

Put simply, I don't run a Tor relay anymore. I don't want to be responsible for illegal content, get sued for file sharing or facilitate terrorists. But to be a relay operator pretty much requires taking the bad along with the good unless I wanted to put the time in to filter the relayed traffic. As for being a Tor user, the exercise in monitoring all the packets demonstrated how easy it is for anybody to perform session recreation against me. Just like any tool, Tor has its weaknesses, and a person should understand they can only rely on Tor for the specific purpose of masking their origin but should not be lulled into the illusion that Tor completely protects their privacy. As mentioned earlier, Tor describes its own abilities in that "it prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location." This statement is true up until you, the user, give yourself away. It's evidently important to not simply use Tor for all of your traffic - especially things like personal finance or general communication. Eventually, an unscrupulous relay operator would be able to collect enough information to possibly begin discerning particular patterns that could root out the anonymous user's real identity and origin. A different project would be to see how many times it's possible for a relay operator to identify the anonymous user from simple complacency.

Notes

1. Tor: Anonymity Online, accessed December 2009 from Tor Project at http://www.torproject.org/
2. Ratliff, Evan. Writer Evan Ratliff Tried to Vanish: Here’s What Happened, accessed December 2009 from Wired at http://www.wired.com/vanish/2009/11/ff_vanish2/
3. VirtualBox, accessed December 2009 from VirtualBox at http://www.virtualbox.org/
4. WireShark, accessed December 2009 from WireShark at http://www.wireshark.org/
5. Russinovich, Mark. sDelete, accessed December 2009 from Microsoft at http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx
6. Wireshark Scripting – Extracting HTTP Host Headers, accessed December 2009 from The Moose and Squirrel Files at http://networknerd.wordpress.com/2008/10/01/wireshark-scripting-extracting-http-host-headers/
7. I pulled a total rookie mistake here and hosed one of my computers browsing to the sites. Since the VM was operating so slowly, I browsed from a normal laptop and one site completely killed the computer with malware that visually was reminiscent of some known NetSky Worm symptoms ... but this was a far more advanced variant that bypassed both Norton Internet Security, Anti-Virus and the malicious code detectors. It was able to infect my host instantly without any intervention from myself and proved to be completely unremovable. So non-1337 of me.
8. Hermida, Alfred. Behind China's Internet Red Firewall, accessed December 2009 from BBC at http://news.bbc.co.uk/2/hi/technology/2234154.stm
9. Vea, Matthew. Photographs of OIF III, accessed December 2009 from VnutZ.com at http://www.vnutz.com/articles/Photographs_Of_OIF_III
10. Vea, Matthew. Iraqi Referendum: Photographing 15 October 2005, accessed December 2009 from VnutZ.com at http://www.vnutz.com/articles/Iraqi_Referendum_Photographing_15_October_2005