While the US government is giving ISPs free rein to track their customers’ Internet usage for purposes of serving personalized advertisements, some Internet users are determined to fill their browsing history with junk so ISPs can’t discover their real browsing habits.
Scripts and browser extensions might be able to fill your Web history with random searches and site visits. But will this actually fool an ISP that scans your Web traffic and shares it with advertising networks?
Electronic Frontier Foundation Senior Staff Technologist Jeremy Gillula is skeptical but hopes he’s wrong. “I’d love to be proven wrong about this,” he told Ars. “I’d want to see solid research showing how well such a noise-creation system works on a large scale before I trust it.”
Steve Smith of Cambridge, Massachusetts, contacted Ars and Gillula after our recent article about how the US Senate vote to eliminate ISP privacy rules affects users and what Internet users can do to hide their browsing history. He subscribes to this browser pollution approach.
“Perhaps more constructively than using a VPN or Tor, fill up your monthly bandwidth allotment with data pollution,” Smith wrote to us. “You’re already paying for the bandwidth, so use it all if your ISP is going to sell your private data. This has the dual benefits of obscuring your actual browsing habits, and, if enough people adopt this practice, discouraging ISPs from selling private data.
“I’ve written a Python class to do this for my household—it crawls for links it finds using random word searches—and have shared the code,” he continued. Smith’s code is available on GitHub. Internet users often have to worry about data caps, but Smith set the default rate to use 50GB a month, or about five percent of a 1TB data cap.
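To put that default in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustrative calculation only, not taken from Smith's code: the function name `pollution_budget`, the 30-day month, and the decimal units (1TB = 1,000GB) are assumptions made for the example.

```python
# Illustrative arithmetic only; not part of Smith's project.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def pollution_budget(monthly_gb, cap_gb=1000.0):
    """Return (share of the data cap, average bytes per second).

    Assumes decimal units: 1 GB = 1e9 bytes and 1 TB = 1,000 GB.
    """
    bytes_per_month = monthly_gb * 1e9
    return monthly_gb / cap_gb, bytes_per_month / SECONDS_PER_MONTH


fraction, rate = pollution_budget(50)
print(f"{fraction:.0%} of the cap, about {rate / 1000:.1f} kB/s on average")
```

Under those assumptions, 50GB a month averages out to a trickle of roughly 19 kilobytes per second, which is small next to a typical broadband connection.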
Smith’s “ISP Data Pollution” project isn’t the only such effort. For instance, a project called “RuinMyHistory” opens a popup window that cycles through different websites, and a browser plugin called Noiszy is designed to “create meaningless Web data” by visiting various websites.
Browsing data sensitive even if surrounded by noise
A big challenge for attempts to pollute browsing history is that computers are extremely good at finding patterns, even when the data you want to hide is surrounded by a huge number of random data points. This is the kind of problem that “big data” systems are built to solve.
Gillula believes that data pollution systems may not be sophisticated enough to fool ISPs.
“In the end, it turns into a game of statistical cat-and-mouse between you and your ISP: Can they figure out how to separate the signal from the noise?” Gillula wrote in the e-mail thread with Ars and Smith. “I think ISPs will have a lot more resources (money and smart engineers who will be paid a lot) to try to figure out how to do that—way more resources than any individual or small open source project will.”
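A toy simulation hints at why separating signal from noise may be easier than it sounds. The sketch below is hypothetical and does not describe any ISP's actual methods or any real pollution tool: it assumes the noise is drawn uniformly from a large pool of made-up domains, while a handful of "real" sites are visited repeatedly. The domain names, the visit counts, and the threshold are all invented for illustration.

```python
import random
from collections import Counter


def simulate_history(real_sites, visits_per_real, noise_pool_size, noise_visits, seed=0):
    """Build a shuffled browsing log: repeated real visits plus uniform random noise."""
    rng = random.Random(seed)
    history = [site for site in real_sites for _ in range(visits_per_real)]
    # Assumption: noise visits are spread uniformly over a large pool of fake domains.
    history += [
        f"noise-{rng.randrange(noise_pool_size)}.example" for _ in range(noise_visits)
    ]
    rng.shuffle(history)
    return history


def flag_repeated(history, min_visits):
    """Return the domains visited at least min_visits times."""
    counts = Counter(history)
    return {domain for domain, n in counts.items() if n >= min_visits}


real = ["news.example", "bank.example", "forum.example"]  # hypothetical real habits
history = simulate_history(
    real, visits_per_real=40, noise_pool_size=10_000, noise_visits=3_000
)
flagged = flag_repeated(history, min_visits=10)
print(sorted(flagged))
```

In this setup more than 95 percent of the log is noise, yet simply counting repeat visits picks out the three real sites, because uniformly random noise rarely lands on the same domain many times. A more careful noise generator could mimic repeat visits, and a more careful analyst could look at timing or topic clusters instead, which is the cat-and-mouse dynamic Gillula describes.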
Browser noise also doesn’t erase the record of sensitive browsing.
“Some information is sensitive even if it’s surrounded by noise,” Gillula wrote. “Imagine if hackers targeted your ISP, your browsing history was leaked, and it showed you visiting specific controversial websites (Democratic websites when you live in a Republican town or vice versa, or maybe looking for a divorce lawyer). Even if that was surrounded by noise, it would be very hard to get the sort of noise that would give you plausible deniability.”
This type of browser pollution system “might work for a bit,” but “if it becomes widespread then ISPs will start throwing resources at solving it,” Gillula wrote.