
Surfing On The Pseudoanonymous Web


For as long as I can remember, I've tried to minimize uncontrolled data and metadata emissions from my web activity. After many years of trying, I've come to realize that I won't be able to stop all of them, but I continue to enjoy the pursuit as an intellectual exercise. I've used common privacy tools and techniques like browser ad-blockers, unique email addresses, and opt-out settings, as well as other steps you're likely familiar with. Within the information security community, these steps aren't necessarily considered tinfoil-hat-level paranoia; rather, they are basic steps required to increase your privacy on the web in 2022.

Reducing emissions alone won't be sufficient, because inevitably some data will slip through your filters. Thus you also need to focus on degrading the quality of the data you emit in order to reduce its overall signal. To understand how to do this effectively, we must understand how modern analytics companies identify your behavior and sell it to their customers. To grossly oversimplify, analytics companies attempt to find and group your behavior into clusters. In order to do this, they rely on high-quality data markers. For example, if you visit a website about nursing techniques for newborn babies, they will likely conclude that you are the parent of a newborn or are expecting a baby soon. To successfully group you into known clusters, analytics companies must therefore find high-quality data points that they can use to correlate otherwise disparate data points. One of the most common of these today is the stable user identifier, which follows you across the various services and websites you use. For example, your email address and device ID can be used to tie identities together across multiple services into a single profile. Ergo, if you want to reduce the probability of successful correlation, you must stay within the noise floor of behavior correlation algorithms as best you can.
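To make the clustering idea more concrete, here is a minimal Python sketch of how records from different services might be merged into one profile whenever they share a stable identifier. It uses a union-find structure, which is only one plausible way to do this. The `EVENTS` records, their field names, and `link_profiles` are hypothetical and do not reflect any particular analytics vendor.

```python
# Hypothetical event records; the fields are illustrative only.
EVENTS = [
    {"service": "shop", "email": "alice@example.com", "device_id": None},
    {"service": "forum", "email": None, "device_id": "dev-123"},
    {"service": "news", "email": "alice@example.com", "device_id": "dev-123"},
    {"service": "blog", "email": "bob@example.com", "device_id": None},
]

STABLE_KEYS = ("email", "device_id")


def link_profiles(events):
    """Merge events into profiles whenever they share any stable identifier."""
    parent = list(range(len(events)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (identifier type, value) -> index of first event carrying it
    for i, event in enumerate(events):
        for key in STABLE_KEYS:
            value = event.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                union(first_seen[(key, value)], i)
            else:
                first_seen[(key, value)] = i

    profiles = {}
    for i, event in enumerate(events):
        profiles.setdefault(find(i), []).append(event["service"])
    return sorted(sorted(services) for services in profiles.values())


print(link_profiles(EVENTS))  # → [['blog'], ['forum', 'news', 'shop']]
```

The "shop" and "forum" records share no identifier directly. They end up in the same profile only because the "news" record carries both the email address and the device ID. One stable identifier leaking in one place is enough to bridge otherwise separate activity.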

One of the most stable data points you will always emit over the internet today is your IP address. Whether you are sending or receiving data, the TCP/IP stack must know which IP addresses are involved in a connection. This design decision would be less problematic if it weren't for the fact that ISPs (at least those historically used by this author) offer only 'sticky' IP leases, which cannot be easily relinquished or cycled. Even if you could relinquish these leases, multiple devices emitting the same address (e.g. your mobile phone while it's connected to your home Wi-Fi) means the address can still be used as an identifier linking those devices. The average internet user cannot easily obfuscate this identifier without resorting to third-party services. As a result, analytics companies are able to use this relatively stable data point to tie your identity together with other data points they may possess. For example, a user coming from IP address 4.2.4.2 using the Chrome browser on Windows with certain settings/add-ons is likely the same user who previously presented those same data points. This allows website operators to do some neat things, like not requiring an SMS OTP to log in again. However, if your goal is to reduce the probability of a service or website knowing who you are, this becomes problematic. So what are you to do in 2022? Well, you can surf on the pseudoanonymous web.
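Here is a rough sketch of the returning-user check described above. It assumes a site bases its "trusted device" decision on a hash of the source IP, browser, OS, and installed add-ons. The function names, the signal list, and the `known_devices` store are hypothetical. Real services likely use more signals and fuzzier matching than an exact hash.

```python
import hashlib

# Hypothetical "trusted device" check; signal names are illustrative.


def fingerprint(ip, browser, os_name, addons):
    """Hash a handful of relatively stable signals into one opaque key."""
    raw = "|".join([ip, browser, os_name, ",".join(sorted(addons))])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Fingerprints recorded at previous successful logins (hypothetical store).
known_devices = {fingerprint("4.2.4.2", "Chrome", "Windows", ["ad-blocker"])}


def requires_otp(ip, browser, os_name, addons):
    """Ask for an SMS OTP unless every signal matches a previous login."""
    return fingerprint(ip, browser, os_name, addons) not in known_devices


print(requires_otp("4.2.4.2", "Chrome", "Windows", ["ad-blocker"]))       # → False
print(requires_otp("198.51.100.7", "Chrome", "Windows", ["ad-blocker"]))  # → True
```

Because the IP address is part of the key, a sticky lease keeps the key stable over time. That is convenient for skipping the OTP, and for the same reason it is useful for correlation.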

The pseudoanonymous web allows users to obfuscate their identity by commingling their internet traffic with that of other users or by obscuring the source/destination of their network traffic. There are a number of methods to accomplish this today (e.g. VPNs, I2P, and Tor), each with its own pros and cons, but the net effect for the user is some semblance of an anonymous existence on the modern web.
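The commingling effect can be pictured as an anonymity set: the number of distinct users a website cannot tell apart when all it sees is the source IP. The following Python sketch assumes a hypothetical log in which three users share one VPN or Tor exit address while a fourth connects from a sticky home IP. The addresses come from the ranges reserved for documentation (RFC 5737).

```python
from collections import defaultdict

# Hypothetical log of (source IP seen by the website, actual user).
# The website only sees the first field; the second is shown for illustration.
LOG = [
    ("203.0.113.10", "alice"),  # shared exit address
    ("203.0.113.10", "bob"),
    ("203.0.113.10", "carol"),
    ("198.51.100.7", "dave"),   # sticky home IP
]


def anonymity_sets(log):
    """Map each source IP to the number of distinct users behind it."""
    users_by_ip = defaultdict(set)
    for ip, user in log:
        users_by_ip[ip].add(user)
    return {ip: len(users) for ip, users in users_by_ip.items()}


print(anonymity_sets(LOG))  # → {'203.0.113.10': 3, '198.51.100.7': 1}
```

From the website's point of view, a request from 203.0.113.10 could belong to any of three users, whereas the home IP maps to exactly one.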

Leveraging the pseudoanonymous web does have some side effects which users may not immediately be aware of. The largest is that you will be lumped together with the lowest common denominator among your network-adjacent users. For example, if another user on the same IP address is scraping websites or sending out email spam, your traffic will be treated as if you were the perpetrator. This may mean that you can read a website but can't contribute to it (e.g. Wikipedia). It may mean you will be trapped in an endless cycle of attempts to prove that you're human (e.g. Google's reCAPTCHA). It may also mean that IP reputation vendors flag your IP address/account with additional metadata like "open proxy", "low trust", or other labels. Those flags can prevent you from using DRM-heavy services, like video streaming services, or from completing online purchases. As a pseudoanonymous web user, you become an edge case to website operators. You will be relegated to a code path which is either not well tested or defaults to "deny". As a blue teamer, I understand and sympathize with why website operators are not incentivized to optimize this user experience path, but it can be a point of frustration for those who want to gain some degree of anonymity.
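The "lowest common denominator" problem follows directly from basing trust decisions on the IP address alone. Below is a minimal sketch that assumes a hypothetical reputation table of abuse-report counts and made-up thresholds. It is not meant to describe how any specific IP reputation vendor works.

```python
# Hypothetical reputation data: abuse reports per source IP. The thresholds
# below are made up for illustration.
ABUSE_REPORTS = {"203.0.113.10": 57, "192.0.2.44": 12, "198.51.100.7": 3}


def verdict(ip):
    """Pick a code path using only the source IP's reputation."""
    reports = ABUSE_REPORTS.get(ip, 0)
    if reports >= 50:
        return "deny"  # e.g. block edits or checkout
    if reports >= 10:
        return "challenge"  # e.g. show a CAPTCHA
    return "allow"


# (source IP, who is actually behind it): the second field is never consulted.
requests_seen = [
    ("203.0.113.10", "spammer"),
    ("203.0.113.10", "you"),
    ("198.51.100.7", "neighbor"),
]
for ip, user in requests_seen:
    print(user, verdict(ip))
```

Both the spammer and you get "deny" because the verdict never looks at who is behind the address. The neighbor on a cleaner IP is allowed through.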

All is not lost, though. The chief benefit of the pseudoanonymous web is that it grants some degree of anonymity by default. When it is combined with other data emission reduction/elimination techniques, the ability of the average website administrator or network operator to positively identify you is as near to zero as you can practically hope for. Without significant engineering effort on their part, or nation-state-level capabilities, I am not aware of viable or scalable techniques that would allow cross-device activity correlation without additional correlative data points. For example, correlating the network activity of my desktop with that of my mobile device would be very difficult without another data point like an account/device ID, an email address, or something else. The network may be hostile, but that doesn't mean I can't take steps to reduce its hostility towards me or others. I'd encourage you to use the pseudoanonymous web and let me know how it goes.