When digital analytics (née web analytics) first emerged, one of the great promises of the analog-to-digital shift was that organizations could start "tracking everything." This started on websites: "We can see every user and every page they viewed and when and in what order!" That quickly migrated to digital media—all the way to the point of tying the two together: "We can know that a user viewed the product details page for our Super Duper Widget and did not purchase it, so we can now show them ads for the Super Duper Widget on other sites! And we can measure whether that drove them to come back to the site and to purchase!"
A fundamental allure of digital data was the very nature of "all." We had "all" the data. As a result, many (most?) analysts who entered the field could file their knowledge of statistics into their permanent archive because the core purpose of statistics is to use samples from a population to draw inferences about the population as a whole. We had the whole population (even if the population was just "all the visitors to our website").
Fast forward to the past few years, and consumers have (justifiably) gotten uncomfortable with this level of data collection, including the subsequent use of that data. That discomfort has led to two systemic changes that directly impact the digital "track everything everywhere" environment:
- Regulatory Updates: GDPR is the regulation that has gotten the most attention, but PIPEDA in Canada was first enacted in 2000 (and has been amended multiple times), and CCPA went into effect in California this year.
- Browser Updates: Google Chrome rolled out fairly toothless "Do Not Track" functionality in 2012, and the industry collectively shrugged. In the last year alone, though, every major browser has updated the way it handles cookies by default, to the point that Simo Ahava established https://www.cookiestatus.com/ as an open-source initiative to centralize the tracking of the various browsers and their updates.
Both of these movements are disruptively rippling through the digital analytics and optimization (and digital media) industry. While our data collection was never as perfect and complete as it was often treated, these updates are hampering tracking in a way that can no longer simply be ignored.
There are two fundamentally different responses to these shifts.
The Natural Response
Most organizations appear to be responding to these changes by asking the question: "How can we still collect as much data as possible while being compliant with regulations and finding technical workarounds to browser changes?" This is understandable. As analysts, we're all part engineer, which means we look for clever technical solutions to "problems," and we're natural pleasers, so we want to tell our stakeholders that we've addressed these changes for them! "We've found a good needle to thread on GDPR so we can still capture, store, and use most of what we historically captured. And we've introduced [device fingerprinting, server-side cookies, local storage, CNAME updates, ...] so that our tracking will continue to work as it used to for most browsers!"
While this response is not "wrong," the framing is short-sighted, as it sets up these changes as a "problem that needs to be solved" rather than "an opportunity to work differently while respecting users' wants and needs." Both from a regulatory and a technical standpoint, it sets up the analyst to perpetually seek out new loopholes as regulations and browsers evolve to shut off the latest workarounds and exploits that organizations have identified.
A Better (But More Challenging) Response
A fundamentally different way to approach these challenges is through the eyes of the user (which is what many would claim this whole "customer-centricity" movement is about). Stéphane Hamel recently explored a thought experiment where he asked: "What if the guideline was: no (explicit) consent means no tracking. No exceptions." While this is a simplistic starting point, and it focuses on the data collection consent (and not the data usage consent), it's useful! More broadly, his thought experiment was around thinking about the ethics of tracking rather than starting with regulatory or technical constraints.
What would an approach like this mean in practical terms? It would mean that, almost certainly, much of the traffic to an organization's digital properties would simply not be tracked (or, at least, would be immediately discarded, since technical requirements for actually making a website work sneak in awfully quickly). Analysts and optimizers would be working with only a sample of their actual traffic. That would mean that reporting "volume" metrics like visits and page views would either go away or be more heavily caveated as an estimate. But ratio metrics (conversion rate, whether macro or micro, average order value, page views per visit, etc.) do not need 100% of the traffic to be effectively estimated. A sample can be fine! True, there will likely be sample bias (users who consent to tracking likely differ from users who do not), but working with samples, including minimizing the impact of sample bias, is the domain of statistics (and, more broadly, data science). That may be a scary future to ponder, as it means dusting off some old textbooks or signing up for some new coursework, but it seems like an inevitable future, and, frankly, a happier one when it comes to returning trust to consumers.
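To make the "a sample can be fine" point concrete, here is a minimal Python sketch of estimating a conversion rate, with a confidence interval, from only the visits that consented to tracking. The visit and conversion counts are made up for illustration, and the Wilson score interval is just one reasonable choice of method, not something the argument above prescribes. Note that the interval reflects sampling error only: it does nothing about the consent bias mentioned above, which would need separate treatment (for example, weighting by known segment sizes).

```python
import math
from typing import Tuple


def wilson_interval(conversions: int, visits: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if visits <= 0:
        raise ValueError("visits must be positive")
    if not 0 <= conversions <= visits:
        raise ValueError("conversions must be between 0 and visits")

    p_hat = conversions / visits
    denom = 1 + z ** 2 / visits
    centre = (p_hat + z ** 2 / (2 * visits)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / visits + z ** 2 / (4 * visits ** 2)
    )
    return centre - half_width, centre + half_width


# Hypothetical numbers: only visits with explicit consent are counted.
consented_visits = 4000
consented_conversions = 120

rate = consented_conversions / consented_visits
low, high = wilson_interval(consented_conversions, consented_visits)
print(f"Estimated conversion rate: {rate:.2%} (95% interval: {low:.2%} to {high:.2%})")
```

The interval narrows as the consented sample grows, which is why a ratio metric can stay useful even when a large share of traffic is never tracked, while a raw volume count cannot be recovered the same way.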