The Biggest Promise of Digital Data Is Going Away (and That's Okay)

By Tim Wilson posted 03-27-2020 12:20 PM

When digital analytics (née web analytics) first emerged, one of the great promises of the analog-to-digital shift was that organizations could start "tracking everything." This started on websites: "We can see every user and every page they viewed and when and in what order!" That quickly migrated to digital media—all the way to the point of tying the two together: "We can know that a user viewed the product details page for our Super Duper Widget and did not purchase it, so we can now show them ads for the Super Duper Widget on other sites! And we can measure whether that drove them to come back to the site and to purchase!"

A fundamental allure of digital data was the very nature of "all." We had "all" the data. As a result, many (most?) analysts who entered the field could file their knowledge of statistics into their permanent archive because the core purpose of statistics is to use samples from a population to draw inferences about the population as a whole. We had the whole population (even if the population was just "all the visitors to our website").

Fast forward to the past few years, and consumers have (justifiably) gotten uncomfortable with this level of data collection, including the subsequent use of that data. That discomfort has led to two systemic changes that directly impact the digital "track everything everywhere" environment:

Regulatory Updates: GDPR is the regulation that has gotten the most attention, but PIPEDA in Canada was first enacted in 2000 (and has been amended multiple times), and CCPA went into effect in California this year.
Browser Updates: Google Chrome rolled out fairly toothless "Do Not Track" functionality in 2012, and the industry collectively shrugged. In the last year alone, though, every major browser has updated the way it handles cookies by default, to the point that Simo Ahava established https://www.cookiestatus.com/ as an open-source initiative to centralize the tracking of the various browsers and their updates.

Both of these movements are disruptively rippling through the digital analytics and optimization (and digital media) industry. While our data collection was never as perfect and complete as it was often treated, these updates are hampering tracking in a way that can no longer simply be ignored.

There are two fundamentally different responses to these shifts.

The Natural Response

Most organizations appear to be responding to these changes by asking the question: "How can we still collect as much data as possible while being compliant with regulations and finding technical workarounds to browser changes?" This is understandable. As analysts, we're all part engineer, which means we look for clever technical solutions to "problems," and we're natural pleasers, so we want to tell our stakeholders that we've addressed these changes for them! "We've found a good needle to thread on GDPR so we can still capture, store, and use most of what we historically captured. And we've introduced [device fingerprinting, server-side cookies, local storage, CNAME updates,...] so that our tracking will continue to work as it used to for most browsers!"

While this response is not "wrong," the framing is short-sighted, as it sets up these changes as a "problem that needs to be solved" rather than "an opportunity to work differently while respecting users' wants and needs." Both from a regulatory and a technical standpoint, it sets up the analyst to perpetually seek out new loopholes as regulations and browsers evolve to shut off the latest workarounds and exploits that organizations have identified.

A Better (But More Challenging) Response

A fundamentally different way to approach these challenges is through the eyes of the user (which is what many would claim this whole "customer-centricity" movement is about). Stéphane Hamel recently explored a thought experiment where he asked: "What if the guideline was: no (explicit) consent means no tracking. No exceptions." While this is a simplistic starting point, and it focuses on the data collection consent (and not the data usage consent), it's useful! More broadly, his thought experiment was around thinking about the ethics of tracking rather than starting with regulatory or technical constraints.

What would an approach like this mean in practical terms? It would mean that, almost certainly, much of the traffic to an organization's digital properties would simply not be tracked (or, at least, would be immediately discarded—technical requirements for actually making a website work sneak in awfully quickly). Analysts and optimizers would be working with only a sample of their actual traffic. That would mean that reporting "volume" metrics like visits and page views would either go away or be more heavily caveated as an estimate. But, ratio metrics—conversion rate (macro or micro), average order value, page views per visit, etc.—do not need 100% of the traffic to be effectively estimated. A sample can be fine! True, there will likely be sample bias (users who consent to tracking likely differ from users who do not), but working with samples, including minimizing the impact of sample bias, is the domain of statistics (and, more broadly, data science). That may be a scary future to ponder, as it means dusting off some old textbooks or signing up for some new coursework, but it seems like an inevitable future, and, frankly, a happier one when it comes to returning trust to consumers.
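To make the "a sample can be fine" point concrete, here is a minimal sketch of estimating a conversion rate from consented traffic only and reporting it with a confidence interval rather than as an exact figure. The function name and the visit and conversion counts are hypothetical, chosen purely for illustration. The Wilson score interval used here is one common choice for proportions; it quantifies sampling noise only and does nothing to correct for the consent bias mentioned above.

```python
import math
from typing import Tuple


def wilson_interval(conversions: int, visits: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence).

    Hypothetical helper for illustration; it is not taken from any analytics tool.
    """
    if visits <= 0:
        raise ValueError("visits must be positive")
    if not 0 <= conversions <= visits:
        raise ValueError("conversions must be between 0 and visits")

    p_hat = conversions / visits                      # observed conversion rate
    denom = 1 + z ** 2 / visits
    center = (p_hat + z ** 2 / (2 * visits)) / denom  # interval midpoint
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / visits + z ** 2 / (4 * visits ** 2)
    )
    return center - half_width, center + half_width


# Assumed numbers: only visitors who consented to tracking are counted.
consented_visits = 4000
consented_conversions = 120

low, high = wilson_interval(consented_conversions, consented_visits)
rate = consented_conversions / consented_visits
print(f"Estimated conversion rate: {rate:.2%} (95% interval {low:.2%} to {high:.2%})")
```

The interval narrows as the consented sample grows, which is why a ratio metric can stay useful even when a large share of traffic is never tracked. Whether consenting users behave like everyone else is a separate question that the interval cannot answer.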
#GDPR
#Tracking
#PIPEDA
#CCPA

Comments

Sean McClain

04-16-2020 10:12 AM

Tim!

This is a thought-provoking article that, as a young, less experienced analyst, gives me a great baseline for understanding the shifting landscape of our industry. I have pondered what the increasing regulations that are being imposed on our industry mean for the future of my career and this article gives me a much more complete picture to work with.

Great Read!

Declan Owens

04-09-2020 04:54 AM

Hello Tim, thanks for this blog post, it really summarises most of what was shared at SuperWeek this year.

Nonetheless, though I do agree with fighting against the philosophy of always searching for new loopholes and thus striving towards sustainable solutions, I do not believe that "No consent, no tracking", in the way it is suggested here (sending no events/hits) is the only way to go, nor necessarily the right way to go (sustainably).

I believe in "No consent, no tracking," but only when tracking is defined as actually 'following someone'. In digital analytics, that would mean linking events with each other, with either a visitor or a visit ID for example. Thus, if an analyst wanted to just count the number of events (page views, clicks, etc.), they would be able to: because the events are not related to anyone, there is no tracking. The analyst gets some data to work with, and the user's privacy is fully protected.

If we turn to regulations first, how and when to request consent is not always clear, especially in the EU where ePrivacy is still expected. Under the GDPR - one of the strictest regulations - when rendered irreversibly anonymous, the identifier is no longer considered as 'Personal Data'. Let's note, though, that it is most probably impossible to achieve that in digital analytics, as to make it irreversible you'd have to delete the encryption key after firing the first event.

If we look at ethics, only moral obligations control individuals here, and this conception is somewhat subjective depending on one's cultural influences, for example. So the only other way to have guarantees is, as you well mentioned, trust. But if there is no relationship between users and analysts, how can we call that 'trust'? Users could progressively renew their consent in time, but would one always want to 'trust' another with all their data about them? No, that's not about trust, it is just privacy.

Furthermore, on the analyst's side, can we really 'trust' them to stay put with no data? Because in the end, all computer systems have logs, and those logs can still be used for analytics #TimeMachine. This is dangerous, because these logs contain very sensitive information, such as IP addresses and User Agents, that are perfect fingerprinting material.

To conclude, I believe that blocking the collection of events will have similar effects to prohibition: the ban will just generate a black market that will spin out of control. Just collecting unrelated events (that stay that way) is safe for users' privacy. But what are really needed are trustworthy Data Processors that keep analysts out of pure logs and guarantee users' privacy. Such actors already exist on the market, notably in Digital Analytics, but most people have Google Analytics in the back of their head, and given their numerous data-based activities and hold on users' digital lives with data, that 'trust' we hope for just seems like a utopia, to be honest.

Sorry, I really felt like I needed to say this :). I hope you understand my opinion here. I'm open, so feel free to reply here or continue the debate elsewhere, I really do have great interest in this topic!

Have a great day!
