Data Wrangling is Career Strangling

By Kerry Hew posted 08-16-2018 05:12 PM

  
I wrote an article on LinkedIn and thought I'd publish it here as well.
Here is the link, in case you choose to engage there: https://www.linkedin.com/pulse/data-wrangling-career-strangling-kerry-hew/

Data wrangling is a necessary process when working with big data (and, in reality, with most data). This opinion piece is not meant to diminish its importance, nor should it be confused with Data Engineering. But I will argue that data wrangling is career strangling, in that it holds you back in your career progression. Let me explain...

Firstly, let's agree that the whole purpose of big data is to whittle it down to little data, which we call "Insights". The point of any data analysis is to identify a trend or anomaly. The point of a machine learning model is to find a set of defined patterns or assign a probability.

Observe any Data Scientist or Analyst presentation and the only pieces that get talked about are the Insights and the model. Zero time is spent explaining how the data was wrangled, despite that being 60-80% of the effort.

I am making the argument that data wrangling is low-level, tedious work, and that an expensive resource such as a Data Scientist, Data Engineer, or Analyst is wasted when they take it on.

The best consultants know that:

You don't get paid for the hour. You get paid for the value you bring to the hour.

The more time you spend on lower value work, the more you diminish your value.

And if you're an Analyst / Data Scientist spending a greater portion of your time wrangling data, that's much less time you're spending to understand the data, much less time you're spending to analyze the data, and much less time you're spending on delivering business value from the data.

When it comes to big data, I believe that folks are starting to realize that robust software engineering practices need to be put in place to ensure the quality of the data pipeline and #datagovernance. ...Cue the Data Engineer.

In today's episode (Aug 14) of the Digital Analytics Power Hour (a wonderful podcast, btw), there was a great discussion about raw data and data virtualization. I didn't feel that there was any consensus, so I'll throw in my 2 cents.

A company must adopt a tool or process to virtualize the raw data for the Data Scientists and Analysts. Drawing from software engineering principles, the solution, whether built in-house or purchased, must be robust, scalable, extendable, and re-usable.

This will save an immense amount of time (and headache).

For example, when working with raw clickstream data, you have billions of atomic events. In most cases, identity resolution is required over a specified period of time. If every Data Scientist or Analyst is starting with the raw data, I guarantee that each will resolve the identity in a different manner (different "code"). This leads to multiple, inconsistent "truths". The Analysts / Data Scientists should only work from a consistent, consolidated schema for the vast majority of cases.
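To make the "multiple truths" problem concrete, here is a minimal Python sketch. The five events, the function names, and the rule "map each cookie to the login seen on it within the window" are all hypothetical assumptions chosen for illustration, not a recommended identity-resolution algorithm. Two reasonable-looking rules applied to the same raw events produce two different visitor counts; publishing one shared resolution step (or the consolidated table it produces) is what keeps every Analyst on the same answer.

```python
from datetime import datetime, timedelta

# Hypothetical raw clickstream events: (cookie_id, user_id or None, timestamp).
EVENTS = [
    ("c1", None,  datetime(2018, 8, 1, 9, 0)),
    ("c1", "u42", datetime(2018, 8, 1, 9, 5)),   # c1 logs in as u42
    ("c2", None,  datetime(2018, 8, 2, 14, 0)),
    ("c2", "u42", datetime(2018, 8, 2, 14, 3)),  # same person, second device
    ("c3", None,  datetime(2018, 8, 3, 18, 0)),  # never logs in
]

def naive_visitors(events):
    """Analyst A's ad hoc rule: every distinct cookie is one visitor."""
    return len({cookie for cookie, _, _ in events})

def resolve_identities(events, start, window):
    """Hypothetical shared rule: within [start, start + window), map each
    cookie to the first login seen on it (in event order); cookies with
    no login keep their own cookie ID."""
    end = start + window
    cookie_to_user = {}
    for cookie, user, ts in events:
        if user is not None and start <= ts < end:
            cookie_to_user.setdefault(cookie, user)
    return {cookie: cookie_to_user.get(cookie, cookie)
            for cookie, _, ts in events if start <= ts < end}

def resolved_visitors(events, start, window):
    """Count distinct resolved identities using the shared rule."""
    return len(set(resolve_identities(events, start, window).values()))

start = datetime(2018, 8, 1)
print(naive_visitors(EVENTS))                               # 3
print(resolved_visitors(EVENTS, start, timedelta(days=7)))  # 2
```

The same raw data yields 3 "visitors" under one rule and 2 under the other. Neither number is wrong on its own terms, which is exactly why the resolution step belongs in one governed place rather than in each Analyst's notebook.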

So, when I say "Data wrangling is career strangling", it's because you're devoting too much time to work of lower value.

[Tangential anecdote: I use Salesforce a lot in my work. If I'm to be diligent, the data entry could take up to 4 hours a week. I hired a virtual assistant (VA), on my own dime, to handle this. This allows me to spend more time on higher-value (and, quite frankly, more fun) tasks. I value my time.]

In the end, businesses are results-oriented. If you can produce more positive business results in a shorter time frame, then your career trajectory will move up-and-to-the-right at an accelerated pace.

And it's a compounding factor. Those who produce results are given more opportunities. The sooner you produce results, the sooner those opportunities present themselves.

Focus on value delivered.

The faster you iterate, the faster you grow.


#CareerAdvice
