Getting your hands dirty and doing some actual work!

General Curricula

There are a bunch of free courses available online that walk through many topics (including ones not listed below):

Data preparation (wrangling and munging)

We spend 80% of our efforts on data preparation, not on analysis and execution

Data wrangling is incredibly important. Sure, you can throw everything into a model and hope for the best, but the right preprocessing steps can make the difference between success and failure.

People

You, probably. If there are Analytics Engineers nearby, this is what they do all day.

Process

The general flow of data munging:

  1. Discover

    Get familiar with the data! This is the initial exploration to understand the important patterns and identify major structural issues (what's in the data versus what "should" be in the data).

    This is a great time to start a data dictionary for the inputs if one doesn't already exist.

  2. Structure

  3. Clean

  4. Enrich

    Attach other data sources that will be needed later for consumption or modeling. For example, geo data may be encoded as zip codes; if you will need locations and distances later, you could add the latitude and longitude for each zip code.
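The Discover and Enrich steps above can be sketched in plain Python. This is a minimal illustration, not a prescribed workflow. The records, the `ZIP_COORDS` lookup table, and the helpers `profile`, `haversine_km`, and `enrich` are all hypothetical, and the coordinates are approximate. Discovery here means counting missing and distinct values per column. Enrichment attaches latitude/longitude from a lookup table and computes great-circle distance with the haversine formula, assuming a spherical Earth with a radius of 6371 km. A real project would more likely use a dataframe library and an actual zip-code reference dataset.

```python
import math

# Hypothetical raw records: location is only available as a zip code.
records = [
    {"customer_id": 1, "zip": "10001", "spend": 120.0},
    {"customer_id": 2, "zip": "94103", "spend": None},
    {"customer_id": 3, "zip": "10001", "spend": 75.5},
    {"customer_id": 4, "zip": "99999", "spend": 40.0},
]

# Hypothetical lookup table of approximate (lat, lon) per zip code.
# In practice this would come from an external reference dataset.
ZIP_COORDS = {
    "10001": (40.7506, -73.9972),
    "94103": (37.7725, -122.4109),
}


def profile(rows):
    """Discover: count missing (None) and distinct non-missing values per column."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        report[col] = {
            "missing": sum(v is None for v in values),
            "distinct": len({v for v in values if v is not None}),
        }
    return report


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs, spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def enrich(rows, coords, origin):
    """Enrich: attach lat/lon per zip and the distance from an origin point.

    Rows whose zip is not in the lookup table get None values, which is
    worth checking before modeling. The input rows are not modified.
    """
    enriched = []
    for row in rows:
        latlon = coords.get(row["zip"])
        new_row = dict(row)
        new_row["lat"], new_row["lon"] = latlon if latlon else (None, None)
        new_row["distance_km"] = haversine_km(origin, latlon) if latlon else None
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    for col, stats in profile(records).items():
        print(col, stats)
    origin = ZIP_COORDS["10001"]
    for row in enrich(records, ZIP_COORDS, origin):
        print(row["customer_id"], row["zip"], row["distance_km"])
```

Profiling shows, for example, that `spend` has one missing value. The unmatched zip `99999` surfaces during enrichment as a row with no coordinates, which loops back into the Clean step.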