Using Data to Test a Hypothesis - Step by Step

In order to solve a problem you must first state your hypothesis.

Once you have determined the question to be answered, ask yourself some basic questions:

Who, What, Where, When, How and maybe even Why.

So if you are tasked with understanding your customer base, for example, here's how I'd approach it:
  • Who: Who is your Customer?
  • What: What Products did they purchase?  How many units?
  • Where: Where do the customers reside?  (Region, Territory, Country, State, City)
  • When: When did they purchase products?  How long was the interval between first touch and closed sale?
  • How: How did the customer find our product?  (Campaign, Google Search Word, Advertisements)
Those questions help to answer the "Dimensions" part of the equation, so you can slice and dice the data.

The "Measures" or numeric values would be:
  • Sum(Purchase Amount)
  • Average(Purchase Amount)
  • Count(Customers) 
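
To make the Dimensions and Measures idea concrete, here is a minimal Python sketch. The purchase rows, the field names (customer, region, amount) and the measures_by helper are all made up for illustration; the point is only that a Dimension is what you group by and a Measure is what you compute per group.

```python
from collections import defaultdict

# Hypothetical purchase records: each row carries dimensions
# (customer, region) and one numeric fact (amount).
purchases = [
    {"customer": "C1", "region": "East", "amount": 120.0},
    {"customer": "C2", "region": "East", "amount": 80.0},
    {"customer": "C1", "region": "East", "amount": 100.0},
    {"customer": "C3", "region": "West", "amount": 50.0},
]


def measures_by(rows, dimension):
    """Group rows by one dimension and compute the three measures."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)

    result = {}
    for key, members in groups.items():
        amounts = [m["amount"] for m in members]
        result[key] = {
            "sum_amount": sum(amounts),                  # Sum(Purchase Amount)
            "avg_amount": sum(amounts) / len(amounts),   # Average(Purchase Amount)
            # Count(Customers): distinct customers, not number of rows
            "customer_count": len({m["customer"] for m in members}),
        }
    return result


by_region = measures_by(purchases, "region")
for region, values in by_region.items():
    print(region, values)
```

Swapping "region" for another field slices the same measures by a different Dimension, which is all "slice and dice" really means.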
The next step would be to identify the Data Sources: where does my data reside?  What fields would help to answer my questions?  What information is just noise and can be disregarded?

The next step would be to ETL (Extract, Transform and Load) your data, assuming it has already been cleansed and is under data governance.
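
A rough sketch of what the three ETL stages might look like, using only the Python standard library. The CSV text, the purchases table and the extract/transform/load function names are my own assumptions for the example; a real pipeline would read from your actual source systems and write to your warehouse, usually with a dedicated ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file
# or a source system rather than an inline string.
RAW_CSV = """customer,region,purchase_date,amount
C1, east ,2023-01-15,120.00
C2,East,2023-02-03,80.00
C3,WEST,2023-02-10,50.00
"""


def extract(text):
    """Extract: read raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize text, convert amount to a number."""
    return [
        (
            row["customer"].strip(),
            row["region"].strip().title(),   # " east " -> "East"
            row["purchase_date"].strip(),
            float(row["amount"]),
        )
        for row in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(customer TEXT, region TEXT, purchase_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the real target.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT * FROM purchases").fetchall())
```

Keeping the three stages as separate functions makes it easier to rerun or test one stage without touching the others.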

The next step would be to create a workable Query or Queries to pull the data into Dimensions and Measures, with the final resting home being a Cube for slicing and dicing.
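
As an illustration, a query of this kind could look like the sketch below. It runs against an in-memory SQLite database with a made-up purchases table (the table, columns and rows are assumptions for the example, not a real schema). The GROUP BY columns are the Dimensions and the aggregate functions are the Measures. A real Cube would pre-aggregate across many more Dimensions, but the shape of the query is the same.

```python
import sqlite3

# In-memory database with a small, hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("C1", "East", "Widget", 120.0),
        ("C2", "East", "Widget", 80.0),
        ("C1", "East", "Gadget", 100.0),
        ("C3", "West", "Gadget", 50.0),
    ],
)

# Dimensions: region, product. Measures: the three aggregates.
QUERY = """
SELECT region,
       product,
       SUM(amount)              AS total_amount,
       AVG(amount)              AS avg_amount,
       COUNT(DISTINCT customer) AS customer_count
FROM purchases
GROUP BY region, product
ORDER BY region, product
"""

results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

Removing product from both the SELECT and the GROUP BY rolls the numbers up to the region level, which is the kind of operation a Cube lets you do interactively.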

That's how I would approach it anyway, as my expertise is from the Business Intelligence side.

If you were a Data Scientist, I think you would follow the same steps, except you may be pulling from Big Data or Unstructured Data.

And you may be applying Models for Statistical or Mathematical predictions and calculations.
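
As one very simple example of such a Model, here is a least-squares straight-line fit written in plain Python. The ad_spend and sales numbers are invented purely for illustration, and fit_line is a hypothetical helper; in practice a Data Scientist would more likely reach for a statistics library and a richer model.

```python
# Hypothetical data: ad spend and resulting sales, in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)

# Use the fitted line to predict sales at a new spend level.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

The fitted line is only as good as the data behind it, which is why the next step of checking whether the question was actually answered matters.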

In the end you should either have answered your question / hypothesis, or discovered that you need to modify your approach, find more data, revise your models, etc.

And document your assumptions and findings along the way, so you can later reproduce the experiment.

That's the way I see it anyway.