Introduction

Almost everything around us can be viewed as data today. The question is what we do with it. Data helps us understand trends and predict future ones. I see data analysis as one of the major skills everyone should learn.

The Workflow of Data Analysis

Clean Up Data
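
Here is a minimal sketch of this step. It assumes the raw data arrives as a list of Python dicts, for example rows read with `csv.DictReader`. The hypothetical `clean_records` helper trims whitespace and treats empty strings as missing. It then drops rows that lack a required field or hold an unparseable number, and removes exact duplicates. These rules are illustrative choices, not a fixed recipe; larger projects usually do this with a dataframe library.

```python
def clean_records(records, required_fields, numeric_fields=()):
    """Return a cleaned copy of `records` (a list of dicts).

    Illustrative rules: strip whitespace, treat "" as missing (None),
    drop rows missing a required field or holding an unparseable number,
    and drop exact duplicates (keeping the first occurrence).
    """
    cleaned = []
    seen = set()
    for row in records:
        # Normalise strings: trim whitespace and turn "" into None
        tidy = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip() or None
            tidy[key] = value

        # Skip rows that are missing any required field
        if any(tidy.get(field) is None for field in required_fields):
            continue

        # Convert numeric fields; a value that is not a number invalidates the row
        try:
            for field in numeric_fields:
                if tidy.get(field) is not None:
                    tidy[field] = float(tidy[field])
        except ValueError:
            continue

        # Skip exact duplicates (keys are unique, so sorting never compares values)
        signature = tuple(sorted(tidy.items()))
        if signature in seen:
            continue
        seen.add(signature)
        cleaned.append(tidy)
    return cleaned


raw = [
    {"name": " Alice ", "age": "30"},
    {"name": "Bob", "age": ""},        # missing age -> dropped
    {"name": "Alice", "age": "30"},    # duplicate after cleaning -> dropped
    {"name": "Carol", "age": "abc"},   # not a number -> dropped
    {"name": "Dave", "age": " 41 "},
]
clean = clean_records(raw, required_fields=["name", "age"], numeric_fields=["age"])
print(clean)  # [{'name': 'Alice', 'age': 30.0}, {'name': 'Dave', 'age': 41.0}]
```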

Visualize Data

  1. Numerical Data vs. Numerical Data
  2. Numerical Data vs. Categorical Data
  3. Other possible plots
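
Plots are best drawn with a plotting library, but the quantities they show can be sketched with the standard library alone. This sketch assumes a scatter plot for numerical vs. numerical data and a box plot for numerical vs. categorical data; these are common choices, not the only ones. The hypothetical helpers below compute two things: the Pearson correlation that summarizes the linear trend in a scatter plot, and the five-number summary each box in a box plot draws. Quartile conventions differ between tools, and this sketch uses the `inclusive` method of `statistics.quantiles`.

```python
import statistics
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def box_summary(values, labels):
    """(min, Q1, median, Q3, max) of `values` for each category label.

    Each category needs at least two values for statistics.quantiles.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    summary = {}
    for label, group in groups.items():
        q1, median, q3 = statistics.quantiles(group, n=4, method="inclusive")
        summary[label] = (min(group), q1, median, q3, max(group))
    return summary


# Numerical vs. numerical: a perfectly linear relationship
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0

# Numerical vs. categorical: one box per category
scores = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
groups = ["a"] * 5 + ["b"] * 5
print(box_summary(scores, groups))
```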

Summarize Data

  1. The null hypothesis H0 and the alternative hypothesis H1 are stated
  2. A method is chosen to test the hypothesis:
       1. Confidence Interval Testing
       2. p-value Testing
       3. Critical Region Testing
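
For a two-sided test at significance level alpha, the three methods reach the same decision. The hypothesized mean mu0 lies outside the (1 - alpha) confidence interval exactly when p < alpha, and that happens exactly when |z| exceeds the critical value. The hypothetical `z_test` function below is a minimal sketch that applies all three to H0: mean = mu0 against H1: mean != mu0. It assumes a one-sample z-test in which the sample standard deviation stands in for the population one. This is a large-sample approximation; a t-test is the usual choice for small samples.

```python
from statistics import NormalDist, fmean, stdev


def z_test(sample, mu0, alpha=0.05):
    """Two-sided one-sample z-test of H0: mean == mu0 against H1: mean != mu0.

    Uses the sample standard deviation in place of the population one,
    which is a large-sample approximation.
    """
    n = len(sample)
    mean = fmean(sample)
    se = stdev(sample) / n ** 0.5                   # standard error of the mean
    z = (mean - mu0) / se                           # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 when alpha = 0.05

    # 1. Confidence interval: reject H0 if mu0 lies outside the (1 - alpha) interval
    ci = (mean - z_crit * se, mean + z_crit * se)
    reject_ci = not (ci[0] <= mu0 <= ci[1])

    # 2. p-value: chance of a statistic at least this extreme if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_p = p_value < alpha

    # 3. Critical region: reject H0 if |z| falls beyond the critical value
    reject_region = abs(z) > z_crit

    return {"z": z, "ci": ci, "p_value": p_value,
            "reject_ci": reject_ci, "reject_p": reject_p,
            "reject_region": reject_region}


sample = list(range(1, 101))   # illustrative data, mean 50.5
for mu0 in (40, 50):
    r = z_test(sample, mu0)
    print(mu0, r["reject_ci"], r["reject_p"], r["reject_region"])
# 40 True True True
# 50 False False False
```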

Train a Model on the Data for Future Use

  1. Split the data into a training set and a test set
  2. Train the model on the training data
  3. Test the trained model on the test data
  4. Evaluate the model to decide whether it is ready to use, for example with:
       1. Bootstrap Sampling
       2. Leave-One-Out Cross-Validation
       3. K-Fold Cross-Validation
       4. Random Forest
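
Here is a minimal end-to-end sketch of the split, train, test and evaluate steps, using only the standard library. Several pieces are assumptions made for illustration: the data is synthetic (y = 1 + 2x plus Gaussian noise), the model is a one-feature least-squares line, and mean squared error (MSE) is the score. The hypothetical `k_fold_mse` helper covers both K-fold cross-validation and leave-one-out cross-validation, which is the case where K equals the number of rows. `bootstrap_mse` shows one common bootstrap variant: it trains on a resample drawn with replacement and scores on the rows left out of that resample.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return fmean([(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)])


def subset(seq, indices):
    return [seq[i] for i in indices]


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out `test_fraction` of them as the test set."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return subset(xs, train), subset(ys, train), subset(xs, test), subset(ys, test)


def k_fold_mse(xs, ys, k, seed=0):
    """Average held-out MSE over k folds; k == len(xs) gives leave-one-out CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    errors = []
    for f in range(k):
        held_out = set(idx[f::k])               # every k-th shuffled row forms one fold
        train = [i for i in idx if i not in held_out]
        test = [i for i in idx if i in held_out]
        model = fit_line(subset(xs, train), subset(ys, train))
        errors.append(mse(model, subset(xs, test), subset(ys, test)))
    return fmean(errors)


def bootstrap_mse(xs, ys, rounds=200, seed=0):
    """Train on resamples drawn with replacement; score on the out-of-bag rows."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(rounds):
        in_bag = [rng.randrange(n) for _ in range(n)]
        out_of_bag = sorted(set(range(n)) - set(in_bag))
        if not out_of_bag:                      # rare: every row was drawn
            continue
        model = fit_line(subset(xs, in_bag), subset(ys, in_bag))
        errors.append(mse(model, subset(xs, out_of_bag), subset(ys, out_of_bag)))
    return fmean(errors)


# Synthetic data for illustration: y = 1 + 2x + noise
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Steps 1-3: split, train on the training set, test on the held-out set
x_train, y_train, x_test, y_test = train_test_split(xs, ys)
model = fit_line(x_train, y_train)
print("fitted (a, b):", model)
print("test MSE:", mse(model, x_test, y_test))

# Step 4: evaluate with resampling methods
print("5-fold CV MSE:", k_fold_mse(xs, ys, k=5))
print("LOOCV MSE:", k_fold_mse(xs, ys, k=len(xs)))
print("bootstrap out-of-bag MSE:", bootstrap_mse(xs, ys))
```

Swapping `fit_line` and `mse` for another model and metric leaves the splitting and resampling logic unchanged.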

Worked Examples

Worked examples covering some (or all) of these analysis steps can be viewed here.

References