Introduction
We can look at everything around us as data today. The question is what we do with it. Data is very useful for understanding and predicting future trends. Today, I view data analysis as one of the major soft skills everyone should learn.
The Workflow of Data Analysis
- Data analysis is about taking data, visualizing it, summarizing it, and presenting it to a target audience. The complete cycle of data analysis consists of four steps:
Clean up Data
- Identify all the missing values
- Identify relevant columns
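
As a minimal sketch of the two clean-up tasks above, the snippet below counts missing values per column and keeps only the columns judged relevant. The sample rows, the column names, and the helpers `is_missing`, `count_missing`, and `select_columns` are all hypothetical, and treating `None` and empty strings as "missing" is an assumption that may not fit every dataset. It uses only the standard library; a dataframe library would be a common choice in practice.

```python
# Hypothetical sample rows; None and "" stand in for missing values.
rows = [
    {"id": 1, "age": 34, "city": "Bangkok", "note": ""},
    {"id": 2, "age": None, "city": "Chiang Mai", "note": "call back"},
    {"id": 3, "age": 29, "city": "", "note": ""},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_missing(rows):
    """Return {column: number of missing values} across all rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    return counts


def select_columns(rows, columns):
    """Keep only the columns judged relevant for the analysis."""
    return [{column: row[column] for column in columns} for row in rows]


missing = count_missing(rows)
relevant = select_columns(rows, ["age", "city"])
print(missing)
print(relevant)
```

A column with many missing values (here the hypothetical `note`) is often a candidate to drop, while a relevant column with a few gaps may be worth keeping and handling separately.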
Visualize Data
- Plot the data to identify any potential patterns. Some of the most common visualizations include:
  - Numerical Data vs. Numerical Data
    - Scatter Plot
  - Numerical Data vs. Categorical Data
    - Bar Chart
    - Box Plot
  - Other possible plots
    - Line Chart
    - Heat Map
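
To make the "numerical vs. categorical" case more concrete, here is a rough sketch that computes the five numbers a box plot typically draws (minimum, first quartile, median, third quartile, maximum) for each category. It does not draw anything. The observations and the helper `box_plot_stats` are hypothetical, and plotting libraries may compute quartiles with a slightly different rule than `statistics.quantiles` uses by default.

```python
import statistics
from collections import defaultdict

# Hypothetical (category, value) observations.
observations = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5), ("A", 6), ("A", 7),
    ("B", 10), ("B", 20), ("B", 30), ("B", 40), ("B", 50), ("B", 60), ("B", 70),
]


def box_plot_stats(observations):
    """Group values by category and return the five-number summary of each group."""
    groups = defaultdict(list)
    for category, value in observations:
        groups[category].append(value)

    summary = {}
    for category, values in groups.items():
        # n=4 returns the three cut points between quartiles: Q1, median, Q3.
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[category] = {
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        }
    return summary


stats = box_plot_stats(observations)
for category, numbers in stats.items():
    print(category, numbers)
```

Comparing these summaries side by side is essentially what a box plot shows visually: whether one category sits higher or is more spread out than another.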
Summarize Data
- Verify the patterns identified in the Visualize Data step. The main method used for verification is called hypothesis testing. In hypothesis testing:
  - The Null Hypothesis H0 and the Alternative Hypothesis H1 are stated
  - A method is chosen to test the hypothesis
  - Methods of hypothesis testing include
    - Confidence Interval Testing
    - p-value Testing
    - Critical Region Testing
- Furthermore, Linear Regression can be applied to predict future values
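
The three testing methods listed above lead to the same decision when applied to the same test. The sketch below illustrates this with a two-sided one-sample z-test, which assumes the population standard deviation is known (a simplifying assumption; with an unknown standard deviation a t-test would usually be used instead). It also fits a simple least-squares line to predict a future value. The sample values, `mu0`, `sigma`, and the helpers `z_test` and `fit_line` are all hypothetical.

```python
import math
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mean == mu0 against H1: mean != mu0."""
    n = len(sample)
    x_bar = mean(sample)
    se = sigma / math.sqrt(n)              # standard error of the sample mean
    z = (x_bar - mu0) / se                 # test statistic
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    return {
        "z": z,
        "p_value": p_value,
        "confidence_interval": ci,
        # Three equivalent ways to reach the decision:
        "reject_by_p_value": p_value < alpha,
        "reject_by_critical_region": abs(z) > z_crit,
        "reject_by_confidence_interval": not (ci[0] <= mu0 <= ci[1]),
    }


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical sample: is the true mean different from 50, given sigma = 4?
sample = [52, 55, 49, 58, 54, 56, 53, 57, 51, 55]
result = z_test(sample, mu0=50, sigma=4)
print(result)

# Hypothetical trend: fit a line and predict the value at x = 6.
slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

In this made-up sample, the p-value falls below `alpha`, the statistic lies in the critical region, and the confidence interval excludes `mu0`, so all three methods reject H0.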
Train a Model for Future Predictions
- Here, Machine Learning is applied to train a model. Models are used to make future predictions. The workflow is as follows:
  - Split the data into train and test data
  - Train the model with the train data
  - Test the trained model with the test data
  - Evaluate the model to see whether it is ready to use
- Common resampling methods and algorithms used when training and evaluating a model include
  - Bootstrap Sampling
  - Leave One Out Cross Validation
  - K-Fold Cross Validation
  - Random Forest
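
The split, train, test, evaluate workflow above can be sketched with K-Fold Cross Validation, which repeats that cycle K times so that every observation is used for testing exactly once. This is only an illustration under stated assumptions: the model is a simple least-squares line rather than anything the original course prescribes, the data is synthetic, and the helpers `fit_line`, `mean_squared_error`, `k_fold_indices`, and `cross_validate` are hypothetical names.

```python
import random


def fit_line(xs, ys):
    """Train step: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_squared_error(model, xs, ys):
    """Test step: average squared prediction error on held-out data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k=5, seed=0):
    """Return one test error per fold; each fold is the test set exactly once."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_line([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(
            mean_squared_error(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx])
        )
    return scores


# Synthetic, noise-free data following y = 2x + 1.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
scores = cross_validate(xs, ys, k=5)
print(scores)
print(sum(scores) / len(scores))  # average error used for the evaluate step
```

Setting `k` equal to the number of observations turns this into Leave One Out Cross Validation, since each fold then holds a single observation.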
Work Examples
Work examples covering some (or all) of these analysis processes can be viewed here
References
- Data Rockie School (2025), Data Science Bootcamp 12, https://data-science-bootcamp1.teachable.com/p/data-science-bootcamp-12