Steps in a Data Analysis Project

Steps in a Data Analysis Project

The term “big data” refers to structured or unstructured data that is significant, fast, or complex, thus strenuous or even impossible to process using traditional methods. The incorporation of big data has prompt implications for building a machine learning model and financial forecasting.

Structured Data

The steps for building a machine learning model using structured (traditional) big data are as follows:

Step 1: Conceptualization of the modeling task.

This step requires establishing the model output, how and who will use the model, and how the output will be planted in existing or new business.

Step 2: Data collection.

Structured data can be obtained from both internal and external sources.

Step 3: Data preparation and wrangling.

Cleansing and preprocessing of raw data takes place in this step. Cleansing involves resolving missing values or outliers, whereas preprocessing entails extracting, accumulating, filtering, and selecting relevant data columns.

Step 4: Data exploration.

The data exploration step involves exploratory data analysis, selecting, and engineering features.

Step 5: Model training.

The suitable machine learning approach is selected in this step. Moreover, the performance of the trained model is evaluated, and the model is tuned accordingly.

These steps are repetitive since model building is an iterative process.

Unstructured Data

Building text ML models requires different steps since they incorporate unstructured data sources of big data. The differences in steps between the text model and traditional model account for the characteristics of big data: volume, velocity, variety, and veracity. Here, we focus on the variety and veracity attributes of big data as they are apparent in the text.

Text ML model building steps are as follows:

Step 1: Text problem formulation.

This step entails the formulate text classification problem and identifying the exact inputs and outputs for the model. Further, it is crucial to establish how the text ML model’s classification output will be put to use.

Step 2: Text curation.

Relevant text data is obtained from web pages through scraping or crawling programs.

Step 3: Text preparation and wrangling.

This step entails cleansing and preprocessing tasks required for converting unstructured data into a format that is usable by traditional modeling methods designed for structured inputs.

Step 4: Text exploration.

Here, text is visualized through methods such as word clouds. Additionally, it involves text feature selection and engineering

Finally, the output can either be combined with other structured variables or used directly for forecasting and analysis.

The following figure demonstrates the above steps of building a model for financial forecasting using structured (traditional) vs. unstructured (text) big data.

Building a Model for Financial ForecastingQuestion

Consider the following statements:

Statement 1: “The second step in building text-based machine learning models is text preparation and wrangling. However, the second step in building machine learning models using structured data is data collection.”

Statement 2: “The fourth step in building both types of models encompasses data/text exploration.”

Which of the statements relating to the steps in building structured data-based and text-based machine learning models is (are) accurate?

      A. Only Statement 1.

      B. Only Statement 2.

      C. Both Statement 1 and Statement 2.


The correct answer is B.

The five steps in building structured data-based machine learning models are:

  1. Conceptualization of the modeling task
  2. Data collection
  3. Data preparation and wrangling
  4. Data exploration
  5. Model training.

The five steps in building text-based ML models are:

  1. Text problem formulation
  2. Data (text) curation
  3. Text preparation and wrangling
  4. Text exploration
  5. Model training.

Statement 2 is correct since the fourth step in building both types of models involves data/text exploration.

A and C are incorrect. Text preparation and wrangling is the third step in building text ML models and occurs after the second data (text) curation step.

Reading 7: Big Data Projects

LOS 7 (a) Identify and explain steps in a data analysis project

Shop CFA® Exam Prep

Offered by AnalystPrep

Featured Shop FRM® Exam Prep Learn with Us

    Subscribe to our newsletter and keep up with the latest and greatest tips for success
    Shop Actuarial Exams Prep Shop Graduate Admission Exam Prep

    Daniel Glyn
    Daniel Glyn
    I have finished my FRM1 thanks to AnalystPrep. And now using AnalystPrep for my FRM2 preparation. Professor Forjan is brilliant. He gives such good explanations and analogies. And more than anything makes learning fun. A big thank you to Analystprep and Professor Forjan. 5 stars all the way!
    michael walshe
    michael walshe
    Professor James' videos are excellent for understanding the underlying theories behind financial engineering / financial analysis. The AnalystPrep videos were better than any of the others that I searched through on YouTube for providing a clear explanation of some concepts, such as Portfolio theory, CAPM, and Arbitrage Pricing theory. Watching these cleared up many of the unclarities I had in my head. Highly recommended.
    Nyka Smith
    Nyka Smith
    Every concept is very well explained by Nilay Arun. kudos to you man!
    Badr Moubile
    Badr Moubile
    Very helpfull!
    Agustin Olcese
    Agustin Olcese
    Excellent explantions, very clear!
    Jaak Jay
    Jaak Jay
    Awesome content, kudos to Prof.James Frojan
    sindhushree reddy
    sindhushree reddy
    Crisp and short ppt of Frm chapters and great explanation with examples.