Prepare for the AWS Certified AI Practitioner Exam with flashcards and multiple choice questions. Each question includes hints and explanations to help you succeed on your test. Get ready for certification!


A company creates a correlation matrix and analyzes the data it has collected. Which stage of the ML pipeline is the company in?

  1. Data pre-processing

  2. Feature engineering

  3. Exploratory data analysis

  4. Hyperparameter tuning

The correct answer is: Exploratory data analysis

Creating a correlation matrix and analyzing collected data belongs to the exploratory data analysis (EDA) stage. This stage is about understanding the patterns and relationships in the dataset before any model is built. During EDA, data scientists summarize the main characteristics of the data, often with visual methods such as plots and heatmaps. A correlation matrix shows how strongly each pair of variables moves together, which helps identify redundant or closely related attributes and informs feature engineering decisions later in the pipeline.

The other options describe different stages. Data pre-processing prepares the data for analysis and modeling by handling missing values, normalizing numeric ranges, and encoding categorical variables. Feature engineering transforms raw data into meaningful features that improve model performance, typically after exploratory data analysis has revealed which patterns matter. Hyperparameter tuning comes later still: it adjusts the settings of the chosen algorithm, such as learning rate or tree depth, to optimize the model’s performance once a model has been selected and can be trained and evaluated.

Because exploratory data analysis is primarily about summarizing, visualizing, and uncovering insights from the data, it is the correct answer.
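To make the idea of a correlation matrix concrete, here is a minimal sketch in plain Python. The dataset, the column names (`ad_spend`, `site_visits`, `returns`), and the helper functions `pearson` and `correlation_matrix` are all hypothetical and chosen only for illustration. In practice a library routine such as pandas `DataFrame.corr` is commonly used instead of hand-written code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy dataset (an assumption, not data from the question).
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "site_visits": [110, 205, 290, 410, 500],
    "returns": [9, 7, 6, 4, 2],
}

matrix = correlation_matrix(data)
for row_name, row in matrix.items():
    print(row_name, {col: round(r, 2) for col, r in row.items()})
```

In this toy data, `ad_spend` and `site_visits` rise together, so their coefficient is close to 1, while `returns` falls as `ad_spend` rises, so that coefficient is close to -1. Reading off relationships like these is the kind of insight exploratory data analysis aims for before any features are engineered or models are trained.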