Project Overview

A machine learning pipeline that predicts sales for 1,559 products across 10 stores. It identifies the key drivers of retail performance through feature importance analysis and supports inventory optimization, using label-encoded features and an XGBoost gradient boosting model.

Data Analysis

Technical Implementation

  • Automated missing value treatment (mean/mode imputation)
  • IQR-based outlier detection and capping
  • Label encoding for categorical features
  • XGBoost regression with hyperparameter tuning

Key Features

Data Cleaning

Handles the 17% of item weights and 28% of store size values that are missing, using mean imputation for numerical features and mode imputation for categorical features
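As a minimal sketch of this kind of imputation, assuming a pandas DataFrame as input: the helper name impute_missing and the column names item_weight and store_size are hypothetical stand-ins, since the project's actual schema is not shown here.

```python
import numpy as np
import pandas as pd

def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with their mean and all other columns with their mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            # mode() ignores missing values; take the first mode if there are ties
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Toy data; the column names are hypothetical.
sample = pd.DataFrame({
    "item_weight": [9.3, np.nan, 17.5, 19.2],
    "store_size": ["Medium", None, "Medium", "Small"],
})
cleaned = impute_missing(sample)
print(cleaned)
```

In a real pipeline the fill values would normally be computed on the training split only and reused on the test split, so that no test information leaks into training.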

Feature Engineering

Transforms 8 categorical variables into numerical features using label encoding
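One way to do this is with scikit-learn's LabelEncoder, applied column by column. The sketch below is illustrative only: the helper label_encode and the two column names are assumptions, not the project's actual code or schema.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def label_encode(df: pd.DataFrame, columns: list[str]):
    """Replace each listed column with integer codes; return the fitted encoders too."""
    out = df.copy()
    encoders = {}
    for col in columns:
        enc = LabelEncoder()
        out[col] = enc.fit_transform(out[col].astype(str))
        encoders[col] = enc  # kept so the same mapping can be reused on new data
    return out, encoders

# Toy data with hypothetical column names.
sample = pd.DataFrame({
    "store_type": ["Grocery", "Supermarket", "Grocery"],
    "fat_content": ["Low Fat", "Regular", "Low Fat"],
})
encoded, encoders = label_encode(sample, ["store_type", "fat_content"])
print(encoded)
```

Label encoding assigns integers in sorted order of the category names, so the numbers carry no real ranking. Tree-based models such as XGBoost tend to tolerate this better than linear models do.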

Model Performance

Achieves an R² score of 0.67 on training data and 0.57 on test data

Insight Generation

Identifies key factors impacting sales through feature importance analysis

Technical Deep Dive

Data Pipeline

  1. Exploratory data analysis
  2. Outlier treatment using IQR
  3. Categorical feature encoding
  4. Train-test split (80:20)
  5. Model training & evaluation
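Steps 2 and 4 above can be sketched as follows. The 1.5 × IQR fence is the conventional choice and is an assumption here, as are the helper name cap_outliers_iqr, the column names, and the random seed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values lying more than k * IQR outside the first and third quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Toy data with one obvious outlier (5.0) in a hypothetical feature column.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "item_visibility": np.append(rng.uniform(0.0, 0.2, size=99), 5.0),
    "sales": rng.uniform(100, 5000, size=100),
})
data["item_visibility"] = cap_outliers_iqr(data["item_visibility"])

# 80:20 train-test split, matching the ratio in the pipeline list.
X = data[["item_visibility"]]
y = data["sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```

Capping keeps every row and only pulls extreme values back to the fence, which preserves sample size compared with dropping outliers outright.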

Performance Metrics

Training R² Score: 0.67
Testing R² Score: 0.57
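For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. A small NumPy sketch of that definition follows; the function name r2_score_manual is illustrative, and scikit-learn's r2_score computes the same quantity.

```python
import numpy as np

def r2_score_manual(y_true, y_pred) -> float:
    """R² = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2_score_manual(y_true, y_true)      # predictions match exactly
baseline = r2_score_manual(y_true, [6.0] * 4)  # always predicting the mean
print(perfect, baseline)
```

A model that always predicts the mean scores 0, so a test score of 0.57 means the model explains roughly 57% of the variance in held-out sales.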

Challenges & Solutions

⚠️ Missing Data

Implemented a hybrid imputation strategy: mean imputation for numerical features and mode imputation for categorical features

⚡ Model Accuracy

Improved performance through feature engineering and hyperparameter tuning of the XGBoost regressor
