Student Performance Prediction System

System Architecture

An end-to-end machine learning pipeline that processes student data through automated feature engineering and model selection, then serves predictions through a Flask web interface. The system reaches an R² score of 0.88 using tuned ensemble models.

Technical Implementation

▹ Automated data preprocessing (OneHotEncoder, StandardScaler)
▹ Hyperparameter tuning with GridSearchCV
▹ Ensemble model comparison (7 algorithms)
▹ Model persistence with dill
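
The tuning step listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data, the choice of RandomForestRegressor, and the parameter grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the student data (assumption: the real features differ).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(120, 2))  # e.g. two prior test scores
y = 0.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 3, size=120)

# Hypothetical search space; the project's actual grid is not documented here.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="r2",  # rank parameter combinations by cross-validated R²
    cv=3,          # 3-fold cross-validation
)
search.fit(X, y)

best_params = search.best_params_  # winning combination from the grid
best_cv_r2 = search.best_score_    # its mean cross-validated R²
print(best_params, round(best_cv_r2, 3))
```

After fitting, search.best_estimator_ holds a model refit on the full training data with the winning parameters, which is the object one would then persist.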

Core Features

Automated Model Selection

Evaluates 7 regression models (including XGBoost, CatBoost, and Random Forest) and selects the best performer by R² score
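
A minimal sketch of selecting a model by R², under stated assumptions: the data is synthetic, and only three scikit-learn regressors stand in for the seven candidates (XGBoost and CatBoost are left out so the example runs with scikit-learn alone). The candidate names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not the project's dataset).
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(200, 2))
y = 0.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 3, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Illustrative candidate set; the real system compares seven algorithms.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each candidate and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Keep the candidate with the highest held-out R².
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]
print(best_name, round(scores[best_name], 3))
```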

Data Pipeline

Automated data validation and transformation using a scikit-learn ColumnTransformer
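
One way such a transformation step might look is sketched below. The column names and the four sample rows are hypothetical and exist only to make the example runnable; the pairing of OneHotEncoder for categorical columns and StandardScaler for numeric columns follows the preprocessing named in this document.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical columns and rows, for illustration only.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "reading_score": [72, 90, 95, 57],
    "writing_score": [74, 88, 93, 44],
})
categorical_cols = ["gender", "lunch"]
numeric_cols = ["reading_score", "writing_score"]

preprocessor = ColumnTransformer(
    transformers=[
        # Categories unseen during fit are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        # Numeric columns are rescaled to zero mean and unit variance.
        ("num", StandardScaler(), numeric_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocessor.fit_transform(df)
print(features.shape)  # 2 + 2 one-hot columns, then 2 scaled columns
```

Fitting the transformer once and reusing the fitted object at prediction time keeps training and serving on the same encoding.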

Web Interface

Responsive Flask interface with form validation; predicted scores are clamped to the 0-100 range
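
Score clamping can be as small as the helper below. The function name clamp_score is hypothetical; the idea is only that a regression model can predict values outside the valid score range, so the output is bounded before it is shown.

```python
def clamp_score(raw_prediction: float, low: float = 0.0, high: float = 100.0) -> float:
    """Bound a raw model prediction to the valid score range [low, high]."""
    return max(low, min(high, float(raw_prediction)))


print(clamp_score(104.3))  # 100.0
print(clamp_score(-2))     # 0.0
print(clamp_score(67.5))   # 67.5
```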

Model Interpretability

SHAP value integration for feature importance visualization

Technical Specifications

ML Pipeline

▹ Data ingestion from CSV/API
▹ Automated feature engineering
▹ Hyperparameter optimization
▹ Model evaluation (R², MAE, MSE)
▹ Prediction serving via Flask
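
For reference, the three evaluation metrics named above can be computed from first principles as in this sketch. The function name and the sample values are made up for illustration; in practice scikit-learn's r2_score, mean_absolute_error, and mean_squared_error return the same quantities.

```python
def regression_metrics(y_true, y_pred):
    """Return R², MAE, and MSE for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "mse": mse}


# Made-up example values, not project results.
metrics = regression_metrics([60, 70, 80, 90], [62, 68, 83, 89])
print(metrics)  # r2 ≈ 0.964, mae = 2.0, mse = 4.5
```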

Performance Metrics

R² Score: 0.88

MAE: 4.2

Development Challenges

⚠️ Categorical Feature Handling

Implemented one-hot encoding with rare-category handling and automated persistence of the fitted preprocessing pipeline
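
The sketch below shows one common form of rare-category handling: categories seen fewer than min_count times are folded into a shared "other" column before one-hot encoding. It is a plain-Python illustration with hypothetical names and sample values, not the project's implementation. Recent scikit-learn versions offer a similar effect through the min_frequency parameter of OneHotEncoder.

```python
from collections import Counter


def fit_rare_category_encoder(values, min_count=2, other_label="other"):
    """Learn which categories to keep; rarer ones share the `other_label` column."""
    counts = Counter(values)
    kept = sorted(cat for cat, n in counts.items() if n >= min_count)
    columns = kept + [other_label]

    def encode(value):
        # Rare or previously unseen categories map to the shared column.
        label = value if value in kept else other_label
        return [1 if label == col else 0 for col in columns]

    return columns, encode


# Hypothetical training values for a single categorical feature.
training_values = ["bachelor", "bachelor", "high school",
                   "high school", "high school", "doctorate"]
columns, encode = fit_rare_category_encoder(training_values, min_count=2)

print(columns)              # ['bachelor', 'high school', 'other']
print(encode("bachelor"))   # [1, 0, 0]
print(encode("doctorate"))  # [0, 0, 1]
```

Mapping unseen values to the same column means the encoder does not fail when the web form submits a category that never appeared in training.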

⚡ Model Deployment

Prepared the Flask app for production by serving it with gunicorn and adding error handling
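
A rough sketch of a prediction endpoint with error handling follows. The route, the field names, and the predict_score placeholder are assumptions for illustration; the real app would load the persisted model and preprocessing pipeline instead of averaging two inputs.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_score(features: dict) -> float:
    # Placeholder for the persisted model (assumption): averages the two inputs.
    return (features["reading_score"] + features["writing_score"]) / 2


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    if not isinstance(payload, dict):
        return jsonify(error="Expected a JSON object"), 400
    try:
        features = {
            "reading_score": float(payload["reading_score"]),
            "writing_score": float(payload["writing_score"]),
        }
    except (KeyError, TypeError, ValueError):
        return jsonify(error="reading_score and writing_score must be numbers"), 400
    # Bound the prediction to the valid 0-100 score range.
    score = max(0.0, min(100.0, predict_score(features)))
    return jsonify(predicted_score=score)


@app.errorhandler(500)
def internal_error(exc):
    # Return JSON rather than an HTML error page for unexpected failures.
    return jsonify(error="Internal server error"), 500
```

Assuming the module is saved as app.py, it could be served with a command such as `gunicorn --workers 2 app:app`; the worker count here is arbitrary.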