Project Overview

A machine learning system that predicts individual medical costs based on demographic and lifestyle factors. Using Kaggle's insurance dataset, the model identifies key cost drivers and provides interpretable insights for healthcare planning.

Data Pipeline

Technical Implementation

  • Feature engineering with BMI-age interactions
  • Categorical encoding for smoking status and regions
  • Multiple linear regression model
  • Model interpretation via coefficient analysis and correlation matrices
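
As a rough illustration of the encoding and interaction steps listed above, the sketch below turns one raw record into numeric features. The field names (age, bmi, smoker, region) and the four region labels are assumed to match the Kaggle insurance CSV, and build_features is a hypothetical helper, not the project's actual code.

```python
# Region labels are an assumption based on the Kaggle insurance dataset.
REGIONS = ("northeast", "northwest", "southeast", "southwest")


def build_features(record):
    """Turn one raw record (a dict) into a dict of numeric features."""
    age = float(record["age"])
    bmi = float(record["bmi"])
    features = {
        "age": age,
        "bmi": bmi,
        # Binary encoding for smoking status ("yes" / "no").
        "smoker": 1.0 if record["smoker"] == "yes" else 0.0,
        # BMI x age interaction term.
        "bmi_x_age": bmi * age,
    }
    # One-hot encode the region. The first level is dropped so the
    # dummy columns are not perfectly collinear with the intercept.
    for region in REGIONS[1:]:
        features["region_" + region] = 1.0 if record["region"] == region else 0.0
    return features
```

Dropping one region level is a common choice for linear regression; the remaining region coefficients are then read relative to the dropped level.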

Prediction Sequence

[Figure: prediction sequence diagram]

Core Features

Factor Analysis

Uses coefficient analysis to identify key cost drivers, including smoking status, BMI, and age

Smart Features

Engineered interaction terms (BMI×Age) and regional cost analysis

High Accuracy

Achieves an R² score of 0.866 on test data, with an RMSE of $4,567

Prediction API

REST API endpoint for cost predictions with demographic inputs
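
The page does not say which framework or route the endpoint uses, so the following is only a minimal sketch built on Python's standard http.server. The /predict path, the input field names, and the COEFFICIENTS values are all hypothetical placeholders; the coefficients in particular are not the fitted model's values.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# HYPOTHETICAL placeholder coefficients, NOT the fitted model's values.
COEFFICIENTS = {"intercept": 0.0, "age": 1.0, "bmi": 1.0, "smoker": 1.0}


def predict_cost(payload):
    """Validate decoded JSON input and return (status_code, response_dict)."""
    try:
        age = float(payload["age"])
        bmi = float(payload["bmi"])
        smoker = 1.0 if payload["smoker"] == "yes" else 0.0
    except (KeyError, TypeError, ValueError):
        return 400, {"error": "expected fields: age, bmi, smoker"}
    cost = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["bmi"] * bmi
        + COEFFICIENTS["smoker"] * smoker
    )
    return 200, {"predicted_cost": cost}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the (assumed) /predict route is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            payload = None  # predict_cost turns this into a 400
        status, body = predict_cost(payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Run the sketch server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping predict_cost separate from the HTTP handler means the validation and scoring logic can be exercised without starting a server.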

Technical Deep Dive

Modeling Pipeline

  1. Data cleaning & missing value imputation
  2. Categorical feature encoding
  3. Interaction term creation
  4. Train-test split (80-20)
  5. Model training & interpretation
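
The steps above can be sketched with the standard library alone. This assumes median imputation, a seeded random shuffle for the 80-20 split, and ordinary least squares solved through the normal equations; the page states none of these details, and in practice a library such as scikit-learn would usually handle them. All function names here are hypothetical.

```python
import random
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_ols(X, y):
    """Fit y ~ b0 + b1*x1 + ... by solving the normal equations.

    Returns [b0, b1, ...]. Raises ZeroDivisionError if the design
    matrix is singular (for example, perfectly collinear features).
    """
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Augmented matrix [X^T X | X^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]
```

The returned coefficients are what the interpretation step reads: each one is the estimated change in cost per unit change in its feature, holding the others fixed.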

Performance Metrics

Test R² score: 0.866
RMSE: $4,567
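
For reference, both metrics can be computed from the test-set targets and predictions as in the sketch below. The function names are illustrative; the formulas are the standard definitions of RMSE and the coefficient of determination.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Share of target variance explained: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R² of 0.866 means the model accounts for about 86.6% of the variance in test-set costs, while the RMSE expresses the typical prediction error in dollars.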

Challenges & Solutions

⚠️ Data Quality

Implemented thorough EDA and outlier analysis to ensure data integrity for modeling
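
The page does not say which outlier method was used. One common choice, shown here purely as an assumption, is the 1.5 × IQR rule; iqr_outliers is a hypothetical helper.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

For a right-skewed target like medical charges, flagged points are often legitimate high-cost cases, so they would typically be inspected before being dropped.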

⚡ Model Interpretability

Used coefficient analysis and correlation matrices to explain feature importance
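
A correlation matrix of this kind can be sketched as pairwise Pearson coefficients between numeric columns. The column names in the usage note are illustrative only, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Calling correlation_matrix({"age": [...], "bmi": [...], "charges": [...]}) returns a nested dict, so matrix["bmi"]["charges"] gives the correlation between those two columns. Correlations describe pairwise association only, which is why they complement rather than replace the regression coefficients.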
