System Overview

This real-time computer vision system interprets sign language gestures from webcam input and converts them into text and audio output. It combines MediaPipe's hand tracking with custom gesture recognition algorithms and reaches 95% accuracy in controlled environments.

System Architecture

Technical Implementation

  • MediaPipe Hands pipeline for 21-point hand landmark detection
  • Custom angle-based gesture classification algorithm
  • Flask backend with WebSocket for real-time video streaming
  • History tracking with export functionality
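
The angle-based classification listed above can be illustrated with a minimal sketch. It assumes each hand arrives as 21 (x, y, z) points in the MediaPipe Hands index order (index finger MCP/PIP/TIP at 5/6/8, and so on). The names `joint_angle` and `finger_states`, the 160-degree threshold, and the omission of the thumb are illustrative assumptions, not the project's actual code.

```python
import math

# MediaPipe Hands landmark indices (MCP, PIP, TIP) for the four non-thumb fingers.
FINGER_JOINTS = {
    "index": (5, 6, 8),
    "middle": (9, 10, 12),
    "ring": (13, 14, 16),
    "pinky": (17, 18, 20),
}


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate input: treat as fully folded
    cos = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))


def finger_states(landmarks, threshold_deg=160.0):
    """Map finger name -> True when the finger looks extended.

    A straight finger gives an angle near 180 degrees at the PIP joint;
    a folded finger gives a much smaller one. The threshold is an assumed value.
    """
    return {
        name: joint_angle(landmarks[mcp], landmarks[pip], landmarks[tip]) >= threshold_deg
        for name, (mcp, pip, tip) in FINGER_JOINTS.items()
    }
```

Measuring the angle at a joint rather than comparing raw coordinates keeps the result independent of hand position and scale in the frame, which is one plausible reason to prefer an angle-based approach.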

Core Features

Real-time Processing

Processes webcam input at close to 30 FPS (28 FPS measured) with MediaPipe hand tracking, keeping gesture recognition latency under 50 ms.

Gesture Library

Recognizes 15+ essential ASL gestures, including letters, numbers, and common symbols such as "OK" and "Peace".

Session History

Maintains a timestamped detection log with export options (TXT/CSV) for review and analysis.
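
A timestamped log with TXT and CSV export might look like the sketch below. The class name `DetectionHistory`, the column names, and the ISO 8601 timestamp format are assumptions made for illustration; the original export layout is not documented here.

```python
import csv
import io
from datetime import datetime, timezone


class DetectionHistory:
    """In-memory log of (timestamp, gesture) pairs with text exports."""

    def __init__(self):
        self.entries = []

    def record(self, gesture, timestamp=None):
        # Default to the current UTC time when no timestamp is supplied.
        ts = timestamp or datetime.now(timezone.utc)
        self.entries.append((ts, gesture))

    def to_csv(self):
        """Return the log as CSV text with a header row."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["timestamp", "gesture"])
        for ts, gesture in self.entries:
            writer.writerow([ts.isoformat(), gesture])
        return buf.getvalue()

    def to_txt(self):
        """Return the log as one human-readable line per detection."""
        return "\n".join(f"[{ts.isoformat()}] {gesture}" for ts, gesture in self.entries)
```

In a Flask backend, the returned strings could be sent as file downloads; that wiring is left out to keep the sketch dependency-free.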

Web Accessible

The responsive web interface works on any device with a modern browser and is deployed on AWS EC2 for global access.

Technical Deep Dive

Gesture Recognition Pipeline

  1. Hand landmark detection using MediaPipe
  2. Angle calculation between key joints
  3. Finger state classification (extended/folded)
  4. Geometric pattern matching
  5. Temporal smoothing of results
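
Steps 4 and 5 of the pipeline above can be sketched as a lookup on the finger-state pattern followed by a majority vote over recent frames. The template table, the names `match_gesture` and `TemporalSmoother`, and the window and vote sizes are hypothetical. Gestures such as "OK" also depend on fingertip distances, so a pure extended/folded table like this one would not be enough for them.

```python
from collections import Counter, deque

FINGERS = ("thumb", "index", "middle", "ring", "pinky")

# Hypothetical templates: one extended/folded flag per finger, in FINGERS order.
GESTURE_TEMPLATES = {
    (False, True, True, False, False): "Peace",
    (False, True, False, False, False): "1",
    (True, True, True, True, True): "5",
}


def match_gesture(states):
    """Return the label whose template equals the finger states, or None."""
    key = tuple(bool(states.get(name, False)) for name in FINGERS)
    return GESTURE_TEMPLATES.get(key)


class TemporalSmoother:
    """Majority vote over the last `window` per-frame labels."""

    def __init__(self, window=5, min_votes=3):
        self.history = deque(maxlen=window)
        self.min_votes = min_votes

    def update(self, label):
        """Add one per-frame label and return the smoothed label (or None)."""
        self.history.append(label)
        votes = Counter(item for item in self.history if item is not None)
        if not votes:
            return None
        best, count = votes.most_common(1)[0]
        # Only report a gesture once it has been seen often enough.
        return best if count >= self.min_votes else None
```

With these settings a single misclassified frame does not change the reported gesture, at the cost of a delay of a few frames before a new gesture is accepted.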

Performance Metrics

Detection Accuracy: 95%
Processing Speed: 28 FPS

Challenges & Solutions

⚠️ Lighting Variations

Implemented adaptive brightness normalization and histogram equalization to handle varying lighting conditions.
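
To make the histogram equalization step concrete, here is a dependency-free sketch that operates on a grayscale image stored as rows of 0 to 255 integers. The function name `equalize_histogram` is an assumption; a production pipeline would more likely call an image library (OpenCV's `cv2.equalizeHist` performs the same operation on 8-bit single-channel images), and the brightness normalization step is not shown.

```python
def equalize_histogram(gray):
    """Spread pixel intensities over the full 0..255 range.

    `gray` is a list of rows, each a list of ints in 0..255.
    Returns a new image; the input is left unchanged.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = running
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == 0 or total == cdf_min:
        return [row[:] for row in gray]  # empty or single-intensity image

    # Standard equalization mapping, clamped at 0 for unused low intensities.
    scale = 255 / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in gray]
```

The darkest pixel present maps to 0 and the brightest to 255, so a dim, low-contrast frame is stretched before landmark detection runs.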

⚡ Real-time Performance

Optimized the MediaPipe configuration and implemented frame skipping to maintain 30 FPS on low-end devices.
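
One way frame skipping can work is to measure how long the last processed frame took and drop enough incoming frames to catch up. The sketch below assumes that policy; the class name `FrameSkipper`, its methods, and the `max_skip` cap are hypothetical and may differ from the strategy actually used.

```python
import math


class FrameSkipper:
    """Skip frames when processing takes longer than one frame interval."""

    def __init__(self, target_fps=30.0, max_skip=4):
        self.budget = 1.0 / target_fps  # seconds available per frame
        self.max_skip = max_skip        # never skip more than this many in a row
        self._remaining = 0             # frames still to be skipped

    def should_process(self):
        """Return True if the next incoming frame should be processed."""
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def report(self, processing_seconds):
        """Record how long the last processed frame took."""
        # Whole frame intervals consumed, minus the one the frame was entitled to.
        overrun = math.ceil(processing_seconds / self.budget) - 1
        self._remaining = max(0, min(self.max_skip, overrun))
```

A capture loop would call `should_process()` for every incoming frame, run hand tracking only when it returns True, and pass the measured duration to `report()`.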
