Edge AI Optimization
The Challenge
Running complex deep learning models for eye-tracking on consumer-grade edge devices (such as laptops with basic webcams) resulted in high latency. The standard models were too heavy, causing frame drops and a poor user experience in real-time applications.
The Solution
I led the optimization effort by porting PyTorch models to Intel OpenVINO. By applying aggressive **model quantization (FP32 to INT8)** and **pruning techniques**, I significantly reduced the model size without compromising accuracy. The inference engine was rewritten in highly optimized Python for maximum performance on CPU.
- Converted models to OpenVINO Intermediate Representation (IR).
- Implemented post-training quantization to reduce memory footprint.
- Profiled and optimized the Python inference pipeline to remove bottlenecks.
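The core idea behind the FP32-to-INT8 step can be sketched in NumPy. This is a minimal illustration of post-training affine quantization, not the actual OpenVINO/NNCF implementation; the `quantize`/`dequantize` helpers and the per-tensor scheme are assumptions for clarity (production tooling typically quantizes per-channel with calibration data):

```python
import numpy as np

def quantize(weights: np.ndarray):
    """Map FP32 weights onto the signed INT8 range [-128, 127] (per-tensor, illustrative)."""
    scale = np.ptp(weights) / 255.0                      # (max - min) spread over the INT8 range
    zero_point = np.round(-128.0 - weights.min() / scale)
    q = np.clip(np.round(weights / scale + zero_point), -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Recover approximate FP32 values from the INT8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)        # stand-in for a weight tensor

q, scale, zp = quantize(w)
w_hat = dequantize(q, scale, zp)

print(w.nbytes // q.nbytes)                 # INT8 storage is 4x smaller than FP32
print(float(np.abs(w - w_hat).max()))       # reconstruction error bounded by ~scale/2
```

The 4x storage reduction per tensor is what drives the smaller memory footprint; the trade is the bounded rounding error visible in the reconstruction.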
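For the profiling step, a lightweight per-stage timer is often enough to find where frame time goes before reaching for a full profiler. The sketch below is a hypothetical illustration using only the standard library; the stage names and the toy workloads stand in for real preprocess/inference/postprocess functions:

```python
import time
from collections import defaultdict

class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def measure(self, name, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return result

    def report(self):
        # Average milliseconds per call for each stage.
        return {name: 1000.0 * total / self.counts[name]
                for name, total in self.totals.items()}

# Toy stand-ins for the real frame pipeline stages.
timer = StageTimer()
for _ in range(100):
    frame = timer.measure("preprocess", lambda: sum(range(1000)))
    out = timer.measure("infer", lambda x: x * 2, frame)
    timer.measure("postprocess", lambda x: x + 1, out)

print(timer.report())
```

Sorting the report by average time immediately shows which stage dominates the frame budget, which is typically where optimization effort pays off first.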
Key Outcomes
- **4x** faster inference speed
- **3.5x** model size reduction