Edge AI Optimization

OpenVINO · Python · Computer Vision · Quantization · Edge Devices

The Challenge

Running complex deep learning models for eye-tracking on consumer-grade edge devices (like laptops with basic webcams) was resulting in high latency. The standard models were too heavy, causing frame drops and a poor user experience in real-time applications.

The Solution

I led the optimization effort by porting PyTorch models to Intel OpenVINO. By applying aggressive **model quantization (FP32 to INT8)** and **pruning techniques**, I significantly reduced the model size without compromising accuracy. The inference engine was rewritten in highly optimized Python for maximum performance on CPU.
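The core idea behind FP32-to-INT8 quantization can be illustrated with a minimal sketch. This is a generic affine (scale/zero-point) quantization in pure Python for clarity, not the project's actual tooling; OpenVINO performs the same mapping per tensor or per channel using calibration data.

```python
# Minimal sketch of affine FP32 -> INT8 quantization (illustrative only;
# the real project used OpenVINO's post-training quantization tooling).

def quantize_params(values, qmin=-128, qmax=127):
    """Compute a scale and zero-point mapping the float range onto INT8."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped INT8 codes."""
    return [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]

def dequantize(codes, scale, zero_point):
    """Recover approximate floats from INT8 codes."""
    return [(c - zero_point) * scale for c in codes]

weights = [-0.52, -0.1, 0.0, 0.33, 0.71]  # hypothetical weight values
scale, zp = quantize_params(weights)
codes = quantize(weights, scale, zp)
restored = dequantize(codes, scale, zp)
# Each restored value lies within one quantization step of the original.
assert all(abs(w - r) <= scale for w, r in zip(weights, restored))
```

Storing INT8 codes instead of FP32 values is what drives the memory reduction: each weight shrinks from 4 bytes to 1, at the cost of a bounded rounding error per value.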

  • Converted models to OpenVINO Intermediate Representation (IR).
  • Implemented post-training quantization to reduce memory footprint.
  • Profiled and optimized the Python inference pipeline to remove bottlenecks.
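The profiling step above can be sketched as a small per-stage timer. The stage names and the stand-in compute are hypothetical; in the real pipeline the `inference` stage would wrap the OpenVINO model call, but the pattern of accumulating wall-clock time per stage and ranking the results is the same.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Accumulate wall-clock time spent in a named pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

def run_frame(frame):
    # Stand-in stages; a real pipeline would decode, normalize, and
    # invoke the compiled OpenVINO model here.
    with stage("preprocess"):
        prepped = [p / 255.0 for p in frame]
    with stage("inference"):
        result = sum(prepped)
    with stage("postprocess"):
        return round(result, 3)

for f in range(5):                      # simulate five webcam frames
    run_frame([f] * 100)

slowest = max(timings, key=timings.get)  # the stage to optimize first
```

Ranking stages this way makes it obvious where the frame budget goes, so optimization effort lands on the actual bottleneck rather than on guesswork.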

Key Outcomes

  • **4x** faster inference speed
  • **3.5x** model size reduction

Project Info

  • Role: Master Thesis Student / Engineer
  • Timeline: Jan 2021 - Present
  • Area: Computer Vision & Edge AI
  • Core Tech: Python, OpenVINO, PyTorch