Funding Year 2024 / Scholarship Call #19 / Project ID: 7170 / Project: Neural network splitting for energy-efficient Edge-AI
How can we run advanced AI models efficiently on resource-limited edge devices? Discover DynaSplit, a framework designed to optimize energy consumption and latency by dynamically splitting neural networks and tuning hardware parameters for edge-cloud applications.
Introduction
Artificial intelligence (AI) is transforming industries, but deploying advanced models on edge devices - such as IoT sensors or smartphones - remains a challenge. These devices often lack the computational power and energy resources required for AI inference. To address this, my project introduces DynaSplit, a framework designed to optimize the energy efficiency and latency of AI inference through innovative system design.
Edge devices and cloud computing represent two ends of a continuum: edge devices offer low-latency processing but limited resources, while the cloud provides scalability at the cost of energy and transmission delays. DynaSplit bridges this gap by dynamically splitting neural networks between edge and cloud, ensuring efficient resource utilization without sacrificing performance.
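To illustrate the core idea of split computing, here is a minimal Python sketch using PyTorch and a toy model; the layer stack and split point are illustrative assumptions, not DynaSplit's actual models or transport layer. The first layers run on the edge device and the remaining layers run in the cloud, with only the intermediate activation crossing the network:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a real DNN: a small stack of layers.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

def split_inference(model: nn.Sequential, x: torch.Tensor, split_point: int) -> torch.Tensor:
    """Run layers [0, split_point) on the edge and the rest in the cloud.

    In a real deployment the intermediate tensor would be serialized and
    sent over the network; here both halves run in one process for clarity.
    """
    edge_part = model[:split_point]   # executes on the edge device
    cloud_part = model[split_point:]  # executes on the cloud server
    intermediate = edge_part(x)       # only this tensor crosses the network
    return cloud_part(intermediate)

output = split_inference(model, torch.randn(1, 64), split_point=2)
```

Moving the split point earlier or later shifts compute between edge and cloud and changes the size of the transmitted tensor, which is exactly the trade-off DynaSplit optimizes.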
The Problem: Bridging Edge and Cloud
Modern deep neural networks (DNNs) are powerful but resource-intensive, posing a dilemma for edge devices. On one hand, running the entire model on the edge ensures low latency but quickly exhausts computational resources. On the other hand, relying solely on the cloud can introduce unacceptable delays due to data transmission.
To make AI accessible in resource-constrained environments, we need a system that dynamically distributes the workload, leveraging the strengths of both edge and cloud computing. Enter DynaSplit.
The DynaSplit Framework: A Two-Phase Approach
DynaSplit is designed to tackle the challenges of energy efficiency, latency, and performance simultaneously. It achieves this through a two-phase methodology:
Offline Phase: Optimizing for Performance
During this phase, DynaSplit explores the possible configurations of neural network split points and hardware parameters. Using a multi-objective optimization algorithm, it identifies Pareto-optimal solutions - configurations that achieve the best trade-offs between energy efficiency and latency.
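As a sketch of what Pareto filtering looks like in code (the configurations and measurements below are invented for illustration, not actual DynaSplit results):

```python
def pareto_front(configs: list[dict]) -> list[dict]:
    """Keep only configurations not dominated in (energy, latency).

    A configuration dominates another if it is no worse in both
    objectives and strictly better in at least one.
    """
    front = []
    for c in configs:
        dominated = any(
            o["energy_j"] <= c["energy_j"]
            and o["latency_ms"] <= c["latency_ms"]
            and (o["energy_j"] < c["energy_j"] or o["latency_ms"] < c["latency_ms"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

# Invented example measurements for four split points.
configs = [
    {"split": 2, "energy_j": 1.2, "latency_ms": 40.0},
    {"split": 4, "energy_j": 0.9, "latency_ms": 55.0},
    {"split": 6, "energy_j": 1.5, "latency_ms": 35.0},
    {"split": 8, "energy_j": 1.4, "latency_ms": 60.0},  # dominated by split 4
]
print(pareto_front(configs))  # keeps splits 2, 4, and 6
```

Exhaustive enumeration like this only scales to small search spaces; for the larger space of split points combined with hardware parameters, DynaSplit's multi-objective optimization algorithm explores configurations more selectively.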
Key tasks include:
- Profiling the energy and latency impact of different split points (a sketch of this loop follows the list).
- Tuning hardware parameters like CPU frequency and GPU utilization.
- Creating a lookup table of optimal configurations for various scenarios.
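Here is a minimal sketch of such a profiling loop, reusing the split_inference helper from the earlier sketch. The functions set_cpu_frequency and measure_energy are hypothetical placeholders for real instrumentation (e.g. cpufreq governors and RAPL counters or an external power meter):

```python
import random
import time

def set_cpu_frequency(mhz: int) -> None:
    # Placeholder: on Linux this would typically go through the cpufreq
    # sysfs interface and requires elevated privileges.
    pass

def measure_energy() -> float:
    # Placeholder: real measurements would come from hardware counters
    # (e.g. RAPL) or an external power meter. Returns joules.
    return random.uniform(0.5, 2.0)

def profile_configurations(model, sample_input, split_points, cpu_freqs_mhz):
    """Measure latency and energy for each (split point, CPU frequency) pair."""
    table = []
    for split in split_points:
        for freq in cpu_freqs_mhz:
            set_cpu_frequency(freq)
            start = time.perf_counter()
            split_inference(model, sample_input, split)
            table.append({
                "split": split,
                "cpu_freq_mhz": freq,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "energy_j": measure_energy(),
            })
    return table
```

Filtering this table with pareto_front from above yields the lookup table that the online phase consults.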
Online Phase: Dynamic Configuration
When an inference request is received, DynaSplit's scheduling algorithm selects the most suitable configuration from the lookup table. This ensures that:
- Latency requirements (Quality of Service, or QoS) are met.
- Energy consumption is minimized for the given workload and device constraints.
This dual-phase approach enables DynaSplit to adapt to dynamic workloads and varying device capabilities.
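To make the scheduling step concrete, here is a sketch of how a lookup-table-based scheduler could pick a configuration. The policy shown (minimum energy subject to a latency bound, falling back to the fastest entry) is an assumption about one reasonable policy, not necessarily DynaSplit's exact algorithm:

```python
def select_configuration(lookup_table: list[dict], qos_latency_ms: float) -> dict:
    """Pick the lowest-energy configuration whose latency meets the QoS bound.

    Falls back to the fastest known configuration if no entry satisfies it.
    """
    feasible = [c for c in lookup_table if c["latency_ms"] <= qos_latency_ms]
    if feasible:
        return min(feasible, key=lambda c: c["energy_j"])
    return min(lookup_table, key=lambda c: c["latency_ms"])

# Invented lookup table, as produced by the offline phase.
table = [
    {"split": 2, "cpu_freq_mhz": 1200, "latency_ms": 42.0, "energy_j": 1.1},
    {"split": 4, "cpu_freq_mhz": 900,  "latency_ms": 58.0, "energy_j": 0.8},
    {"split": 6, "cpu_freq_mhz": 1500, "latency_ms": 33.0, "energy_j": 1.6},
]
print(select_configuration(table, qos_latency_ms=50.0))  # -> the split-2 entry
```

Because the selection is a table lookup rather than a fresh optimization, it adds negligible overhead on the inference path.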
Why It Matters
By combining split computing with hardware parameter optimization, DynaSplit addresses a critical gap in edge AI deployment. It empowers resource-constrained devices to handle advanced AI models efficiently, enabling applications like:
- Real-time healthcare monitoring.
- Autonomous vehicles.
- Smart city infrastructure.
Current Progress and Next Steps
The system design and implementation of DynaSplit are complete, and the framework is now ready for extensive evaluation. Upcoming updates will share how DynaSplit performs in real-world scenarios, highlighting its impact on energy efficiency and latency.