Funding year 2021 / Scholarship Call #16 / Project ID: 5884 / Project: Efficient and Transparent Model Selection for Serverless Machine Learning Platforms
In my previous blog post I presented the state of the art in automated machine learning platforms and looked at one example from industry in more detail: Amazon SageMaker. Its offering is vast and even includes an option for running inference on machine learning models locally. The open source container it provides helped shape the first prototype of the serverless AutoML platform. For various reasons, that prototype was later supplanted by the different approaches I describe in this blog post.
Early Prototyping: Focus on the execution
To start the prototyping phase, I chose to build heavily on my new knowledge of Amazon SageMaker, which in turn built on my knowledge of AWS in general. I quickly noticed that Amazon SageMaker has a local execution mode. It is meant for testing, on your local machine, the tasks you would otherwise run remotely in SageMaker. This is in line with other AWS services, which can also be tested locally by using third-party emulation software like LocalStack.
Local mode works by providing a Docker container loaded with all the dependencies needed to execute the core tasks of SageMaker (training, processing, inference, etc.). To use this mode for inference, you have two options: use the official SageMaker SDK, or use LocalStack, which also relies on the SageMaker container for local execution. As I am very familiar with LocalStack, I chose to adapt its code to execute models inside my platform.
Getting this execution mode to a point where I could simply plug it into a larger architecture was a bigger task than I had anticipated. I was constrained by the interface the container exposes, which was not well documented: although the container is released under an open source license, it was created mainly for use inside Amazon's data centers and with the official SageMaker SDK. I also noticed that looking at the problem on a micro scale (first getting the execution of models right) kept me from moving to a macro scale (how the different components should interact with each other).
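As a rough illustration of what talking to such a container looks like: SageMaker-compatible inference containers serve HTTP on port 8080, answering `GET /ping` for health checks and `POST /invocations` for predictions. The sketch below assumes only that contract. The `localhost` address, the CSV payload, and the helper names are hypothetical and are not taken from my prototype or from LocalStack.

```python
import urllib.request

# Assumption: a SageMaker-compatible inference container is running locally
# and its port 8080 is published on localhost.
BASE_URL = "http://localhost:8080"


def build_ping_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Health check: the container answers GET /ping once it is ready."""
    return urllib.request.Request(f"{base_url}/ping", method="GET")


def build_invocation_request(
    payload: bytes,
    content_type: str = "text/csv",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Inference: the payload is sent unchanged in a POST to /invocations."""
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


def invoke(payload: bytes, content_type: str = "text/csv",
           timeout: float = 10.0) -> bytes:
    """Send one inference request and return the raw response body."""
    request = build_invocation_request(payload, content_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a container running, a call such as `invoke(b"5.1,3.5,1.4,0.2")` would return the raw response body. Which payload format the model accepts, and what the response looks like, depends entirely on the model inside the container.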
Advanced Prototype: Shift towards the bigger picture
After noticing that focusing on the execution platform was taking a lot of time, I decided to step back and look at the overall problem I am trying to solve. I started by envisioning which components I might need and how I could deploy them. In this exploration I came across a very useful framework: Ray.io.
Ray.io "is an open-source unified compute framework that makes it easy to scale AI and Python workloads — from reinforcement learning to deep learning to tuning, and model serving." I specifically had an interest in Ray Serve, the portion which allows setting up pipelined architectures with multiple components with one simple Python program.
In Ray Serve, you create Python classes, mark them as deployments with a decorator, and have them depend on each other. Ray then handles all the scheduling in the background. You can deploy applications and their deployments on different execution environments, e.g., locally on your PC for testing or on a large cluster for production use.
from ray import serve

# ModelA and ModelB are further @serve.deployment classes defined elsewhere.

@serve.deployment
class Driver:
    def __init__(self, model_a_handle, model_b_handle):
        self._model_a_handle = model_a_handle
        self._model_b_handle = model_b_handle

    async def __call__(self, request):
        # Submit the request to both models before waiting for either result.
        ref_a = await self._model_a_handle.remote(request)
        ref_b = await self._model_b_handle.remote(request)
        return (await ref_a) + (await ref_b)

model_a = ModelA.bind()
model_b = ModelB.bind()

# model_a and model_b will be passed to the Driver constructor as ServeHandles.
driver = Driver.bind(model_a, model_b)
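To try out the composition pattern without a Ray installation, the same idea can be sketched with plain `asyncio`. This is only an analogy, not Ray's API: `ModelA`, `ModelB`, their `predict` methods, and `run_driver` are hypothetical stand-ins, and the simulated latency is arbitrary. As in the Driver above, the request goes to both models before either result is awaited.

```python
import asyncio


class ModelA:
    """Hypothetical stand-in for a deployed model."""

    async def predict(self, request: int) -> int:
        await asyncio.sleep(0.01)  # simulate inference latency
        return request + 1


class ModelB:
    """Second hypothetical stand-in with a different computation."""

    async def predict(self, request: int) -> int:
        await asyncio.sleep(0.01)  # simulate inference latency
        return request * 2


class Driver:
    def __init__(self, model_a: ModelA, model_b: ModelB):
        self._model_a = model_a
        self._model_b = model_b

    async def __call__(self, request: int) -> int:
        # Run both models concurrently, then combine their results.
        result_a, result_b = await asyncio.gather(
            self._model_a.predict(request),
            self._model_b.predict(request),
        )
        return result_a + result_b


def run_driver(request: int) -> int:
    """Build the small pipeline and push one request through it."""
    driver = Driver(ModelA(), ModelB())
    return asyncio.run(driver(request))
```

What Ray Serve adds on top of this pattern is that each class can run as a separately scheduled and scaled deployment, rather than as a coroutine inside a single process.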
This was an ideal environment to get a prototypical architecture running, with the main focus on realizing a model selection algorithm. Once this first prototype of the entire platform was running, I decided that model selection itself is the most important contribution of a serverless platform for automated machine learning and should be the central component of the architecture. In the end, I removed Ray.io from the prototype: the platform needs to be evaluated quickly, and an additional abstraction layer above the cluster software used for deployment would make that evaluation imprecise. For the final consolidation of the prototype, which I will present in the next blog post, I defined the components as their own pods inside a Kubernetes cluster.
Take-aways: Design goals
Overall, this exploration showed me the need for proper design goals (DG), which I then developed to guide the final prototype architecture:
- DG1 Requesting a model should be reasonably easy to add to a variety of applications and not require extensive knowledge of the architecture of the platform or any specific non-standard technology.
- DG2 Hosting of machine learning models should use an industry-standard solution with an existing large set of available models.
- DG3 Developers of applications which utilize the platform should be able to request a model for their specific task with minimal latency, while also maximizing the accuracy of the model. As those two aspects form a trade-off, developers need to define which of them is more important to them.
- DG4 The platform should serve multiple developers and thus aim to maximize the performance for all clients. Therefore, the load sent to the platform should be balanced between the different available underlying execution nodes.
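To make the trade-off in DG3 more tangible, here is a minimal sketch of how a developer-supplied preference could steer the choice between candidate models. It is not the model selection algorithm of the platform. The `ModelCandidate` fields, the `accuracy_weight` parameter, and the linear scoring rule are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelCandidate:
    name: str
    accuracy: float    # in [0, 1], higher is better
    latency_ms: float  # expected latency, lower is better


def select_model(candidates: Sequence[ModelCandidate],
                 accuracy_weight: float) -> ModelCandidate:
    """Return the candidate with the best weighted score.

    accuracy_weight is in [0, 1]: 1.0 considers only accuracy,
    0.0 considers only latency.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if not 0.0 <= accuracy_weight <= 1.0:
        raise ValueError("accuracy_weight must be in [0, 1]")

    max_latency = max(c.latency_ms for c in candidates)

    def score(c: ModelCandidate) -> float:
        # Map latency to [0, 1] so that lower latency yields a higher score.
        if max_latency > 0:
            latency_score = 1.0 - c.latency_ms / max_latency
        else:
            latency_score = 1.0
        return (accuracy_weight * c.accuracy
                + (1.0 - accuracy_weight) * latency_score)

    return max(candidates, key=score)
```

With a fast but less accurate candidate and a slow but more accurate one, a weight close to 1.0 picks the accurate model and a weight close to 0.0 picks the fast one. Any real scoring rule would have to be calibrated against measured accuracy and latency.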
We also explicitly define non-goals (NG) of the prototype. These aspects can be added on in future work but are not part of the implementation and evaluation in my project:
- NG1 The platform maintains a repository of a large quantity of various machine learning models. Reason: Creating and maintaining a suitable set of machine learning models for proper use should be done by experienced professionals in this field. The platform itself should only provide the means of facilitating the execution of a large variety of models (see DG2).
- NG2 The platform schedules model executors automatically on suitable nodes in the execution cluster. Reason: The focus of this thesis lies in the experience of the application developer. We will evaluate different manually established schedules for the executors. Future work can explore the optimization of the schedule itself.
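One way to picture how DG4 and NG2 fit together: the schedule is written by hand, and load balancing only chooses among the nodes that the schedule already assigns to a model. The sketch below is a hypothetical illustration of that split, not code from the prototype. The node names, model names, and the least-loaded rule are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical manual schedule: which nodes host an executor for which model.
SCHEDULE: Dict[str, List[str]] = {
    "model-small": ["node-1", "node-2"],
    "model-large": ["node-2", "node-3"],
}


class Dispatcher:
    """Routes each request to the least-loaded node that hosts the model."""

    def __init__(self, schedule: Dict[str, List[str]]):
        self._schedule = schedule
        self._in_flight = defaultdict(int)  # node name -> open requests

    def acquire(self, model: str) -> str:
        """Pick a node for one request and count the request as open."""
        nodes = self._schedule.get(model)
        if not nodes:
            raise KeyError(f"no executor scheduled for {model!r}")
        # min() keeps the first node on ties, so schedule order breaks ties.
        node = min(nodes, key=lambda n: self._in_flight[n])
        self._in_flight[node] += 1
        return node

    def release(self, node: str) -> None:
        """Mark one request on the given node as finished."""
        if self._in_flight[node] > 0:
            self._in_flight[node] -= 1
```

Evaluating a different manual schedule then only means swapping the `SCHEDULE` mapping. Computing such a mapping automatically is exactly what NG2 leaves to future work.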
In the next blog post, we will explore the architecture of the final prototype, which aims to meet the design goals while deliberately leaving the non-goals out of scope.