Data science, and machine learning (ML) in particular, have witnessed increased interest over the last decade, both as academic fields of research and as essential tools for business solutions. However, machine learning models must ultimately be deployed in production systems, and the practitioners who do this frequently encounter challenges along the way.
The machine learning development process is inherently complex because of the many decisions that must be made while training, validating, and deploying ML models. Automating specific machine learning tasks is difficult because data preparation, feature engineering, model selection, training configuration, performance evaluation metrics, and business objectives are all problem-specific.
The Four Stages of the Machine Learning Process
In machine learning, there are four main stages. These stages are data management, model learning, model verification, and model deployment.
The first stage is “data management”. Data is collected, analyzed, and prepared so that it can be used to make predictions, support decisions, or answer questions about a problem.
The second stage is called “model learning”. In this stage, a prediction algorithm is trained and evaluated, and adjustments are made so that it can be more accurate in future predictions.
Next comes “model verification”, which examines how well the new machine learning model performs on different datasets or variables, and whether it does better than past algorithms on them, before any further changes are made.
The final stage is “model deployment”, the process of integrating the machine learning model into the software infrastructure needed to run it.
What Does Data Management Mean?
In data management, the main task is to prepare the data needed to build a machine learning model. The data may be in a raw form, or it may need to be processed and cleaned up. The resulting model depends heavily on the data and its properties, such as size, distribution, and format.
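As a minimal illustration of the kind of preparation involved, the sketch below imputes missing values and scales a numeric feature. The column and values are hypothetical; real pipelines handle many more cases (outliers, categorical encoding, format mismatches).

```python
# Data-preparation sketch: impute missing entries with the column mean,
# then min-max scale the feature into [0, 1]. Values are hypothetical.

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_scale(column):
    """Scale values linearly into the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

raw_ages = [25, None, 40, 31, None]          # raw column with gaps
ages = min_max_scale(impute_mean(raw_ages))  # cleaned, model-ready feature
```

Even these two tiny steps embed choices (mean vs. median imputation, scaling range) that depend on the data's distribution, which is why this stage resists full automation.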
Data acquisition can be done in an uncontrolled or a controlled way. Uncontrolled data comes from real traffic, but by its nature, establishing a reliable ground truth for it is difficult. Controlled data, on the other hand, emulates real traffic, and its label assignment can introduce errors.
Labeling data can be very challenging when the volume is large, as in network analysis. Another issue is the difficulty of getting access to domain experts for labeling purposes; the process is also very costly and time-consuming. One option is to use labeling tools such as semi-supervised machine learning models, but this typically comes at a loss of accuracy.
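To make the semi-supervised idea concrete, here is a minimal self-training sketch: a nearest-centroid classifier is fit on a few hand-labeled points, pseudo-labels the unlabeled pool, and is refit on everything. All data values are hypothetical, and the accuracy loss mentioned above comes precisely from errors in those pseudo-labels.

```python
# Self-training sketch: fit a 1-D nearest-centroid classifier on labeled
# points, pseudo-label the unlabeled pool, then refit on the union.

def fit_centroids(points, labels):
    """Return the mean of each class (labels are 0 or 1)."""
    c0 = [p for p, y in zip(points, labels) if y == 0]
    c1 = [p for p, y in zip(points, labels) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def predict(centroids, p):
    """Assign the class whose centroid is closer."""
    return 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1

labeled = [1.0, 2.0, 8.0, 9.0]   # expert-labeled points
labels = [0, 0, 1, 1]
unlabeled = [1.5, 8.5, 0.5]      # cheap, unlabeled points

centroids = fit_centroids(labeled, labels)
pseudo = [predict(centroids, p) for p in unlabeled]   # pseudo-labels
centroids = fit_centroids(labeled + unlabeled, labels + pseudo)
```

A production system would also keep a confidence threshold and only accept pseudo-labels the model is sure about, to limit the error propagation.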
In some fields, such as healthcare, a loss in accuracy can be unacceptable because a slight deviation can have catastrophic consequences. Even combining data from different sources can be a challenging task.
Sometimes the machine learning process starts with a specific task in mind, and practitioners try to find the best algorithm for that task. However, choosing an appropriate machine learning algorithm can be challenging, since so many options are available.
What is Model Learning?
Model learning is the algorithm-development stage, in which data scientists or machine learning engineers choose, configure, and train models to accomplish a particular machine learning task.
One of the most popular machine learning solutions is the use of deep neural networks, because they can be applied to different tasks such as classification, regression, or clustering. Deep neural nets have become very popular in the machine learning community due to their ability to automatically learn feature representations from data. However, training deep neural networks is a complex task, and it can be challenging to obtain good performance on real-world data.
Despite this, deep learning models show good results in the context of drones, which rely on image sensors for their cost-effectiveness, low weight, and low power consumption. Sensor fusion is therefore widely used, but the limited computational resources available for on-board processing remain a challenge.
A good practice is to implement a simple machine learning model whenever possible. Simple models come with multiple advantages: they are often faster to deploy and less resource-hungry. Because running many machine learning experiments is time-consuming, it is also important to weigh the compute, time, and money spent against the marginal gains in model performance.
When a solution must be deployed in a resource-constrained environment, choosing the more straightforward option is key in the decision process.
Another advantage is their interpretability. Interpretability of ML models is often an essential requirement for stakeholders, making decision trees an algorithm of choice.
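The appeal of tree-based models is that their learned logic can be read back as plain rules. The sketch below fits a one-level decision tree (a stump) and renders its rule as a sentence; the feature name, data, and class names are hypothetical.

```python
# Interpretability sketch: a decision stump whose learned rule can be
# printed as a human-readable sentence. Data and names are illustrative.

def fit_stump(values, labels):
    """Pick the threshold that best separates two classes (0/1)."""
    best = (None, -1.0)
    for t in sorted(set(values)):
        # Count points where "value <= t" agrees with "label == 0",
        # and allow the rule's direction to flip if that scores better.
        correct = sum((v <= t) == (y == 0) for v, y in zip(values, labels))
        acc = max(correct, len(values) - correct) / len(values)
        if acc > best[1]:
            best = (t, acc)
    return best

ages = [22, 25, 30, 45, 52, 60]
churned = [0, 0, 0, 1, 1, 1]
threshold, accuracy = fit_stump(ages, churned)
rule = f"if age <= {threshold} predict 'stays', else predict 'churns'"
```

A stakeholder can audit `rule` directly, something that is much harder to offer for a deep neural network's weights.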
One of the main obstacles to deploying a machine learning solution is the cost of model training. For example, it has been found that training a BERT model can cost anywhere between $50k and $1.6m depending on its size, as these models use hundreds of millions of parameters or more.
What is Meant by Machine Learning Model Validation and Verification?
Model verification is a crucial stage of the machine learning process, since it determines whether the model meets the functional and performance requirements. It is essential to ensure that the model generalizes well on unseen data and is robust. The model must not only meet technical and business requirements; it must also comply with regulations, for example in the banking industry.
Ideally, models are tested in a real-life scenario where business-driven metrics can be observed. However, this process is often substituted with a simulation environment for safety or scale reasons. In addition, the dataset needs to be monitored to ensure the stability of model performance.
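In its simplest form, verification is a gate: the model's metric on held-out data must clear a requirement before sign-off. The sketch below illustrates this with accuracy; the predictions and the 0.9 threshold are illustrative, and real gates often combine several business-driven metrics.

```python
# Verification sketch: gate deployment on a minimum held-out accuracy.
# The labels, predictions, and the 0.9 requirement are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def meets_requirement(y_true, y_pred, minimum=0.9):
    """True if the model clears the agreed performance threshold."""
    return accuracy(y_true, y_pred) >= minimum

holdout_labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
model_preds   = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]  # one mistake
approved = meets_requirement(holdout_labels, model_preds)
```

When a real-life test is replaced by a simulation, the same gate applies, but the threshold should account for the gap between simulated and live conditions.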
What Does Deploying Machine Learning Models Into Production Represent?
Machine learning deployment needs to follow DevOps principles. From this perspective, MLOps, an emerging field addressing the challenges specific to machine learning solutions in production, is gaining traction. While production MLOps has some similarities with traditional software engineering, the dynamic nature of ML systems calls for specific tools to deploy them.
Machine learning engineers and software engineers often find themselves working together on the same project. On the surface, the tasks seem clearly separated: machine learning engineers produce the model, while software engineers build the infrastructure to run it. In reality, however, their work overlaps on questions related to the development process, model inputs and outputs, and performance metrics, and each side brings its own set of challenges, particularly in building the infrastructure needed to run, implement, and deploy models in production.
Additionally, ML systems come with many configuration settings that must be set and maintained, adding to the system's overall configuration debt.
Last but not least, once the model is deployed, it also needs to be updated through retraining or continuous learning. In this context, data drift and concept drift must be monitored to ensure the quality of the model, because a sudden change in the model's environment can have disastrous consequences for its performance.
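A minimal drift monitor can be as simple as checking whether a feature's mean in live traffic has moved far from its training-time mean. The sketch below flags drift when the live mean falls more than k training standard deviations away; the data and the k = 3 threshold are illustrative, and production monitors typically use distributional tests instead.

```python
# Drift-monitoring sketch: flag data drift when a feature's live mean
# moves more than k training standard deviations from the training mean.

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def drifted(train, live, k=3.0):
    """True if the live mean is outside train_mean +/- k * train_std."""
    return abs(mean(live) - mean(train)) > k * std(train)

train_feature = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature at training time
stable_live = [10.2, 9.8, 10.1]                # live traffic, no drift
shifted_live = [15.0, 16.0, 14.5]              # live traffic after a shift
```

An alert from such a monitor is the trigger for the retraining loop described above, ideally before performance degrades visibly.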
What Are The Risks of Machine Learning?
Another essential point to consider is the ethical implications of machine learning and users’ trust in it. Many countries have implemented regulations to protect personal data. In healthcare, for example, access to user data can be an obstacle; in this situation, generating additional training data through data augmentation can be a solution.
Also, machine learning systems now face a new kind of threat in the form of adversarial attacks. These attacks can target the data, for example by adding noise that compromises its integrity and corrupts the model. This issue is particularly problematic when the model continuously learns from new data.
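The core mechanism of such attacks is that a small, targeted perturbation can push an input across a model's decision boundary. The toy sketch below shows this for a linear classifier; the weights, inputs, and perturbation are hypothetical, and real attacks compute the perturbation direction from model gradients rather than by hand.

```python
# Adversarial-perturbation sketch: a small change to one feature flips
# a linear classifier's prediction. All numbers are illustrative.

def predict(weights, x, bias=0.0):
    """Linear classifier: positive score -> class 1, else class 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

weights = [1.0, -1.0]

clean = [0.6, 0.5]            # score  0.1 -> classified as class 1
perturbed = [0.6, 0.5 + 0.2]  # small added noise -> score -0.1 -> class 0
```

Inputs sitting close to the boundary, like `clean` here, are exactly the ones an attacker can flip with the least visible change, which is why robustness to small perturbations matters.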
Another type of attack, known as model stealing, happens through reverse-engineering techniques: by querying the model's inputs and monitoring its outputs, an attacker can reconstruct it, resulting in a loss of intellectual property.
These challenges are significant, but machine learning is now indispensable to the decision-making process. The best way to face these issues is to make machine learning systems part of your DevOps processes, monitor the models, and always stay aware of their limitations in real-world situations. Deploying models by following the principles of production MLOps will be critical to the success of ML systems in production. The development of ML systems is still emerging as a data science subfield, but with more and more open-source tools and guidelines available to data scientists and machine learning engineers, we can be hopeful that the current challenges of bringing machine learning systems into production will decrease over time.
This article is based on the following paper: Challenges in Deploying Machine Learning: a Survey of Case Studies - 18 Jan 2021. If you want to stay in touch with my content, feel free to join my newsletter here.