Boosting speed of scikit-learn regression algorithms

When browsing the web, numerous posts can be found discussing techniques to speed up the training time of well-known machine learning algorithms. Surprisingly, there is limited information available regarding the prediction phase. However, from a practical standpoint, this aspect holds great relevance. Once a satisfactory regression algorithm has been trained, it is typically deployed in a real-world system. In such cases, the speed at which predictions are obtained becomes crucial. Faster prediction times enable real-time or near-real-time decision-making, enhance user experience in interactive applications, and facilitate efficient processing of large-scale datasets. Therefore, optimizing inference speed can have significant implications for various domains and applications.

The purpose of this blog post is to investigate the performance and prediction speed behavior of popular regression algorithms, i.e. models that predict numerical values based on a set of input variables. Considering that scikit-learn is the most extensively utilized machine learning framework [1], our focus will be on exploring methods to accelerate its algorithms' predictions.

Benchmarking regression algorithms

To assess the current state of popular regression algorithms, we selected four popular regression datasets from Kaggle [2], along with an internal dataset from our company. These datasets vary in sample size as well as in the number and type of features, capturing performance across different data structures.

To ensure fair comparisons, we need to optimize hyperparameters before testing to unlock the models' full potential. We will benchmark the following regression algorithms:

The different versions of regularized linear regression, such as lasso, ridge, and elastic net, are not analyzed separately as they were comparable to pure linear regression in terms of prediction speed and accuracy in a pre-evaluation step.

Prediction speed vs. accuracy

The plot below displays the benchmarking results on our company's internal dataset. We can observe a sweet spot in the bottom left, where both the error - measured via root mean square error (RMSE) - and prediction times are low. Simple neural networks (Multilayer Perceptron, MLP) and gradient boosted regression trees demonstrate good performance in both dimensions. Random forest also shows decent accuracy but has the highest prediction time. Other algorithms exhibit reasonable prediction speed but relatively high errors.

However, it is crucial to try different algorithms on the specific dataset at hand. Accuracy and prediction time heavily depend on the number of features used, their transformations, as well as the model's parameters. Linear models, for example, may perform well with properly transformed features, while larger MLPs might exhibit longer prediction times. Nevertheless, algorithms like random forest and k-NN are by construction expected to be slower in inference speed.

How to speed up inference

Generally, scikit-learn models are already optimized through compiled Cython extensions or optimized computing libraries [3]. However, there are additional ways to accelerate prediction latency, apart from using faster hardware. In this blog post, we’ll benchmark the following optimizations:

Data-based approaches:

Implementation-based approaches:

Furthermore, we want to mention the following optimization approaches, which we did not include in our benchmark, partly because they are problem-specific:

Data-based approaches:

Model-based approaches:

Implementation-based approaches:

  • Implement the prediction step with given weights independently, potentially in a faster programming language, to avoid unnecessary overhead

  • Use cloud services for prediction, such as Google ML, Amazon ML or MS Azure
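For the first of these implementation-based ideas, a minimal sketch with a linear model shows how the prediction step can be re-implemented from the exported weights (synthetic data; `fast_predict` is an illustrative helper, not from the post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X.sum(axis=1)
model = LinearRegression().fit(X, y)

# Export the learned weights once ...
w, b = model.coef_, model.intercept_

# ... and predict with a plain matrix-vector product, free of any
# framework overhead (the same arithmetic could be ported to C, Rust, etc.)
def fast_predict(x):
    return x @ w + b
```

The same pattern applies to any model whose prediction rule is simple enough to restate by hand; for tree ensembles or kernel methods, the re-implementation effort grows accordingly.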

As you can see, there are numerous ways to influence inference time, ranging from fundamental approaches to simple tricks. Fundamental changes, such as restructuring the data or implementing algorithms from scratch with efficiency in mind, are more involved, while the simpler tricks can easily be applied even to existing systems that use scikit-learn.

Note that none of the above approaches affects prediction quality, except for reducing the number of features and the model complexity. For these two, it is important to evaluate the trade-off between prediction speed and quality.

In this blog post, we mostly benchmark approaches that do not affect prediction quality, and therefore focus on evaluating the speedup in the next section.

Evaluating some speedup tricks

Check out the technical appendix to see how the time measurement is performed.

Reducing the number of features by half (in our case from 106 to 53) only leads to small decreases in inference time for k-NN and SVR, while it has a major influence on the MLP. Disabling scikit-learn's finiteness check, which requires just one line of code, improves prediction speed more significantly: as can be seen below, inference time can be reduced by up to 40% depending on the algorithm. Utilizing the Intel extension for scikit-learn, also requiring only one line of code, results in substantial speed improvements for random forest, SVR and the k-NN regressor. For the latter two algorithms, a time reduction of more than 50% could be achieved, while for random forest, prediction time decreases by an impressive 98%. In the plots below, no values are shown for the other algorithms, as the Intel extension currently does not support them.
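The two one-line changes mentioned above can be sketched as follows (the Intel patch is shown commented out, assuming scikit-learn-intelex may not be installed in your environment):

```python
from sklearn import set_config, get_config

# One-liner 1: skip scikit-learn's finiteness (NaN/inf) validation.
# Only safe if you are certain your input contains no NaNs or infs.
set_config(assume_finite=True)
print(get_config()["assume_finite"])  # → True

# One-liner 2 (requires the scikit-learn-intelex package):
# from sklearnex import patch_sklearn
# patch_sklearn()  # call before importing the estimators you want accelerated
```

Both changes leave model predictions unchanged; they only remove validation overhead or swap in optimized implementations of supported estimators.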

As can be seen below, the greatest potential lies in bulk inference. By predicting several samples simultaneously (here: 100 or 1000 samples at once), the average prediction time per sample decreases significantly for most of the algorithms. Overall, bulk prediction can yield up to 200-fold speedups in this test setting. This approach is particularly effective for the MLP as well as for linear and tree-based methods, greatly accelerating their performance.
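The difference between atomic and bulk prediction can be illustrated with a small sketch (synthetic data and names are illustrative, not from our benchmark):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10)
model = LinearRegression().fit(X, y)

# Atomic prediction: one call per sample — pays the Python and
# input-validation overhead 100 times
atomic = np.array([model.predict(X[i:i + 1])[0] for i in range(100)])

# Bulk prediction: a single vectorized call — pays the overhead once
bulk = model.predict(X[:100])
```

Both variants return identical values; only the per-sample overhead differs, which is where the large speedups come from.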

Summary

Fast predictions are crucial for various use cases, in particular when it comes to real-time predictions. Moreover, investing in efficiency always pays off by reducing energy consumption, thus saving money and at the same time lowering carbon emissions.

In this blog post we have explored multiple ways to achieve faster prediction times. Firstly, the dimensionality of the data and the algorithm chosen have a major influence on inference speed and scalability behaviour. However, there are various tricks to accelerate even existing scikit-learn code. Disabling scikit-learn's finite data validation or utilizing the Intel extension for supported algorithms can already yield considerable improvements depending on the algorithm. The most substantial gains, however, can be achieved by addressing fundamental aspects, such as reducing the number of features (in particular for high-dimensional data), implementing bulk prediction or custom prediction methods. These strategies can result in speed increases by factors of several hundred.

In our small test setting, we could additionally show that a small neural network, a gradient-boosted regressor and a random forest appear to be the most promising choices in terms of both accuracy and speed when using the above-mentioned speedup tricks.


Sources

[1] https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf 

[2] House sales: House Sales in King County, USA; red wine quality: Red Wine Quality; avocado prices: Avocado Prices; medical insurance costs: Medical Cost Personal Datasets

[3] Computing with scikit-learn — scikit-learn 0.23.2 documentation

 


Technical Appendix

Speedtests were performed with all unnecessary background processes stopped.

 

Inference time measurement for one test sample (“atomic prediction”):

import timeit
import numpy as np

n = 500 # number of consecutive runs
r = 10 # number of repeats of above

# X_test[0:1] keeps the 2D shape that scikit-learn's predict expects
pred_times = timeit.repeat(stmt=lambda: model.predict(X_test[0:1]),
  repeat=r, number=n)
pred_times = np.array(pred_times) / n # divide by number of runs
pred_time = np.min(pred_times) # take minimum of all repetitions

Inference time measurement for several samples at once (“bulk prediction”):

n = 50 # number of consecutive runs
r = 5 # number of repeats of above

X_test_sample = X_test[0:1000] # 100 or 1000
pred_times = timeit.repeat(stmt=lambda: model.predict(X_test_sample), 
  repeat=r, number=n)
pred_times = np.array(pred_times) / n # divide by number of runs
pred_times = pred_times / len(X_test_sample) # divide by number of samples
pred_time = np.min(pred_times) # take minimum of all repetitions

Here, “model” stands for each of the scikit-learn models mentioned above, trained on the first 10,000 observations of the “house sales” data with default model parameters.

 

Versions used:

  • Python: 3.9.7

  • Scikit-learn: 1.0.2

  • Scikit-learn-intelex: 2021.20210714.120553
