Boosting speed of scikit-learn regression algorithms

Date

27.06.2023

When browsing the web, numerous posts can be found discussing techniques to speed up the training time of well-known machine learning algorithms. Surprisingly, there is limited information available regarding the prediction phase. However, from a practical standpoint, this aspect holds great relevance. Once a satisfactory regression algorithm has been trained, it is typically deployed in a real-world system. In such cases, the speed at which predictions are obtained becomes crucial. Faster prediction times enable real-time or near-real-time decision-making, enhance user experience in interactive applications, and facilitate efficient processing of large-scale datasets. Therefore, optimizing inference speed can have significant implications for various domains and applications.

The purpose of this blog post is to investigate the performance and prediction speed behavior of popular regression algorithms, i.e. models that predict numerical values based on a set of input variables. Considering that scikit-learn is the most extensively utilized machine learning framework [1], our focus will be on exploring methods to accelerate its algorithms' predictions.

Benchmarking regression algorithms

To assess the current state of popular regression algorithms, we selected four popular regression datasets from Kaggle [2], along with an internal dataset from our company. These datasets vary in sample size, number and type of features, capturing performance for different data structures.

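To ensure fair comparisons, we optimize each model's hyperparameters before testing so that it can reach its full potential. The algorithms discussed in this post are linear regression, k-nearest neighbors (k-NN), support vector regression (SVR), random forest, gradient boosted regression trees, and a simple neural network (multilayer perceptron, MLP).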

The different versions of regularized linear regression, such as lasso, ridge, and elastic net, are not analyzed separately as they were comparable to pure linear regression in terms of prediction speed and accuracy in a pre-evaluation step.

Prediction speed vs. accuracy

The plot below displays the benchmarking results on our company's internal dataset. We can observe a sweet spot in the bottom left, where both the error (measured as root mean square error, RMSE) and the prediction time are low. Simple neural networks (multilayer perceptron, MLP) and gradient boosted regression trees perform well in both dimensions. Random forest also shows decent accuracy but has the highest prediction time. The other algorithms exhibit reasonable prediction speed but relatively high errors.

However, it is crucial to try different algorithms on the specific dataset at hand. Accuracy and prediction time depend heavily on the number of features used, their transformations, and the model's parameters. Linear models, for example, may perform well with properly transformed features, while larger MLPs might exhibit longer prediction times. Nevertheless, algorithms like random forest and k-NN are, by construction, expected to be slower at inference.

How to speed up inference

Generally, scikit-learn models are already optimized through compiled Cython extensions or optimized computing libraries [3]. However, there are additional ways to reduce prediction latency beyond using faster hardware. In this blog post, we benchmark the following optimizations:

Data-based approaches:

Reduce the number of features by selecting relevant ones or applying dimensionality reduction (“Half features”)
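As a minimal sketch of the “Half features” idea, the snippet below keeps the 53 highest-scoring of 106 features with scikit-learn's SelectKBest. The synthetic data, the f_regression scoring function and the k-NN model are illustrative assumptions, not the setup used in our benchmark.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data with 106 features (assumption for illustration)
X, y = make_regression(n_samples=500, n_features=106, n_informative=20,
                       noise=1.0, random_state=0)

# Keep the half of the features with the highest univariate F-score
selector = SelectKBest(score_func=f_regression, k=X.shape[1] // 2)
X_half = selector.fit_transform(X, y)

# Train on the reduced feature set; at prediction time only the
# selected columns need to be computed and passed to the model
model = KNeighborsRegressor().fit(X_half, y)
kept_columns = selector.get_support(indices=True)
prediction = model.predict(X[:1, kept_columns])

print(X.shape[1], "->", X_half.shape[1], "features")
```

Selecting the columns once and storing their indices avoids running the selector at prediction time. Whether accuracy suffers has to be checked on the data at hand.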

Implementation-based approaches:

Apply bulk prediction instead of atomic prediction, enabling parallelization and speeding up the process (“Bulk 100/1000”)
Utilize the Intel extension for scikit-learn, which supports certain algorithms and can lead to significant speed improvements (“Intel extension”)
Disable scikit-learn's validation overhead, which checks the finiteness of the data (“No finite check”)
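The two “one line of code” options can be sketched as follows. Disabling the finiteness check uses scikit-learn's config_context with assume_finite=True. The Intel extension is enabled via patch_sklearn from the optional sklearnex package (installed as scikit-learn-intelex), which we assume may not be available and therefore guard with a try/except. The data and the small random forest are placeholders.

```python
# Optional: enable the Intel extension before importing any estimators.
# Assumption: the scikit-learn-intelex package may not be installed.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # fall back to stock scikit-learn

import numpy as np
import sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Placeholder data and model for illustration
X, y = make_regression(n_samples=200, n_features=20, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

x_single = X[:1]  # one sample, kept 2-D

# Default behaviour: input is validated for NaN and infinity
pred_checked = model.predict(x_single)

# Skip the finiteness check for everything inside this block
with sklearn.config_context(assume_finite=True):
    pred_unchecked = model.predict(x_single)

print(np.allclose(pred_checked, pred_unchecked))
```

With assume_finite=True, NaN or infinite inputs are no longer caught by the validation, so this setting is only appropriate when the input data is known to be clean. sklearn.set_config(assume_finite=True) applies the same setting globally.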

Furthermore, we want to mention the following optimization approaches, which we did not include in our benchmark, partly because they are problem specific:

Data-based approaches:

Efficiently represent input data, such as using sparse matrix data structures
Optimize feature extraction and transformation, including efficient database queries and preprocessing tasks
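To illustrate the sparse representation, here is a small sketch that stores mostly-zero input data as a SciPy CSR matrix and passes it to a scikit-learn model. The data, the roughly 90% sparsity level and the choice of Ridge are assumptions for illustration. Whether the sparse format is actually faster depends on how sparse the data is and on the estimator.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Dense matrix in which roughly 90% of the entries are zero
X_dense = rng.random((200, 50))
X_dense[X_dense < 0.9] = 0.0
y = X_dense @ rng.random(50)

# Compressed sparse row format stores only the non-zero entries
X_sparse = sparse.csr_matrix(X_dense)

model = Ridge(alpha=1.0).fit(X_sparse, y)

# The fitted model accepts both representations and returns the same values
pred_sparse = model.predict(X_sparse[:5])
pred_dense = model.predict(X_dense[:5])

print(X_sparse.nnz, "stored values instead of", X_dense.size)
```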

Model-based approaches:

Reduce the complexity of the model, such as reducing the size of a random forest or MLP architecture
Utilize model-specific accelerators

Implementation-based approaches:

Implement the prediction step with given weights independently, potentially in a faster programming language, to avoid unnecessary overhead
Use cloud services for prediction, such as Google ML, Amazon ML or MS Azure
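The idea of implementing the prediction step independently is easiest to see for a linear model, where a prediction is just a dot product plus an intercept. The sketch below reuses the fitted weights of a LinearRegression in plain NumPy. The function name predict_linear and the synthetic data are hypothetical, and the same weights could equally be exported to a faster language.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Placeholder data and model for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)
model = LinearRegression().fit(X, y)

# Extract the learned weights once
coef = model.coef_.copy()
intercept = float(model.intercept_)

def predict_linear(x):
    """Predict for one 1-D feature vector, bypassing scikit-learn's input validation."""
    return float(x @ coef + intercept)

x0 = X[0]
fast = predict_linear(x0)
reference = model.predict(x0.reshape(1, -1))[0]

print(np.isclose(fast, reference))
```

The custom function skips all input checks, so the caller is responsible for passing a vector of the right length and dtype.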

As you can see, there are numerous ways to influence inference time, ranging from fundamental approaches to simple tricks. Changing the data structure or implementing algorithms from scratch with a focus on efficiency may be more involved, whereas the simple tricks can easily be applied even to existing systems that use scikit-learn.

Note that none of the above approaches affects prediction quality, except for reducing the number of features and reducing model complexity. For these two, it is important to evaluate the trade-off between prediction speed and quality.
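One way to evaluate this trade-off is to record both the error and the prediction time for each candidate configuration. The sketch below does so for a random forest with 100 and with 10 trees. The tree counts, the synthetic data and the timing settings are assumptions chosen to keep the example short.

```python
import timeit
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder data for illustration
X, y = make_regression(n_samples=600, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

runs = 20  # consecutive predictions per repetition
results = {}
for n_trees in (100, 10):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)

    # Quality: RMSE on held-out data
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

    # Speed: best average time for predicting a single sample
    times = timeit.repeat(lambda: model.predict(X_test[:1]), repeat=3, number=runs)
    results[n_trees] = {"rmse": rmse, "seconds": min(times) / runs}

for n_trees, r in results.items():
    print(f"{n_trees:>3} trees: RMSE={r['rmse']:.2f}, {r['seconds'] * 1e3:.3f} ms per prediction")
```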

In this blog post, we mostly benchmark approaches that do not affect prediction quality, and therefore focus on evaluating the speedup in the next section.

Evaluating some speedup tricks

Check out the technical appendix to see how the time measurement is performed.

Reducing the number of features by half (in our case from 106 to 53 features) leads to only small reductions in inference time for k-NN and SVR, while it has a major influence on the MLP. Disabling scikit-learn's finiteness check, which takes just one line of code, improves prediction speed more significantly. As can be seen below, inference time can be reduced by up to 40%, depending on the algorithm. Utilizing the Intel extension for scikit-learn, which also requires only one line of code, results in substantial speed improvements for random forest, SVR and the k-NN regressor. For the latter two algorithms, a time reduction of more than 50% could be achieved, while for random forest, prediction time decreases by an impressive 98%. The plots below show no values for the other algorithms, as the Intel extension currently does not support them.

As can be seen below, the greatest potential lies in bulk inference. By predicting several samples simultaneously (here: 100 or 1000 samples at once), the average prediction time per sample decreases significantly for most of the algorithms. Overall, bulk prediction can lead to speed increases of up to 200-fold in this test setting. This approach is particularly effective for the MLP as well as for linear and tree-based methods.
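The difference between atomic and bulk prediction can be sketched as follows: the same 100 samples are predicted once row by row and once in a single call, and the per-sample times are compared. The gradient boosting model, the synthetic data and the small number of timing runs are assumptions for illustration, so the resulting speedup will differ from the figures reported above.

```python
import timeit
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder data and model for illustration
X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
model = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)

batch = X[:100]

# Both strategies return the same predictions
atomic_preds = np.array([model.predict(row.reshape(1, -1))[0] for row in batch])
bulk_preds = model.predict(batch)

runs = 5  # consecutive runs per repetition

def atomic():
    # One predict() call per sample
    for row in batch:
        model.predict(row.reshape(1, -1))

def bulk():
    # One predict() call for all samples
    model.predict(batch)

# Best average time per sample for each strategy
atomic_time = min(timeit.repeat(atomic, repeat=3, number=runs)) / runs / len(batch)
bulk_time = min(timeit.repeat(bulk, repeat=3, number=runs)) / runs / len(batch)

print(f"atomic: {atomic_time * 1e6:.1f} µs/sample, bulk: {bulk_time * 1e6:.1f} µs/sample")
print(f"speedup: {atomic_time / bulk_time:.1f}x")
```

Bulk prediction pays the fixed per-call overhead (input validation, Python dispatch) only once per batch, which is why the per-sample time drops.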

Summary

Fast predictions are crucial for various use cases, in particular when it comes to real-time predictions. Moreover, investing in efficiency always pays off by reducing energy consumption, thus saving money and at the same time lowering carbon emissions.

In this blog post we have explored multiple ways to achieve faster prediction times. First, the dimensionality of the data and the choice of algorithm have a major influence on inference speed and scalability behaviour. Beyond that, various tricks can accelerate even existing scikit-learn code. Disabling scikit-learn's finite data validation or utilizing the Intel extension for supported algorithms can already yield considerable improvements, depending on the algorithm. The most substantial gains, however, can be achieved by addressing fundamental aspects, such as reducing the number of features (in particular for high-dimensional data), implementing bulk prediction or writing custom prediction methods. These strategies can result in speed increases by factors of several hundred.

In our small test setting, we could additionally show that a small neural network, gradient boosted regressor and random forest appear to be the most promising choices in terms of both accuracy and speed, when using the above-mentioned speedup tricks.

Sources

[1] https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf

[2] House sales: House Sales in King County, USA; red wine quality: Red Wine Quality; avocado prices: Avocado Prices; medical insurance costs: Medical Cost Personal Datasets

[3] 8. Computing with scikit-learn — scikit-learn 0.23.2 documentation

Technical Appendix

Speed tests were performed with all unnecessary background processes stopped.

Inference time measurement for one test sample (“atomic prediction”):

import timeit
import numpy as np

n = 500  # number of consecutive runs per repetition
r = 10   # number of repetitions

# Slice with 0:1 so the single sample stays 2-D, as predict() expects
x_single = X_test[0:1]
pred_times = timeit.repeat(stmt=lambda: model.predict(x_single),
                           repeat=r, number=n)
pred_times = np.array(pred_times) / n  # average time per run in each repetition
pred_time = np.min(pred_times)  # take minimum of all repetitions

Inference time measurement for several samples at once (“bulk prediction”):

n = 50  # number of consecutive runs per repetition
r = 5   # number of repetitions

X_test_sample = X_test[0:1000]  # 100 or 1000 samples
pred_times = timeit.repeat(stmt=lambda: model.predict(X_test_sample),
                           repeat=r, number=n)
pred_times = np.array(pred_times) / n  # average time per run in each repetition
pred_times = pred_times / len(X_test_sample)  # divide by number of samples
pred_time = np.min(pred_times)  # take minimum of all repetitions

Here, “model” is one of the scikit-learn models mentioned above, trained on the first 10,000 observations of the “house sales” data using default model parameters.

Versions used:

Python: 3.9.7
Scikit-learn: 1.0.2
Scikit-learn-intelex: 2021.20210714.120553

Felix KrauseBlog

Blog

Part 2: Detecting Truck Parking Lots on Satellite Images

In the previous blog post, we created an already pretty powerful image segmentation model in order to detect the shape of truck parking lots on satellite images. However, we will now try to run the code on new hardware and get even better as well as more robust results.

Felix KrauseBlog

Blog

Creating a Cross-Domain Capable ML Pipeline

As classifying images into categories is a ubiquitous task occurring in various domains, a need for a machine learning pipeline which can accommodate for new categories is easy to justify. In particular, common general requirements are to filter out low-quality (blurred, low contrast etc.) images, and to speed up the learning of new categories if image quality is sufficient. In this blog post we compare several image classification models from the transfer learning perspective.

Rinat AbdullinBlog

Blog

Trustbit LLM Leaderboard

To address common questions concerning the integration of Large Language Models, we have created an LLM Product Leaderboard that focuses on building and shipping products.

Rinat AbdullinBlog

Blog

State of Fast Feedback in Data Science Projects

DSML projects can be quite different from the software projects: a lot of R&D in a rapidly evolving landscape, working with data, distributions and probabilities instead of code. However, there is one thing in common: iterative development process matters a lot.

Felix KrauseBlog

Blog

Part 1: Detecting Truck Parking Lots on Satellite Images

Real-time truck tracking is crucial in logistics: to enable accurate planning and provide reliable estimation of delivery times, operators build detailed profiles of loading stations, providing expected durations of truck loading and unloading, as well as resting times. Yet, how to derive an exact truck status based on mere GPS signals?

Rinat AbdullinBlog

Blog

Machine Learning Pipelines

In this first part, we explain the basics of machine learning pipelines and showcase what they could look like in simple form. Learn about the differences between software development and machine learning as well as which common problems you can tackle with them.

TIMETOACT

Service

Service

Operationalization of Data Science (MLOps)

Data and Artificial Intelligence (AI) can support almost any business process based on facts. Many companies are in the phase of professional assessment of the algorithms and technical testing of the respective technologies.

Felix KrauseBlog

Blog

License Plate Detection for Precise Car Distance Estimation

When it comes to advanced driver-assistance systems or self-driving cars, one needs to find a way of estimating the distance to other vehicles on the road.

Rinat AbdullinBlog

Blog

Strategic Impact of Large Language Models

This blog discusses the rapid advancements in large language models, particularly highlighting the impact of OpenAI's GPT models.

Rinat AbdullinBlog

Blog

Let's build an Enterprise AI Assistant

In the previous blog post we have talked about basic principles of building AI assistants. Let’s take them for a spin with a product case that we’ve worked on: using AI to support enterprise sales pipelines.

Rinat AbdullinBlog

Blog

LLM Performance Series: Batching

Beginning with the September Trustbit LLM Benchmarks, we are now giving particular focus to a range of enterprise workloads. These encompass the kinds of tasks associated with Large Language Models that are frequently encountered in the context of large-scale business digitalization.

Matus ZilinskyBlog

Blog

Creating a Social Media Posts Generator Website with ChatGPT

Using the GPT-3-turbo and DALL-E models in Node.js to create a social post generator for a fictional product can be really helpful. The author uses ChatGPT to create an API that utilizes the openai library for Node.js., a Vue component with an input for the title and message of the post. This article provides step-by-step instructions for setting up the project and includes links to the code repository.

Rinat AbdullinBlog

Blog

So You are Building an AI Assistant?

So you are building an AI assistant for the business? This is a popular topic in the companies these days. Everybody seems to be doing that. While running AI Research in the last months, I have discovered that many companies in the USA and Europe are building some sort of AI assistant these days, mostly around enterprise workflow automation and knowledge bases. There are common patterns in how such projects work most of the time. So let me tell you a story...

TIMETOACT

Technologie

Technologie

IBM SPSS Modeler

IBM SPSS Modeler is a tool that can be used to model and execute tasks, for example in the field of Data Science and Data Mining, via a graphical user interface.

Aqeel AlazreeBlog

Blog

Part 1: Data Analysis with ChatGPT

In this new blog series we will give you an overview of how to analyze and visualize data, create code manually and how to make ChatGPT work effectively. Part 1 deals with the following: In the data-driven era, businesses and organizations are constantly seeking ways to extract meaningful insights from their data. One powerful tool that can facilitate this process is ChatGPT, a state-of-the-art natural language processing model developed by OpenAI. In Part 1 pf this blog, we'll explore the proper usage of data analysis with ChatGPT and how it can help you make the most of your data.

TIMETOACT

Technologie

Decision Optimization

Mathematical algorithms enable fast and efficient improvement of partially contradictory specifications. As an integral part of the IBM Data Science platform "Cloud Pak for Data" or "IBM Watson Studio", decision optimisation has been decisively expanded and embedded in the Data Science process.

Nina DemuthBlog

Blog

7 Positive effects of visualizing the interests of your team

Interests maps unleash hidden potentials and interests, but they also make it clear which topics are not of interest to your colleagues.

Christian FolieBlog

Blog

The Power of Event Sourcing

This is how we used Event Sourcing to maintain flexibility, handle changes, and ensure efficient error resolution in application development.

Daniel PuchnerBlog

Blog

Make Your Value Stream Visible Through Structured Logging

Boost your value stream visibility with structured logging. Improve traceability and streamline processes in your software development lifecycle.

Daniel WellerBlog

Blog

Revolutionizing the Logistics Industry

As the logistics industry becomes increasingly complex, businesses need innovative solutions to manage the challenges of supply chain management, trucking, and delivery. With competitors investing in cutting-edge research and development, it is vital for companies to stay ahead of the curve and embrace the latest technologies to remain competitive. That is why we introduce the TIMETOACT Logistics Simulator Framework, a revolutionary tool for creating a digital twin of your logistics operation.

TIMETOACT GROUP

News

Proof-of-Value Workshop

Today's businesses need data integration solutions that offer open, reusable standards and a complete, innovative portfolio of data capabilities. Apply for one of our free workshops!

Aqeel AlazreeBlog

Blog

Part 4: Save Time and Analyze the Database File

ChatGPT-4 enables you to analyze database contents with just two simple steps (copy and paste), facilitating well-informed decision-making.

Christian FolieBlog

Blog

Running Hybrid Workshops

When modernizing or building systems, one major challenge is finding out what to build. In Pre-Covid times on-site workshops were a main source to get an idea about ‘the right thing’. But during Covid everybody got used to working remotely, so now the question can be raised: Is it still worth having on-site, physical workshops?

TIMETOACT

Service

Service

Demand Planning, Forecasting and Optimization

After the data has been prepared and visualized via dashboards and reports, the task is now to use the data obtained accordingly. Digital planning, forecasting and optimization describes all the capabilities of an IT-supported solution in the company to support users in digital analysis and planning.

Aqeel AlazreeBlog

Blog

Database Analysis Report

This report comprehensively analyzes the auto parts sales database. The primary focus is understanding sales trends, identifying high-performing products, Analyzing the most profitable products for the upcoming quarter, and evaluating inventory management efficiency.

Laura GaetanoBlog

Blog

Using a Skill/Will matrix for personal career development

Discover how a Skill/Will Matrix helps employees identify strengths and areas for growth, boosting personal and professional development.

Chrystal LantnikBlog

Blog

CSS :has() & Responsive Design

In my journey to tackle a responsive layout problem, I stumbled upon the remarkable benefits of the :has() pseudo-class. Initially, I attempted various other methods to resolve the issue, but ultimately, embracing the power of :has() proved to be the optimal solution. This blog explores my experience and highlights the advantages of utilizing the :has() pseudo-class in achieving flexible layouts.

Sebastian BelczykBlog

Blog

Composite UI with Design System and Micro Frontends

Discover how to create scalable composite UIs using design systems and micro-frontends. Enhance consistency and agility in your development process.

Nina DemuthBlog

Blog

Trustbit ML Lab Welcomes Grayskull e150 by Tenstorrent

Discover how Trustbit ML Lab integrates Tenstorrent's Grayskull e150, led by Jim Keller, for cutting-edge, energy-efficient AI processing.

Rinat AbdullinBlog

Blog

Innovation Incubator at TIMETOACT GROUP Austria

Discover how our Innovation Incubator empowers teams to innovate with collaborative, week-long experiments, driving company-wide creativity and progress.

Laura GaetanoBlog

Blog

5 lessons from running a (remote) design systems book club

Last year I gifted a design systems book I had been reading to a friend and she suggested starting a mini book club so that she’d have some accountability to finish reading the book. I took her up on the offer and so in late spring, our design systems book club was born. But how can you make the meetings fun and engaging even though you're physically separated? Here are a couple of things I learned from running my very first remote book club with my friend!

Rinat AbdullinBlog

Blog

Consistency and Aggregates in Event Sourcing

Learn how we ensures data consistency in event sourcing with effective use of aggregates, enhancing system reliability and performance.

Christian FolieBlog

Blog

Designing and Running a Workshop series: The board

In this part, we discuss the basic design of the Miro board, which will aid in conducting the workshops.

Sebastian BelczykBlog

Blog

Building and Publishing Design Systems | Part 2

Learn how to build and publish design systems effectively. Discover best practices for creating reusable components and enhancing UI consistency.

Ian RusselBlog

Blog

So, I wrote a book

Join me as I share the story of writing a book on F#. Discover the challenges, insights, and triumphs along the way.

TIMETOACT

Technologie

IBM Watson Studio

IBM Watson Studio is an integrated solution for implementing a data science landscape. It helps companies to structure and simplify the process from exploratory analysis to the implementation and operationalisation of the analysis processes.

Referenz

Quality scoring with predictive analytics models

Felss Systems GmbH relies on a specially developed predictive analytics method from X-INTEGRATE. With predictive scoring and automation, the efficiency of industrial machinery is significantly increased.

Daniel PuchnerBlog

Blog

How to gather data from Miro

Learn how to gather data from Miro boards with this step-by-step guide. Streamline your data collection for deeper insights.

Daniel PuchnerBlog

Blog

How we discover and organise domains in an existing product

Software companies and consultants like to flex their Domain Driven Design (DDD) muscles by throwing around terms like Domain, Subdomain and Bounded Context. But what lies behind these buzzwords, and how these apply to customers' diverse environments and needs, are often not as clear. As it turns out it takes a collaborative effort between stakeholders and development team(s) over a longer period of time on a regular basis to get them right.

TIMETOACT

Service

Service

Data Science, Artificial Intelligence and Machine Learning

For some time, Data Science has been considered the supreme discipline in the recognition of valuable information in large amounts of data. It promises to extract hidden, valuable information from data of any structure.

Rinat AbdullinBlog

Blog

Using NLP libraries for post-processing

Learn how to analyse sticky notes in miro from event stormings and how this analysis can be carried out with the help of the spaCy library.

Aqeel AlazreeBlog

Blog

Part 3: How to Analyze a Database File with GPT-3.5

In this blog, we'll explore the proper usage of data analysis with ChatGPT and how you can analyze and visualize data from a SQLite database to help you make the most of your data.

TIMETOACT

Technologie

IBM Netezza Performance Server

IBM offers Database technology for specific purposes in the form of appliance solutions. In the Data Warehouse environment, the Netezza technology, later marketed under the name "IBM PureData for Analytics", is particularly well known.

Christian FolieBlog

Blog

Designing and Running a Workshop series: An outline

Learn how to design and execute impactful workshops. Discover tips, strategies, and a step-by-step outline for a successful workshop series.

TIMETOACT

Service

Service

Conception of individual Analytics and Big Data solutions

We determine the best approach to develop an individual solution from the professional, role-specific requirements – suitable for the respective situation!

TIMETOACT

Technologie

Talend Real-Time Big Data Platform

Talend Big Data Platform simplifies complex integrations so you can successfully use Big Data with Apache Spark, Databricks, AWS, IBM Watson, Microsoft Azure, Snowflake, Google Cloud Platform and NoSQL.

TIMETOACT

Service

Business Intelligence

Business Intelligence (BI) is a technology-driven process for analyzing data and presenting usable information. On this basis, sound decisions can be made.

Kompetenz

Kompetenz

Data Science, AI & Advanced Analytics

Data Science & Advanced Analytics includes a wide range of tools that can examine business processes, help drive change and improvement.

Laura GaetanoBlog

Blog

My Weekly Shutdown Routine

Discover my weekly shutdown routine to enhance productivity and start each week fresh. Learn effective techniques for reflection and organization.

TIMETOACT

Service

Service

Dashboards & Reports

The discipline of Business Intelligence provides the necessary means for accessing data. In addition, various methods have developed that help to transport information to the end user through various technologies.

Rinat AbdullinBlog

Blog

Announcing Domain-Driven Design Exercises

Interested in Domain Dirven Design? Then this DDD exercise is perfect for you!

TIMETOACT

Technologie

Talend Data Integration

Talend Data Integration offers a highly scalable architecture for almost any application and any data source - with well over 900 connectors from cloud solutions like Salesforce to classic on-premises systems.

TIMETOACT

Referenz

Standardized data management creates basis for reporting

TIMETOACT implements a higher-level data model in a data warehouse for TRUMPF Photonic Components and provides the necessary data integration connection with Talend. With this standardized data management, TRUMPF will receive reports based on reliable data in the future and can also transfer the model to other departments.

TIMETOACT

Technologie

Technologie

Talend Application Integration / ESB

With Talend Application Integration, you create a service-oriented architecture and connect, broker & manage your services and APIs in real time.

TIMETOACT GROUP

Branche

Branche

Internal and external security

Defense forces and police must protect citizens and the state from ever new threats. Modern IT & software solutions support them in this task.

TIMETOACT GROUP

Branche

Branche

On-site digitization partner for insurance companies

As TIMETOACT GROUP, we are one of the leading digitization partners for IT solutions in Germany, Austria and Switzerland. As your partner, we are there for you at 17 locations and will find the right solution on the path to digitization - gladly together in a personal exchange on site.

TIMETOACT

Technologie

IBM Cloud Pak for Automation

The IBM Cloud Pak for Automation helps you automate manual steps on a uniform platform with standardised interfaces. With the Cloud Pak for Business Automation, the entire life cycle of a document or process can be mapped in the company.

TIMETOACT

Technologie

Technologie

IBM InfoSphere Information Server

IBM Information Server is a central platform for enterprise-wide information integration. With IBM Information Server, business information can be extracted, consolidated and merged from a wide variety of sources.

TIMETOACT GROUP

Branche

Insurance

Insurance companies live by making a promise to people - and that promise is security. Crucial to success is not only the mastery of new technologies and new forms of collaboration, but above all a change in corporate culture.

TIMETOACT

Technologie

IBM Cloud Pak for Data

The Cloud Pak for Data acts as a central, modular platform for analytical use cases. It integrates functions for the physical and virtual integration of data into a central data pool - a data lake or a data warehouse, a comprehensive data catalogue and numerous possibilities for (AI) analysis up to the operational use of the same.

TIMETOACT

Technologie

IBM Db2

The IBM Db2database has been established on the market for many years as the leading data warehouse database in addition to its classic use in operations.

TIMETOACT

Technologie

Talend Data Fabric

The ultimate solution for your data needs – Talend Data Fabric includes everything your (Data Integration) heart desires and serves all integration needs relating to applications, systems and data.

Rinat AbdullinBlog

Blog

Learning + Sharing at TIMETOACT GROUP Austria

Discover how we fosters continuous learning and sharing among employees, encouraging growth and collaboration through dedicated time for skill development.

Christoph HasenzaglBlog

Blog

Common Mistakes in the Development of AI Assistants

How fortunate that people make mistakes: because we can learn from them and improve. We have closely observed how companies around the world have implemented AI assistants in recent months and have, unfortunately, often seen them fail. We would like to share with you how these failures occurred and what can be learned from them for future projects: So that AI assistants can be implemented more successfully in the future!

TIMETOACT

Technologie

Technologie

IBM Watson® Knowledge Catalog/Information Governance Catalog

Today, "IGC" is a proprietary enterprise cataloging and metadata management solution that is the foundation of all an organization's efforts to comply with rules and regulations or document analytical assets.

TIMETOACT GROUP

Service

AI & Data Science

The amount of data that companies produce and process every day is constantly growing. This data contains valuable information about customers, markets, business processes and much more. But how can companies use this data effectively to make better decisions, improve their products and services and tap into new business opportunities?

TIMETOACT

Technologie

IBM Cloud Pak for Application

The IBM Cloud Pak for Application provides a solid foundation for developing, deploying and modernising cloud-native applications. Since agile working is essential for a faster release cycle, ready-made DevOps processes are used, among other things.

Jonathan ChannonBlog

Blog

Tracing IO in .NET Core

Learn how we leverage OpenTelemetry for efficient tracing of IO operations in .NET Core applications, enhancing performance and monitoring.

Ian RusselBlog

Blog

Introduction to Functional Programming in F# – Part 7
