Creating a Cross-Domain Capable ML Pipeline

Introduction

Classifying images into categories is a ubiquitous task across many domains, so there is a clear need for a machine learning pipeline that can accommodate new categories. Two requirements come up again and again: filtering out low-quality images (blurred, low contrast, etc.) and speeding up the learning of new categories when image quality is sufficient. Repurposing a trained model for a different (yet related) task is known as transfer learning. In this blog post, the result of joint work with Aigiz Kunafin (Data Scientist at TIMETOACT GROUP Austria), we compare several image classification models from the transfer learning perspective.

Assessment dataset

Our starting point is the task of vehicle classification, based on the well-known Kaggle dataset from Tampere University. The images vary greatly in shape and size as well as in content. We initially focus on two classes, cars and trucks. The car class contains images taken from all angles and covers almost all types of motor car, from antique cars to racing cars. The truck class comprises nearly every truck type one can imagine. Besides “ordinary” trucks as usually seen on the streets, there are images of military, fire, tow, garbage and tank trucks, to name a few. In addition to cars and trucks, we artificially generate a third class to simulate bad image quality. To do so, we draw a sample of equal size from each class and randomly augment the images (flipping, blurring, grain, motion blur, darkening). In the end, the dataset contains 971 images per class.
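The degradations listed above can be sketched in a few lines. This is only an illustration under simplifying assumptions: images are small grayscale grids stored as nested lists, all function names and parameter values (`factor`, `sigma`, `radius`, `length`) are hypothetical, and a real pipeline would more likely rely on an image library.

```python
import random

Image = list[list[int]]  # grayscale pixels, 0-255, stored as rows of ints


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def darken(img: Image, factor: float = 0.4) -> Image:
    """Scale every pixel towards black."""
    return [[int(p * factor) for p in row] for row in img]


def add_grain(img: Image, rng: random.Random, sigma: float = 25.0) -> Image:
    """Add Gaussian noise and clip back into the 0-255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Replace each pixel by the mean of its square neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) // len(window))
        out.append(row)
    return out


def motion_blur(img: Image, length: int = 5) -> Image:
    """Average each pixel with its left-hand neighbours (horizontal smear)."""
    out = []
    for row in img:
        new_row = []
        for x in range(len(row)):
            window = row[max(0, x - length + 1): x + 1]
            new_row.append(sum(window) // len(window))
        out.append(new_row)
    return out


def degrade(img: Image, rng: random.Random) -> Image:
    """Apply a random, non-empty subset of the augmentations in random order."""
    ops = [flip_horizontal, darken, box_blur, motion_blur,
           lambda im: add_grain(im, rng)]
    for op in rng.sample(ops, k=rng.randint(1, len(ops))):
        img = op(img)
    return img
```

Passing a seeded `random.Random` instance keeps the generated class reproducible, which matters when several models are later compared on the same data.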

Finding an appropriate classification algorithm

Our approach is to implement the architectures below and find the one that classifies most accurately for our pipeline. After preparing the data for the models, we set aside 20% as a test set of unseen images for comparing the shortlisted models. The remaining 80% is used for training and validation, with an 85/15 train-validation split.
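As a rough sketch of this two-stage split, the helper below shuffles the data once, holds out the test set, and then divides the remainder. The function name, the seed and the plain (non-stratified) shuffle are assumptions made for illustration, not details taken from the original pipeline.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def three_way_split(items: Sequence[T], test_frac: float = 0.20,
                    val_frac: float = 0.15, seed: int = 42):
    """Shuffle once, hold out a test set, then split the rest into train/val."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]

    n_val = int(len(rest) * val_frac)  # fraction of the remaining 80%
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# 3 classes x 971 images, represented here by integer ids
train, val, test = three_way_split(range(3 * 971))
print(len(train), len(val), len(test))  # 1982 349 582
```

In terms of the full dataset this works out to roughly 68% training, 12% validation and 20% test data.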

Baseline

As the baseline, we choose a standard PyTorch-based learning pipeline with the ResNet50 model, pre-trained on the ImageNet-1k dataset (also referred to as ILSVRC2012) at a resolution of 224x224. This model is a convenient choice for simple image classification tasks, as it is already familiar with everyday objects, including cars and trucks.

Stepwise Classification Pipeline

With this approach we split the multi-class classification task into two steps and use a distinct binary classification model for each. A first model is trained to detect whether the image belongs to the bad-quality class. If it does not, a second model determines the class of the image (in this case car or truck). This might be beneficial, as each model can specialize in its respective task. Both models have the same structure as the baseline model, i.e. a ResNet50 that is already familiar with classes for cars and trucks.
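The control flow of such a two-stage classifier can be sketched as follows. The models are represented by arbitrary callables that return a probability; the function name, the label strings and the 0.5 threshold are assumptions for illustration only.

```python
from typing import Callable

# A binary classifier here is any callable that maps an image to the
# probability of its positive class.
BinaryModel = Callable[[object], float]


def classify_stepwise(image, quality_model: BinaryModel,
                      vehicle_model: BinaryModel,
                      threshold: float = 0.5) -> str:
    """Stage 1 filters bad images; stage 2 only sees images that passed."""
    if quality_model(image) >= threshold:   # P(bad quality)
        return "bad_quality"
    if vehicle_model(image) >= threshold:   # P(truck)
        return "truck"
    return "car"


# Stand-in models with fixed outputs, used only to show the call pattern
print(classify_stepwise("img.jpg", lambda im: 0.1, lambda im: 0.8))  # truck
```

One consequence of this design is that a mistake in the first stage cannot be corrected by the second: an image wrongly flagged as bad quality never reaches the car/truck model.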

🤗-Transformers Models

“Hugging Face” (HF) Transformers is an API that provides state-of-the-art pre-trained models for machine learning along with a very user-friendly way of using them. Once the data is in an appropriate folder structure, a single code chunk specifying all necessary model parameters is enough: the API downloads, trains and saves the classification model automatically. In the following we test the three most liked image classification models on the platform: a ViT from Google, a DeiT from Facebook and a DiT from Microsoft. The vision transformer (ViT) from Google is currently by far the most downloaded pre-trained image classification model on the HF platform; it was pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k. This model type differs from the approaches above, since ResNet50 is a Convolutional Neural Network (CNN), the architecture commonly used for these tasks. Vision transformers, however, may overtake CNNs in computer vision tasks in the near future, as several sources suggest [1].
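To give an idea of how little code is needed, here is a minimal sketch of loading a pre-trained ViT with a fresh classification head for three classes. The checkpoint name `google/vit-base-patch16-224` and the class names are assumptions (the post does not state which checkpoint was used), and the training loop itself is left out.

```python
def label_maps(class_names):
    """Build the id<->label dictionaries stored in the model config."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_vit_for_finetuning(class_names,
                            checkpoint="google/vit-base-patch16-224"):
    """Load a pre-trained ViT and attach a new head for our classes."""
    # Imported lazily so that label_maps works without the dependency.
    from transformers import (AutoImageProcessor,
                              AutoModelForImageClassification)

    id2label, label2id = label_maps(class_names)
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
        # The checkpoint's head has 1000 ImageNet classes; allow replacing it.
        ignore_mismatched_sizes=True,
    )
    return processor, model


# Assumed class names for the vehicle dataset
id2label, label2id = label_maps(["bad_quality", "car", "truck"])
```

Calling `load_vit_for_finetuning` downloads the weights on first use, so it needs the `transformers` package and network access; only the new classification head starts from random weights.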

Results

Below you can see the classification performance of the different models on the previously unseen test set. As we trained all models on Google Colab, training times might not be fully comparable, since the available processing power depends on current availability.

Overall, the evaluation metrics are very high, which is probably partly because the models were already pre-trained to detect several types of cars and trucks. Nonetheless, the HF models perform best, not only across all evaluation metrics but also in a qualitative analysis. For the latter, we manually check all mispredicted images to see whether the model was clearly wrong or whether the wrong prediction was due to an ambiguous test image. Taking this into account, the performance of the pre-trained Google ViT model from HF is especially outstanding, as it made almost no obvious mistakes (also check the confusion matrix of this model on the right). Training a separate model for each classification step (the “stepwise approach”), however, turned out to worsen performance compared to the baseline.
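For readers who want to reproduce this kind of evaluation, the sketch below computes a confusion matrix, accuracy and per-class precision and recall from lists of true and predicted labels. Which metrics were used in the comparison is not spelled out here, so this particular selection is an assumption, and the toy labels are invented purely to show the call pattern.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of samples on the diagonal."""
    total = sum(map(sum, matrix))
    return sum(matrix[k][k] for k in range(len(matrix))) / total


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)}."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum
        actual = sum(matrix[k])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Invented toy labels, not results from the experiments
labels = ["bad_quality", "car", "truck"]
y_true = ["car", "car", "truck", "truck", "bad_quality", "bad_quality"]
y_pred = ["car", "truck", "truck", "truck", "bad_quality", "car"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm, accuracy(cm), per_class_metrics(cm, labels))
```

The off-diagonal cells of the matrix point directly to the images worth inspecting by hand in a qualitative analysis.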

As the HF ViT model from Google performs best and is also very simple to implement, we now test the respective training pipeline on datasets from other domains. First, we try a dataset containing bad-quality images as well as images with and without garbage bins. The newly trained model also works very well on this dataset, achieving an accuracy of close to 90%. Using other datasets from different domains and with multiple classes, we consistently reach accuracy values above 90%, as you can see in the table below.

Validation of Google ViT model

As the HF Google ViT model performed worse on the garbage bin dataset, we now want to check whether it is nonetheless superior to the other algorithms. We therefore train the models from before once again, this time on the garbage bin dataset, and compare the results. Here the DeiT model from Facebook actually performs best, with an accuracy of almost 88%. The ViT from Google is nearly as good, however, with an accuracy of 86%. Furthermore, please keep in mind that the test dataset contains only 108 images, so the values should not be compared too strictly.
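To make the caveat about the small test set concrete, one can attach a confidence interval to each accuracy value. The sketch below uses the Wilson score interval at the 95% level; this choice of interval is ours, and the rounded accuracies of 88% and 86% from above are used as inputs.

```python
import math


def wilson_interval(accuracy: float, n: int, z: float = 1.96):
    """Wilson score interval for a proportion measured on n samples."""
    denom = 1 + z**2 / n
    centre = (accuracy + z**2 / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n
                         + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for name, acc in [("DeiT", 0.88), ("ViT", 0.86)]:
    lo, hi = wilson_interval(acc, n=108)
    print(f"{name}: {acc:.0%} on 108 images, 95% interval {lo:.1%} to {hi:.1%}")
```

With 108 images, each interval is more than ten percentage points wide and the two overlap almost completely. A gap of two percentage points corresponds to roughly two test images, which is well within the noise.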

Conclusion

In the end, it is striking how straightforward it is to set up and use an HF Transformers pipeline for image classification while also obtaining excellent performance. Starting with the classification problem of trucks and cars, we achieved very high accuracy values with the very popular HF ViT from Google. This might also be due to the fact that the models had already been extensively pre-trained on these two classes with ImageNet data, and that both the training and the test data were very clean. In addition, it turned out that this pipeline can very easily be fed with new data from other domains while maintaining very good performance. However, depending on the concrete image classification problem, other approaches might perform better.

There are, of course, almost endless further approaches one could try. Provided appropriately labeled data is available, it might be feasible, for example, to first use a basic classifier to check the image quality and then use an object detection model as a classifier, which tries to find the actual object in the image. Additionally, there are many other pre-trained models and classification techniques to explore. The vision transformer model from the HF API, however, already sets a very high standard. It promises good performance in different domains while needing little training, thanks to its pre-training on the ImageNet dataset. This pipeline can therefore serve as a first port of call: a fast, simple and accurate image classification technique for a wide range of use cases.

References:

[1] Cf. https://arxiv.org/pdf/2101.01169.pdf, https://arxiv.org/pdf/2010.11929.pdf, https://becominghuman.ai/transformers-in-vision-e2e87b739feb, https://towardsdatascience.com/are-transformers-better-than-cnns-at-image-recognition-ced60ccc7c8
