Based on real benchmark data from our own software products, we re-evaluate the performance of different LLMs on specific challenges every month. We examine categories such as document processing, CRM integration, external integration, marketing support, and code generation.
Discover the best large language models for digital products
The monthly TIMETOACT GROUP Language Model (LLM) Benchmarks help you choose the best AI models for digital product development.
LLM Benchmarks | September 2024
September has been exciting! In this edition of the TIMETOACT GROUP LLM Benchmarks, we’ll talk about pushing the state of the art.
The Highlights:
- ChatGPT o1 models are the best, but there is a minor caveat.
- Gemini 1.5 Pro v002 - 3rd place in the benchmark.
- Benchmarking Qwen 2.5 and DeepSeek 2.5 - local models catching up to GPT-4 Turbo.
- Llama 3.2 - average performance, but also with a minor caveat.
- Trends of local LLMs over time.
How well can the model work with large documents and knowledge bases?
How well does the model support work with product catalogs and marketplaces?
Can the model easily interact with external APIs, services and plugins?
How well can the model support marketing activities, e.g. brainstorming, idea generation and text generation?
How well can the model reason and draw conclusions in a given context?
Can the model generate code and help with programming?
The estimated cost of running the workload. For cloud-based models, we calculate the cost according to the pricing. For on-premises models, we estimate the cost based on GPU requirements for each model, GPU rental cost, model speed, and operational overhead.
The "Speed" column indicates the estimated speed of the model in requests per second (without batching). The higher the speed, the better.
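The cost and speed columns described above can be illustrated with a minimal sketch. The exact formula used in the benchmark is not published here, so the calculation below is an assumption: it multiplies GPU count, GPU rental price, runtime derived from requests per second, and an overhead factor for on-premises models, and uses per-token pricing for cloud models. All names (`OnPremModel`, `on_prem_cost`, `cloud_cost`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnPremModel:
    """Hypothetical inputs for an on-premises cost estimate."""
    gpus_required: int          # number of GPUs needed to host the model
    gpu_rental_per_hour: float  # rental price per GPU per hour
    requests_per_second: float  # measured speed without batching
    overhead_factor: float      # operational overhead, e.g. 1.2 means +20%


def on_prem_cost(model: OnPremModel, num_requests: int) -> float:
    """Estimate the cost of running num_requests on rented GPUs (assumed formula)."""
    if model.requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    # Without batching, requests run one after another, so runtime
    # is the request count divided by the speed.
    hours = num_requests / model.requests_per_second / 3600
    return (hours * model.gpus_required
            * model.gpu_rental_per_hour * model.overhead_factor)


def cloud_cost(input_tokens: int, output_tokens: int,
               price_in_per_million: float,
               price_out_per_million: float) -> float:
    """Estimate cloud cost from a provider's per-million-token prices."""
    return (input_tokens / 1_000_000 * price_in_per_million
            + output_tokens / 1_000_000 * price_out_per_million)


if __name__ == "__main__":
    # Illustrative values only, not taken from the benchmark.
    local = OnPremModel(gpus_required=2, gpu_rental_per_hour=3.0,
                        requests_per_second=0.5, overhead_factor=1.2)
    print(f"on-prem: {on_prem_cost(local, 3600):.2f}")
    print(f"cloud:   {cloud_cost(1_000_000, 500_000, 5.0, 15.0):.2f}")
```

In this sketch a slower model costs more for the same workload, because lower requests per second means more rented GPU hours.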
Archive
Curious about how the scores have evolved? Here you can find links to all previously published leaderboards.
Transform your digital projects with the best AI language models!
Discover the transformative power of the best Large Language Models and revolutionize your business with AI! Stay future-oriented, increase efficiency and secure a clear competitive advantage. We support you in taking your business value to the next level.
Martin Warnung
martin.warnung@timetoact.at