Let's build an enterprise AI Assistant

In the previous blog post, we talked about the basic principles of building AI assistants. Let’s take them for a spin with a product case we’ve worked on: using AI to support enterprise sales pipelines.

B2B sales in the enterprise world are very lucrative once they go through. However, deals can take years to close. This means that a company needs to keep multiple leads in the sales pipeline at once. This is a resource-intensive process that is frequently limited by the human factor; sales becomes the bottleneck.

It would be possible to remove the sales bottleneck if we could:

  • increase the number of cases a single sales representative can push through at a given time

  • prioritise high-quality leads with a higher success probability 

One project that we have worked on aims to do exactly that. It is an AI assistant that ingests publicly available information about large companies: annual reports, SEC filings, or ESMA documents. This is a large set of data, potentially filled with really good leads.

We just need to sift through that data, finding information about companies and prioritising cases according to our unique sales process. How hard can that be?
 

Exploring hallucinations of AI assistants

As it turns out, classical vector-based RAG systems fail even at the simplest sales question: “Which company has the most cash available right now?”

In theory, getting this answer from an annual report is as easy as looking up the “Cash” entry in the consolidated balance sheet.
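In code, that naive lookup really is trivial. Here is a minimal sketch; the single-line format is a simplifying assumption of ours, since real filings are far messier:

```python
import re

def find_cash_entry(balance_sheet_text: str):
    """Return the first balance-sheet line reporting a 'Cash' position,
    or None if no such line exists."""
    for line in balance_sheet_text.splitlines():
        # Match lines like "Cash and cash equivalents   64,681"
        if re.match(r"\s*cash\b", line, re.IGNORECASE):
            return line.strip()
    return None

sheet = """Consolidated balance sheet
Cash and cash equivalents   64,681
Trade receivables           12,300"""
print(find_cash_entry(sheet))  # Cash and cash equivalents   64,681
```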

As you can see in our Enterprise AI Leaderboard section (from the February LLM Benchmark), even the best document retrieval systems sometimes fail to answer questions about a single annual report.

Things get substantially worse if you upload multiple annual reports and ask, “Which company has the most cash?”

Here is an experiment you can reproduce with an AI assistant of your choice:

  • Get annual reports for 2022 from Christian Dior, Bellevue Group and UNIQA Insurance Group (or any other combination for that matter)

  • Upload these to a RAG

  • Ask a very specific question:

You are CFO-GPT. Quickly answer, which of the companies has more liquidity right now? And how much? Don't make up information, if you are not certain.
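The steps above can be sketched in a few lines of code. The snippet below is a toy stand-in for a vector RAG pipeline — a bag-of-words similarity plays the role of the embedding model, and the chunk texts are paraphrased from the reports mentioned above — but it illustrates why retrieval alone goes wrong: the top-scoring chunk is picked by incidental word overlap, not by financial meaning.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a plain bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Chunks a classical RAG pipeline would produce by shredding the reports.
chunks = [
    ("Christian Dior AR 2022", "Cash and cash equivalents 7,588 million EUR"),
    ("Bellevue AR 2022", "Net cash position of CHF 64,681 thousand"),
    ("UNIQA AR 2022", "Financial liabilities due within 3 months EUR 12,897 thousand"),
]
index = [(doc, embed(text)) for doc, text in chunks]

question = "Which of the companies has more liquidity right now?"
q = embed(question)

# Whatever chunk scores highest is all the LLM gets to see. Here the
# winner is picked by incidental overlap on the word "of", not by any
# understanding of liquidity.
best = max(index, key=lambda item: cosine(q, item[1]))
print(best[0])
```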

At this point, it looks like the system should either give us the name of the company along with a numeric value OR give up, right?

So what would ChatGPT-4 (still the best in its class) do?

What if we take LlamaIndex which promises to “turn your enterprise data into production-ready LLM applications”?

Answers will be more concise but spectacularly useless:

Ask it a couple of times and it will keep on coming up with creative ideas:

- UNIQA Insurance Group AG has more liquidity right now compared to Bellevue Group AG.
- Bellevue Group has more liquidity right now.
- UNIQA Insurance Group AG has more liquidity right now. The total financial liabilities due within 3 months for UNIQA amount to €12,897 thousand, while Bellevue Group AG has CHF 26,794 thousand due within the same period.

Proponents of vector-based RAG systems will say that nobody uses plain LangChain or LlamaIndex and that you should first build a dedicated assistant on top of these frameworks. Although that moves the goalposts a bit, you can still independently verify such a system. Just take a couple of annual reports (the more, the merrier), upload them, and ask questions like the ones above.

If you think your system will pass such a test, I would be glad to test it personally and share the results publicly. We have 50GB of annual reports to use as test data!

Applying Domain-Driven Design to the problem

There is actually an easy way to build a system capable of answering trivial questions like that. It starts with two simple steps:

  • Throw away the technical complexity and baggage of vector databases

  • Take a deep look at the question being asked.

How would a real domain expert approach this problem? They would probably know how these reports are structured and would look for any lines mentioning liquidity or cash flow in the consolidated balance sheets.

We can just replicate this approach. Instead of shredding all documents into tiny pieces and putting them into the vector database, we can extract information from the documents into a knowledge map.

Just like it is depicted by the bottom path in this image:

We have documents on the left. LLM extractors (experts) go through these documents, find bits of information, and populate the knowledge map in advance. The knowledge map is designed in such a way that when a question comes to the AI assistant, answering it becomes a simple task.

Unlike the contents of embedding vectors and graph nodes, this knowledge map will be auditable and easily readable by non-technical people.

If all our system has to do is talk about liquidity and compare different companies on that topic, then this knowledge map could be represented by a single object that fully fits into the context window of a large language model:

  • Bellevue Group

    • 2022 Liquidity: 64,681,000 CHF

    • 2021 Liquidity: 84,363,000 CHF

  • UNIQA Insurance Group AG

    • 2022 Liquidity: 667,675 EUR

    • 2021 Liquidity: 592,583 EUR

  • Christian Dior

    • 2022 Liquidity: 7,588 million EUR

    • 2021 Liquidity: 8,122 million EUR

    • 2020 Liquidity: 20,358 million EUR
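Since no vector retrieval is involved, the comparison can even be done in plain code against this map. A minimal sketch (2022 values only; the ~1:1 CHF/EUR rate is an illustrative assumption on our part, not market data):

```python
# The knowledge map above as a plain, auditable Python structure.
knowledge_map = {
    "Bellevue Group":           {"2022": (64_681_000, "CHF")},
    "UNIQA Insurance Group AG": {"2022": (667_675, "EUR")},
    "Christian Dior":           {"2022": (7_588_000_000, "EUR")},
}

# Illustrative conversion rate only; a real system would pull FX data.
TO_EUR = {"EUR": 1.0, "CHF": 1.0}

def liquidity_eur(company: str, year: str) -> float:
    amount, currency = knowledge_map[company][year]
    return amount * TO_EUR[currency]

def most_liquid(year: str):
    """Return (company, liquidity in EUR) for the most liquid company."""
    company = max(knowledge_map, key=lambda c: liquidity_eur(c, year))
    return company, liquidity_eur(company, year)

print(most_liquid("2022"))  # ('Christian Dior', 7588000000.0)
```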

 

Larger knowledge maps will need specialised storage systems that can be queried by an LLM during prompting. It is not uncommon to see knowledge maps with hundreds of thousands of entities.

Fortunately, ChatGPT is quite good with SQL, NoSQL and even pandas. This is one of the reasons why we are tracking the "Code" column in our LLM benchmarks.
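As a sketch of what that looks like, here is the knowledge map loaded into an in-memory SQLite table, together with the kind of query an LLM could be asked to generate (the table and column names are our assumptions):

```python
import sqlite3

# The knowledge map as a queryable table. With hundreds of thousands of
# entities, the map no longer fits into the context window, so the LLM
# generates queries against it instead of reading it directly.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE liquidity (company TEXT, year INTEGER, amount REAL, currency TEXT)")
con.executemany(
    "INSERT INTO liquidity VALUES (?, ?, ?, ?)",
    [("Bellevue Group", 2022, 64_681_000, "CHF"),
     ("UNIQA Insurance Group AG", 2022, 667_675, "EUR"),
     ("Christian Dior", 2022, 7_588_000_000, "EUR")])

# The kind of query an LLM might emit for our CFO question. A production
# system would join an FX-rates table; ordering by the raw amount works
# here only because the gap between the companies is so large.
row = con.execute("""
    SELECT company, amount, currency FROM liquidity
    WHERE year = 2022 ORDER BY amount DESC LIMIT 1""").fetchone()
print(row)
```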

Given a knowledge map like this, it feels like cheating to ask ChatGPT questions. Pass it together with the prompt:

You are CFO-GPT. Quickly answer, which of the companies has more liquidity, and how much? Don't make up information, if you are not certain.

The answer will be very precise, and it will not change between runs:

Christian Dior has the highest liquidity in 2022 with 7,588 million EUR.
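In practice, passing the map together with the prompt just means serialising it into the system message. A hypothetical sketch using the common chat-message format (the helper name is ours):

```python
import json

# Hypothetical prompt assembly: the knowledge map travels inside the
# system message, so the model only reads values instead of retrieving them.
def build_messages(knowledge_map: dict, question: str) -> list:
    system = ("You are CFO-GPT. Don't make up information, "
              "if you are not certain.\n\nKnowledge map:\n"
              + json.dumps(knowledge_map, indent=2))
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]

messages = build_messages(
    {"Christian Dior": {"2022 Liquidity": "7,588 million EUR"},
     "Bellevue Group": {"2022 Liquidity": "64,681,000 CHF"}},
    "Which of the companies has more liquidity, and how much?")
# These messages can go to any chat-completion style LLM API.
print(messages[1]["content"])
```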