You have already seen a photo of the speaker above. Aigiz and his team of enthusiasts have come to know this smart speaker from many different perspectives over its years of evolution:
The smart speaker device is crucial for adoption: it has to be cost-effective, powerful, and sturdy enough. Only then can it be placed in many families and kindergartens for the children to talk to.
Under the hood, it is essentially a simple PCB powered by an ESP32-S3 microcontroller, complete with a microphone, a few LEDs, and buttons. The ESP32-S3 is an inexpensive, compact microcontroller typically favored by hobbyists for building remote temperature sensors and IoT automation.
It has a tiny amount of memory (just 512 KB of RAM for both data and instructions, 8 MB of PSRAM, and 16 MB of flash storage) and only two cores.
Despite these limitations, Aigiz and his team have successfully transformed this board into a functional, Wi-Fi-connected smart speaker. The device runs:
A small, specialized wake-word detection model (a machine learning model trained to detect a specific word or phrase).
An audio processing pipeline.
A streaming client to maintain continuous communication with the “brains” of the system.
The software itself is native C++ code developed using the ESP-IDF framework, with firmware updates deployed over-the-air (OTA).
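To make the moving parts concrete, here is a toy sketch of how these three pieces fit together. It is written in Python purely for readability (the real firmware is native C++ on ESP-IDF), and every name and threshold in it is illustrative rather than taken from the Homai codebase:

```python
import asyncio
import random

def wake_word_detected(frame: bytes) -> bool:
    # Stand-in for the tiny on-device wake-word model; the real device
    # runs a specialized ML model, not a random trigger.
    return random.random() < 0.02

async def mic_frames():
    # Stand-in for the audio pipeline: ~20 ms frames of 16 kHz mono audio.
    while True:
        await asyncio.sleep(0.02)
        yield bytes(640)  # 320 samples * 2 bytes per sample

async def device_loop(stream_to_server) -> None:
    """Idle until the wake word fires, then hand audio to the streaming client."""
    async for frame in mic_frames():
        if wake_word_detected(frame):
            await stream_to_server(frame)
            return  # toy version: stop after one interaction

async def main():
    async def fake_stream(frame: bytes):
        print(f"streaming {len(frame)} bytes to the server...")
    await device_loop(fake_stream)

asyncio.run(main())
```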
Homai Server is a more traditional application. At its heart is an event-driven coordination server written in Go. It maintains connections with all the devices via WebSockets and manages their state through stages like “listening”, “running ASR”, or “sending response”.
This backend is also responsible for resilience in the face of failures, for authentication, and for managing jobs for the machine learning models and external services.
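To illustrate the coordination pattern, here is a minimal sketch in Python using the `websockets` library. The production server is written in Go, and the message format below is invented for the example; only the state names come from the description above:

```python
import asyncio
import enum
import json

import websockets  # pip install websockets

class DeviceState(enum.Enum):
    LISTENING = "listening"
    RUNNING_ASR = "running ASR"
    SENDING_RESPONSE = "sending response"

async def handle_device(ws):
    """One coroutine per connected speaker, tracking its conversational state."""
    state = DeviceState.LISTENING
    async for message in ws:
        event = json.loads(message)  # hypothetical JSON message format
        if state is DeviceState.LISTENING and event.get("type") == "audio_done":
            state = DeviceState.RUNNING_ASR
            # In the real system this would enqueue a job for a GPU agent
            # and await its result; here we fake the transcription.
            transcript = "(transcribed speech)"
            state = DeviceState.SENDING_RESPONSE
            await ws.send(json.dumps({"type": "reply", "text": transcript}))
            state = DeviceState.LISTENING

async def main():
    async with websockets.serve(handle_device, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```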
If the backend is the heart, then the GPU servers running custom machine learning models are the muscles.
There are custom fine-tuned machine learning models per language. Speech recognition is currently based on Wav2Vec2-BERT, while text-to-speech runs on VITS (a conditional variational autoencoder with adversarial learning). However, these architectures are an implementation detail; they can change rapidly, tracking the current state of the art in speech and language technology.
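Homai's fine-tuned per-language checkpoints are not public, but the same architecture families are available through Hugging Face `transformers`. A minimal sketch, where the ASR model id is a hypothetical placeholder and the TTS example uses a public MMS checkpoint of the VITS architecture as a stand-in:

```python
import torch
from transformers import pipeline, VitsModel, AutoTokenizer

# Speech recognition: a Wav2Vec2-BERT encoder fine-tuned with a CTC head.
# "your-org/w2v-bert-2.0-bashkir-ctc" is a made-up checkpoint name.
asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/w2v-bert-2.0-bashkir-ctc",
)
print(asr("sample.wav")["text"])

# Text-to-speech: VITS, here via a public MMS checkpoint.
tts_model = VitsModel.from_pretrained("facebook/mms-tts-eng")
tts_tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")
inputs = tts_tokenizer("Hello from a VITS model", return_tensors="pt")
with torch.no_grad():
    waveform = tts_model(**inputs).waveform  # (batch, samples)
```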
These GPU servers run Python-based agents. They continuously pull jobs from the backend server, run them through the GPU pipelines, and push the results back to the server. Multiple GPU servers can run in parallel to provide redundancy and load balancing.
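The agent loop itself is conceptually simple. A minimal sketch, assuming a plain HTTP job API on the backend; the endpoint names and payload shapes here are invented for illustration:

```python
import time
import requests  # pip install requests

BACKEND = "https://backend.example.com"  # placeholder URL

def run_pipeline(job: dict) -> dict:
    # Stand-in for the actual GPU work (ASR, TTS, ...).
    return {"job_id": job["id"], "result": "..."}

def agent_loop() -> None:
    while True:
        # Long-poll the backend for the next job. Several agents can do
        # this in parallel, which gives redundancy and load balancing.
        resp = requests.get(f"{BACKEND}/jobs/next", timeout=30)
        if resp.status_code == 204:  # no work available right now
            time.sleep(1)
            continue
        job = resp.json()
        result = run_pipeline(job)
        requests.post(f"{BACKEND}/jobs/{job['id']}/result", json=result)

if __name__ == "__main__":
    agent_loop()
```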
There is also a special type of server responsible for cultural intelligence and overall conversation: the agentic server. It is likewise written in Python. It keeps track of conversations, detects intents, and integrates with third-party information sources.
For example, when Homai is used in classes, teachers frequently prepare class notes and exercises and upload them to their profiles via a dedicated website for pedagogues. The associated Homai device can then refer to that material when teachers mention it during class. The agentic server implements all the required functionality for that, along with managing other language-specific bits of knowledge.
As you may have already guessed, this part is implemented as a specialised, advanced RAG system. It uses patterns and practices similar to the ones discussed in the Enterprise RAG Challenge (see How I Won the Enterprise RAG Challenge).
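At its core, the retrieval step can be pictured like this. A minimal sketch assuming embedding-based retrieval over pre-chunked class notes; the model choice, chunking, and sample data are all illustrative, and the production pipeline is considerably more advanced:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

# Teacher-uploaded notes, already split into chunks.
chunks = [
    "Exercise 3: count from one to ten in Bashkir.",
    "Vocabulary list: family members.",
    "Grammar note: plural suffixes.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k note chunks most similar to what the teacher said."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity (vectors are normalized)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

print(retrieve("let's do the counting exercise"))
```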
Technology-wise, the project uses Nix and NixOS to manage multiple deployments (and deployment stages) and to connect them via a private, secure network. In addition to the servers mentioned above, there are also components for observability and logging, SSL termination, serving content and APIs, and managing firmware updates.
AI gets no credit on this team slide
Advancements in AI, particularly through open-source research and the release of powerful, multimodal language models, have made it technically viable to capture speech and preserve cultural heritage effectively. Collaborative efforts from linguists worldwide have further lowered the barriers to training custom speech recognition and text-to-speech models, enabling even individual researchers to accomplish this.
However, speech recognition and generation alone are insufficient for cultural preservation; a smart assistant requires intelligence and cultural insight. Recent breakthroughs in LLMs, advanced RAG, and reasoning architectures helped here.
Our LLM Benchmarks, the Enterprise RAG Challenge, and the insights collected in AI Cases contributed to this progress. They helped us make efficient design decisions and, in turn, drew inspiration from the successes of Homai. The power of international collaboration between talented teams became a source of inspiration and motivation for the “AI Strategy & Research Hub” at TIMETOACT GROUP Austria. Its purpose is to coordinate practical AI R&D in the community and to push the state of the art forward together.