Why this AI startup is betting on voice-enabled bots to scale AI adoption in India

If your target market has 22 official languages and its people speak more than 19,000 dialects, does it make sense to offer a text-only AI chatbot that works best in only a couple of languages?

That’s the question Indian AI startup Sarvam has been working to solve, and on Tuesday it launched a series of offerings, including a voice-enabled AI bot that supports more than 10 Indian languages, betting that people in the country would rather talk to an AI model in their own language than chat with it over text. The startup is also launching a small language model, an AI tool for lawyers, and an audio-language model.

“People prefer to speak in their own language. It’s extremely challenging to type in Indian languages today,” Vivek Raghavan, co-founder of Sarvam AI, told TechCrunch.

The Bengaluru-based startup, which primarily targets businesses and enterprises, is pitching its voice-enabled AI bots for a number of industries, particularly those that rely heavily on customer support. As an example, it pointed to one of its customers, Sri Mandir, a startup that offers religious content: the company has been using Sarvam’s AI agent to accept payments and has processed more than 270,000 transactions so far.

The company said its AI voice agents can be deployed on WhatsApp or inside an app, and can even work over traditional voice calls.

Backed by Peak XV and Lightspeed, Sarvam plans to price its AI agents starting at ₹1 (approximately 1 cent) per minute of usage.

The startup is building its voice-enabled AI agents on top of a foundational small language model called Sarvam 2B, which is trained on a dataset of 4 trillion tokens. The model is trained entirely on synthetic data, according to Raghavan.

AI experts often advise caution when using synthetic data — essentially data generated by a large language model to replicate real-world data — to train other AI models, because LLMs tend to hallucinate and make up information that may not be accurate. Training AI models on such data can compound those inaccuracies.

Raghavan said Sarvam opted for synthetic data because Indian-language content is extremely scarce on the open web. The startup has built models to clean and improve that data before using it to generate the synthetic datasets, he added.

The founder claimed that Sarvam 2B will cost a tenth of anything comparable in the industry. The startup is open-sourcing the model in the hope that the community will build on it further.
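
For developers wondering what building on the open-sourced model might look like, here is a minimal sketch of loading a small open-weight causal language model with the Hugging Face transformers library and prompting it in an Indian language. The repository id below is a hypothetical placeholder, not confirmed by the article; Sarvam’s actual release name and loading instructions may differ.

```python
# Minimal sketch: load a small open-weight language model with Hugging Face
# transformers and generate text from an Indic-language prompt.
# "sarvamai/sarvam-2b" is a hypothetical repository id used for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-2b"  # placeholder; check the official release for the real id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A short Hindi prompt to exercise the model's Indian-language training.
prompt = "भारत में डिजिटल भुगतान"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```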

“While the large language foundational models are very exciting, you can achieve an experience that is superior, more specific, lower-cost and with reduced latency using small language models,” Raghavan said. “If you want to perform a query or two in a week or a month, you should use the large language models. But for use cases requiring millions of daily interactions, I believe smaller models are more suitable.”

The startup is also launching an audio-language model, called Shuka, built on its Saaras v1 audio decoder and Meta’s Llama3-8B Instruct. That model is also being open-sourced so developers can use the startup’s translation, text-to-speech (TTS), and other modules to build voice interfaces.
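
For a sense of what such a voice interface involves, here is a conceptual sketch of a single speech-in, speech-out turn: an audio-language model understands the user’s speech and produces a text reply, and a TTS module speaks that reply back in the user’s language. The function names are hypothetical placeholders standing in for those components; they are not Sarvam’s actual API.

```python
# Conceptual sketch of one voice-interface turn. All names below are
# hypothetical placeholders, not Sarvam's actual API.

def understand_audio(audio_bytes: bytes) -> str:
    """Stand-in for an audio-language model (something like Shuka):
    decodes the user's speech and returns a text response."""
    raise NotImplementedError("replace with a real audio-language model call")

def synthesize_speech(text: str, language: str) -> bytes:
    """Stand-in for a text-to-speech module in the user's language."""
    raise NotImplementedError("replace with a real TTS call")

def voice_turn(audio_bytes: bytes, language: str = "hi") -> bytes:
    reply_text = understand_audio(audio_bytes)       # speech understanding + reply
    return synthesize_speech(reply_text, language)   # spoken reply, same language
```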

And, there’s another product dubbed “A1” — a generative AI workbench designed for lawyers that can look up regulations, draft documents, redact them and extract data.

Sarvam is one of a small group of Indian startups advocating for use cases that align with the country’s interests and contribute to the government’s efforts to develop its own bespoke AI infrastructure.

Governments across the world are increasingly pursuing “sovereign AI”: AI infrastructure that’s developed and controlled at the national level. The purported aim of such efforts is to safeguard data privacy, stimulate economic growth, and tailor AI development to local cultural contexts. The United States and China currently have the biggest investments in this space, and India is following with its “IndiaAI” program and language-specific models.

One of the initiatives under the IndiaAI program, IndiaAI Compute Capacity, plans to establish a supercomputer powered by at least 10,000 GPUs. Another initiative, Bhashini, aims to democratize access to digital services across various Indian languages.

Raghavan said his startup is ready to contribute to the IndiaAI program. “If the opportunity arises, we will work with the government,” he said in the interview.