IISc-Art Park will make 16,000 hours of speech data available as open source for AI model development.
- Jul 11, 2024
Updated: Jul 11, 2024

The Indian Institute of Science's AI and Robotics Technology Park (IISc-Art Park) is set to open-source 16,000 hours of spontaneous speech data from 80 districts as part of Project Vaani, in collaboration with Google. This dataset is intended to serve as foundational training data for speech-to-text AI models. Amitabh Nag, CEO of Bhashini, highlighted the project's role in creating digital data for low-resource languages to support AI model development.
According to a Google spokesperson, Project Vaani aims to provide high-quality, diverse, and anonymized datasets for developing technology solutions that reflect local language usage. The initiative is funded by the Bill and Melinda Gates Foundation and SYSPIN (Synthesising Speech in Indian Languages); SYSPIN began in 2021 and is supported by the German Development Cooperation (GIZ).
In other news, agritech startup Arya Ag secured $29 million (Rs 242 crore) in a funding round led by Switzerland-based Blue Earth Capital, with participation from Luxembourg-based Asia Impact and Quona Capital. Additionally, Circuit House Technologies, founded by former Xiaomi India CBO Raghu Reddy, raised $4.3 million in a round led by Stellaris Venture Partners and 3one4 Capital.