Saurabh Kumar - Researcher in Speech Processing and Machine Learning

Saurabh Kumar

Welcome to my personal webpage! I am a Research Associate at the SPIRE Lab, Indian Institute of Science (IISc), Bangalore, where I work under the guidance of Prof. Prasanta Kumar Ghosh. My research focuses on automatic speech recognition (ASR) and dialect identification (DID) for low-resource Indian languages. I explore tasks such as dialect and domain identification to enhance ASR models, and I develop efficient pipelines for data curation and quality assessment.

I hold a B.Tech. in Electronics and Communication Engineering from the National Institute of Technology (NIT), Patna. My capstone project involved simulating hybrid plasmonic waveguide-based devices using COMSOL Multiphysics.

Research Highlights

  • ASR Development: Developing ASR systems for the agriculture and finance domains, with a focus on dialectal variation in Indian languages.
  • Speech Synthesis: Contributed to building a text-to-speech system for nine Indian languages as part of the SYSPIN project.
  • Data Curation: Led efforts to curate extensive multilingual speech and text corpora for the VAANI project, funded by Google.

Skills

I am proficient in machine learning toolkits such as PyTorch and scikit-learn, and in ASR frameworks such as ESPnet, Kaldi, and Fairseq. I also have experience programming in Python, Bash, and PHP.

Publications and Projects

I have co-authored several papers on topics ranging from robust children’s speech recognition to dialectal speech processing in Indian languages. I have also contributed to challenges such as LIMMITS’24, MADASR, and Gram Vaani ASR by curating datasets and building baseline models.

Feel free to explore my GitHub, Google Scholar, and LinkedIn profiles to learn more about my work.