This repo provides a command-line tool for performing automatic speech-to-text tasks (i.e., "transcription") using open-source models from Hugging Face Hub. For interactive tasks, it allows users to ...
Abstract: Zero-shot text-to-speech (TTS) has recently achieved remarkable performance by leveraging a speech prompt instead of a speaker embedding, as it provides richer information. However, ...
Abstract: We propose a novel generative speech enhancement (SE) framework that integrates a language model (LM) and a flow-matching model. To utilize an LM with discrete tokens, we introduce dMel, ...
Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge ...
Voice-based technologies are essential to cater to the hundreds of millions of new smartphone users. However, most of the languages spoken by these new users have little to no labelled speech data.
Forbes contributors publish independent expert analyses and insights. Rachel Wells is a writer who covers leadership, AI, and upskilling. A few days ago, on November 26th, right before Thanksgiving, ...
Small and fast: only 123M parameters. High-quality voice cloning: state-of-the-art performance in speaker similarity, intelligibility, and naturalness. Multilingual: supports Chinese and English.