Parler TTS is a lightweight text-to-speech model that generates high-quality, natural-sounding speech in the style of a specified speaker. It was trained on 45,000 hours of narrated English audiobooks and includes 34 characterized speakers that can be selected by name, which keeps a voice consistent from one generation to the next.
Key Features and Functionality:
- High-Fidelity Speech: Produces natural-sounding speech with clear, high-quality audio.
- Speaker Consistency: Maintains consistent speaker characteristics across multiple generations using 34 predefined speakers.
- Controllable Features: Allows users to control gender, background noise, speaking rate, pitch, and reverberation through simple text prompts.
- Optimized Inference: Supports SDPA (scaled dot-product attention), torch.compile, batching, and streaming for faster generation.
- Fully Open-Source: All datasets, pre-processing, training code, and weights are publicly released under the Apache 2.0 license.
- Fine-Tuning Support: Provides comprehensive documentation for training and fine-tuning custom Parler TTS models.
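The prompt-based control described in the features above can be sketched roughly as follows. This is a minimal illustration, not the project's reference code: build_description and synthesize are hypothetical helper names, the description wording is only one plausible phrasing, and the checkpoint id "parler-tts/parler-tts-mini-v1" and the speaker name "Jon" are assumptions taken from the project's public examples. The model call assumes the parler_tts, transformers, torch, and soundfile packages are installed.

```python
from typing import Optional


def build_description(
    speaker: Optional[str] = None,
    gender: str = "female",
    pace: str = "moderate",
    pitch: str = "moderate",
    noisy: bool = False,
    reverberant: bool = False,
) -> str:
    """Compose a plain-English voice description (hypothetical helper).

    The exact wording is illustrative; the model is steered by free text,
    so other phrasings of the same attributes should also work.
    """
    who = f"{speaker}'s voice" if speaker else f"A {gender} speaker's voice"
    space = "a distant-sounding, echoey recording" if reverberant else "a very close recording"
    noise = "noticeable background noise" if noisy else "almost no background noise"
    return (
        f"{who} has a {pitch} pitch and a {pace} speaking pace, "
        f"with {space} that has {noise}."
    )


def synthesize(
    text: str,
    description: str,
    out_path: str = "parler_tts_out.wav",
    checkpoint: str = "parler-tts/parler-tts-mini-v1",  # assumed checkpoint id
) -> str:
    """Generate speech for `text` in the voice described by `description`."""
    # Heavy dependencies are imported lazily so build_description works alone.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call such as synthesize("Hey, how are you doing today?", build_description(speaker="Jon", pace="slightly fast")) would then write a WAV file. Naming one of the predefined speakers in the description is what keeps the voice consistent across calls.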
Primary Value and User Solutions:
Parler TTS meets the need for high-quality, customizable text-to-speech by delivering natural-sounding speech with consistent speaker characteristics. Because it is open source, developers and researchers can build on the model and tailor it to specific applications, including those aimed at accessibility and user engagement.