Each entry in the dataset consists of a unique MP3 and corresponding text file. Many of the 1,368 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help train the accuracy of speech recognition engines. The dataset currently consists of 1,087 validated hours in 18 languages, but we're always adding more voices and languages.