automatic speech processing - Latin American Spanish varieties - consonant lenition - multi-dialectal pronunciation modeling - dialect-specific extended dataset

From large-scale phonetic studies to speech recognition of Spanish varieties

Abstract
Dialectal variation is a major challenge for automatic speech processing. The purpose of this research is to improve the performance of a broadcast news transcription system for Latin American Spanish. Automatic speech processing tools were used to estimate the impact of intervocalic /b/, /d/, /g/ lenition and coda /s/ lenition across Spanish dialects. These findings were then applied to acoustic model training, together with modifications to both the phonemic inventory and the lexicon. The effect of extending the training data with dialect-specific material was also studied. Two acoustic model training configurations were developed: an initial set using Peninsular data only, and an extended set that adds Latin American data. The best-performing model for Latin American speech combines expert corrections, consonant merging and lenition, and is trained on the extended dataset. This model obtains a 7% relative WER reduction on Latin American data and remains robust on other Spanish dialects.
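One common way to bring lenition into a pronunciation lexicon is to add alternative pronunciations for each word. The sketch below illustrates that idea and is not the authors' implementation. It rests on several assumptions: letters stand in for phones, the SAMPA-style approximants `B`, `D`, `G` represent lenited /b d g/, coda /s/ may surface as aspiration `h` or be deleted, intervocalic /d/ may also be deleted, and "coda" is approximated as any /s/ not followed by a vowel. The names `variants` and `relative_wer_reduction` are hypothetical. The second function shows how a relative WER reduction such as the 7% above is computed. The WER values in the test are illustrative only and do not come from this study.

```python
from itertools import product

VOWELS = {"a", "e", "i", "o", "u"}

# Hypothetical lenition options (SAMPA-style symbols); "" means deletion.
INTERVOCALIC = {"b": ["b", "B"], "d": ["d", "D", ""], "g": ["g", "G"]}
CODA_S = ["s", "h", ""]


def variants(phones):
    """Return canonical + lenited pronunciations of a phone list, canonical first."""
    options = []
    for i, p in enumerate(phones):
        prev_vowel = i > 0 and phones[i - 1] in VOWELS
        next_vowel = i + 1 < len(phones) and phones[i + 1] in VOWELS
        if p in INTERVOCALIC and prev_vowel and next_vowel:
            options.append(INTERVOCALIC[p])
        elif p == "s" and not next_vowel:
            # Simplified coda test: /s/ not followed by a vowel.
            options.append(CODA_S)
        else:
            options.append([p])
    # Cartesian product of per-position choices, dropping deleted phones
    # and removing duplicates while preserving order.
    return list(dict.fromkeys(
        tuple(x for x in combo if x) for combo in product(*options)
    ))


def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction of a system with respect to a baseline."""
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    for v in variants(list("lados")):
        print(" ".join(v))
```

For "lados", this yields 9 variants: every combination of {d, D, deleted} with {s, h, deleted}. A real system would restrict these variants by dialect, or weight them using the phonetic measurements.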