Improving a Long Audio Aligner through Phone-Relatedness Matrices for English, Spanish and Basque
Abstract
A multilingual long audio alignment system is presented in the automatic subtitling domain, supporting English, Spanish and Basque. Pre-recorded contents are recognized at phoneme level through language-dependent triphone-based decoders. In addition, the transcripts are phonetically translated using grapheme-to-phoneme transcriptors. An optimized version of Hirschberg’s algorithm performs an alignment between both phoneme sequences to find matches. The correctly aligned phonemes and their time-codes obtained in the recognition step are used as the reference to obtain near-perfectly aligned subtitles. The performance of the alignment algorithm is evaluated using different non-binary scoring matrices based on phone confusion-pairs from each decoder, on phonological similarity and on human perception errors. This system is an evolution of our previous successful system for long audio alignment.
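The role of a non-binary scoring matrix in phoneme alignment can be illustrated with a minimal sketch. It is not the system described above: it uses plain quadratic-space Needleman-Wunsch alignment rather than the optimized Hirschberg variant, and the phoneme symbols, the `RELATED` scores and the `GAP` and `MISMATCH` penalties are hypothetical values chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical relatedness scores for phoneme pairs (assumed, not from the paper).
RELATED: Dict[FrozenSet[str], float] = {
    frozenset(("p", "b")): 0.5,
    frozenset(("t", "d")): 0.5,
}
MATCH = 1.0      # assumed score for identical phonemes
MISMATCH = -1.0  # assumed score for unrelated phonemes
GAP = -1.0       # assumed penalty for an insertion or deletion


def score(a: str, b: str) -> float:
    """Non-binary substitution score: related phonemes get partial credit."""
    if a == b:
        return MATCH
    return RELATED.get(frozenset((a, b)), MISMATCH)


def align(ref: List[str], hyp: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Globally align two phoneme sequences (Needleman-Wunsch, quadratic space)."""
    n, m = len(ref), len(hyp)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(ref[i - 1], hyp[j - 1]),  # substitution or match
                dp[i - 1][j] + GAP,                                # phoneme missing in hyp
                dp[i][j - 1] + GAP,                                # extra phoneme in hyp
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(ref[i - 1], hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

With a binary matrix, a recognized `b` against a transcribed `p` would count as a plain mismatch; with the partial score assumed here, the pair is still aligned as a substitution and its time-code remains usable as an anchor.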
BIB_text
author = {Aitor Álvarez and Pablo Ruiz and Haritz Arzelus},
title = {Improving a Long Audio Aligner through Phone-Relatedness Matrices for English, Spanish and Basque},
pages = {473--480},
volume = {8655},
keywds = {
Long audio alignment, automatic subtitling, phonological similarity matrices, perceptual confusion matrices
},
abstract = {
A multilingual long audio alignment system is presented in the automatic subtitling domain, supporting English, Spanish and Basque. Pre-recorded contents are recognized at phoneme level through language-dependent triphone-based decoders. In addition, the transcripts are phonetically translated using grapheme-to-phoneme transcriptors. An optimized version of Hirschberg’s algorithm performs an alignment between both phoneme sequences to find matches. The correctly aligned phonemes and their time-codes obtained in the recognition step are used as the reference to obtain near-perfectly aligned subtitles. The performance of the alignment algorithm is evaluated using different non-binary scoring matrices based on phone confusion-pairs from each decoder, on phonological similarity and on human perception errors. This system is an evolution of our previous successful system for long audio alignment.
},
isbn = {978-3-319-10815-5},
date = {2014-09-12},
year = {2014},
}