Aligning Incomplete Lyrics of Korean Folk Song Dataset using Whisper

Affiliation
MALer
Presenter
한단비내린, 김대웅
Subject
Lyric Alignment
Site
B13
Time
Poster Session II - 13:30~15:00

Abstract

We propose a method for time-aligning lyrics with Korean folk song audio using a transformer encoder-decoder model that internally leverages incomplete lyric information. Our analysis of Korean folk song lyrics identified discrepancies between the lyrics and their corresponding audio recordings. To handle these discrepancies while fully exploiting the existing transcriptions, we propose RefWhisper, a variant of OpenAI's Whisper with an additional encoder module and cross-attention layer that let the model refer to an incomplete lyric text while transcribing. The additional cross-attention layer also enables alignment among the reference text, the predicted transcription, and the audio. Furthermore, we publicly release the transcriptions and timestamp data, aligned at the sentence and word levels, for 14,627 Korean folk songs.
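To illustrate how a cross-attention layer can link predicted tokens to a reference text, the following is a minimal sketch in plain Python. It is not the RefWhisper implementation: the function names, the toy vectors, and the argmax read-out are all assumptions made for illustration, and the abstract does not state how alignments are actually extracted from the attention weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_weights(queries, keys):
    """Scaled dot-product attention weights.

    Returns one row per query (decoder position) and one column per key
    (reference-text position). Each row sums to 1.
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    return weights

def align_by_argmax(weights):
    """Hypothetical read-out: map each predicted token to the reference
    token that receives the largest attention weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]

# Toy stand-ins for encoded reference lyric tokens (keys) and
# decoder states of predicted tokens (queries). Values are made up.
reference_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
decoder_states = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 4.0]]

weights = cross_attention_weights(decoder_states, reference_states)
alignment = align_by_argmax(weights)
print(alignment)  # predicted-token index -> reference-token index
```

In this toy case the third predicted token has no exact counterpart in the reference, yet it still attends mostly to the second reference token, which is the kind of soft matching that makes an incomplete reference usable.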