Abstract
We propose a method for time-aligning lyrics with Korean folk song audio using a transformer encoder-decoder model that internally leverages incomplete lyric information. We analyze the characteristics of Korean folk song lyrics and identify discrepancies between the lyrics and their corresponding audio recordings. To handle these issues while fully exploiting existing transcriptions, we propose RefWhisper, a variant of OpenAI's Whisper with an additional encoder module and cross-attention layer that allow the model to refer to an incomplete lyric text during transcription. The additional cross-attention layer also enables alignment among the reference text, the predicted transcription, and the audio. Furthermore, we publicly release the transcription results and timestamp data, aligned at the sentence and word levels, for 14,627 Korean folk songs.
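To make the architectural idea concrete, the following is a minimal PyTorch sketch of a decoder layer that adds a second cross-attention block over a reference-lyric encoder alongside the standard audio cross-attention. The module names, dimensions, and layer ordering are illustrative assumptions for exposition, not the released RefWhisper implementation.

```python
import torch
import torch.nn as nn

class DualCrossAttentionDecoderLayer(nn.Module):
    """Toy decoder layer with two cross-attention blocks: one over audio
    encoder states (as in Whisper) and one over a reference-lyric encoder.
    Illustrative sketch only; hyperparameters follow Whisper-tiny sizes."""

    def __init__(self, d_model=384, n_heads=6):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.audio_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ref_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, x, audio_states, ref_states):
        # Causal self-attention over the partial transcription.
        mask = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf"),
                                     device=x.device), diagonal=1)
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h, attn_mask=mask)[0]
        # Cross-attention over the audio encoder output (standard Whisper path).
        h = self.norms[1](x)
        x = x + self.audio_attn(h, audio_states, audio_states)[0]
        # Extra cross-attention over the (possibly incomplete) reference lyrics;
        # its attention weights can be read out to relate reference tokens,
        # predicted tokens, and (via the audio path) time.
        h = self.norms[2](x)
        attn_out, ref_weights = self.ref_attn(h, ref_states, ref_states,
                                              need_weights=True)
        x = x + attn_out
        x = x + self.mlp(self.norms[3](x))
        return x, ref_weights

# Example with toy shapes: 1 sample, 100 audio frames, 20 reference tokens, 8 output tokens.
layer = DualCrossAttentionDecoderLayer()
y, w = layer(torch.randn(1, 8, 384), torch.randn(1, 100, 384), torch.randn(1, 20, 384))
print(w.shape)  # (1, 8, 20): output-token-to-reference-token attention map
```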