Abstract
This work investigates a deep learning model that generates Bach-style four-part chorale harmonizations for a given melody. The goal is to produce harmonically improved harmony voices with a model architecture that captures both horizontal melodic progressions and vertical harmonic structures. The model is based on an LSTM encoder-decoder, in which each voice part is generated autoregressively frame by frame. An encoder structure is proposed to model the harmonic structure formed by the voice parts, and the voices are generated sequentially, each conditioned on the previously generated parts, so that the model learns the relationships between voices. Furthermore, the proposed model is extended to predict chords before generating the voices, enabling harmonically coherent chorale generation even without explicit chord conditioning. Compared to the baseline model, the proposed model achieves a lower token prediction error rate and fewer parallel fifth/octave occurrences. Moreover, the chord prediction model produces more harmonically coherent results than the existing model architecture.