Auto-regressive Generative Adversarial Network for Singing Synthesis and Evaluation

Research Overview

One of the main goals of this project is to conduct a fundamental research on mathematical modeling regarding encoding and generation of sequential data or time series data in auto-regressive manner. The model would be based on generative adversarial networks(GAN) which reflects a complementary structure composed of a generator, which generates data, and a discriminator, which feeds back to the generator with evaluated cost. This model can be broadly applied to generation, recognition, and quality correction of medium such as text, audio, or video.

Figure 1. Concept of the singing synthesis model

Other major goal is to build an applied system that automatically synthesize singing human voices. The system will contain a singer model represented as a generator and a professional listener model represented as a discriminator. The project aims to generate natural singing voices using state-of-the-art sample-unit voice synthesizing method based on recent deep neural network techniques. The main functions of the system would not only be synthesizing singing voice with various timbres, but also quantitatively assessing singing proficiency and transcribing music.

Figure 2. Auto-regressive generative adversarial network system overview