Abstract
Automatic composition models based on artificial intelligence (AI) techniques have made it possible to generate musical melodies in the style of a particular artist. Given copyright concerns, the relationship between interpolated melodies and the original reference melodies is important, yet the perceptual similarity between them has rarely been examined. In this study, we investigated the perceptual similarity between interpolated and reference melodies in terms of contour and tonality, two important components of melody perception, using a behavioral experiment. The interpolated melodies were composed, by both a human composer and an AI model, at three intermediate levels between two reference melodies, with the intention of producing gradual changes in contour or tonality at each level. For the AI composition, we used an explicitly-constrained conditional variational autoencoder (EC2-VAE), interpolating in the latent feature space of a model pre-trained on British and American folk songs. In the experiment, the interpolated melodies were presented in random order alongside the two reference melodies, and 33 participants rated the similarity between each interpolated melody and the reference melodies on a scale of 0–100. The similarity judgments showed that both the AI and human interpolations generally reflected intermediate levels between the two reference melodies. In particular, the AI interpolation showed more gradual changes in tonality between the two references, whereas the human interpolation better reflected gradual steps in contour. This result suggests that AI and human composers have different interpolation styles. However, given the limited scope of our experiment, which involved a single composer and a single AI model, further investigation is needed with a wider range of human composers, different AI composition models, and a more diverse set of reference melodies.