Voice Conversion Tutorial

• Tutorial presented at XVoice workshop, May 19th, 2023

Table of Contents

Introduction

In this tutorial, we explore voice conversion: what it is, where it is applied, and the concepts and techniques that make it possible. We first cover the background and basics, then discuss any-to-any voice conversion. Finally, we examine voice conversion with electromyography (EMG) signals and EMG-to-speech using disentangled representations.

What is Voice Conversion?

Voice Conversion in Various Applications

Try converting your own voice!

Background and Basics

Key Features of Speech Signals

Text-to-speech (TTS) System

Content Encoder

Speaker Encoder

Any-to-any Voice Conversion with Unparallel Data

This tutorial focuses on any-to-any voice conversion, since it is the most demanding case for applications. We mainly assume that the training data is unparallel (non-parallel): different speakers in the training data do not necessarily speak the same content. We make this assumption because paired data, in which several speakers utter identical sentences, is scarce.
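To make the distinction concrete, here is a minimal sketch that checks whether a toy corpus is parallel or unparallel. The corpus layout (a mapping from speaker ID to the set of sentences that speaker recorded), the speaker names, and the helper functions `is_parallel` and `shared_sentences` are all hypothetical and chosen only for illustration; they are not taken from the tutorial's materials.

```python
from typing import Dict, Set


def is_parallel(corpus: Dict[str, Set[str]]) -> bool:
    """Return True if every speaker recorded exactly the same set of sentences."""
    contents = list(corpus.values())
    return all(c == contents[0] for c in contents[1:])


def shared_sentences(corpus: Dict[str, Set[str]]) -> Set[str]:
    """Return the sentences spoken by all speakers (the usable paired data)."""
    if not corpus:
        return set()
    return set.intersection(*corpus.values())


# Hypothetical toy corpora: speaker ID -> sentences that speaker recorded.
parallel_corpus = {
    "spk_a": {"hello world", "good morning"},
    "spk_b": {"hello world", "good morning"},
}
unparallel_corpus = {
    "spk_a": {"hello world", "good morning"},
    "spk_b": {"see you later"},
}

print(is_parallel(parallel_corpus))         # every speaker shares the same content
print(is_parallel(unparallel_corpus))       # speakers differ in spoken content
print(shared_sentences(unparallel_corpus))  # no paired sentences to align on
```

In the unparallel case there are no shared sentences to align across speakers, so a model cannot learn from paired utterances and has to separate content from speaker identity by other means.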

Disentanglement Approach for Voice Conversion

Preventing Unwanted Information Leakage

Evaluation on Voice Conversion Systems

Voice Conversion with Electromyography (EMG) Signals

EMG-to-speech with Disentangled Representation

Reproducing

Synthesized on EMG (Gaddy et al.)

Synthesized on EMG + Any-to-one VC model

Synthesized on EMG + Any-to-any VC model

• Sample 1

Source text: “he read and reread the paper fearing the worst had happened to me”