Tutorial presented at the XVoice workshop, May 19, 2023
Table of Contents
Introduction
This tutorial covers the basics of voice conversion, its applications, and the concepts and techniques that make it possible. We start with the background and basics, then discuss any-to-any voice conversion. Finally, we examine voice conversion with electromyography (EMG) signals and EMG-to-speech synthesis using disentangled representations.
What is Voice Conversion?
Voice Conversion in Various Applications
Try converting your own voice!
Background and Basics
Key Features of Speech Signals
Text-to-speech (TTS) System
Content Encoder
Speaker Encoder
Any-to-any Voice Conversion with Unparallel Data
This tutorial focuses on any-to-any voice conversion, since it is the most demanding case for applications. Because paired data is scarce, we mainly assume that the training data is unparallel (also called non-parallel): the speakers in the training set do not necessarily speak the same content.
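To make the unparallel-data assumption concrete, here is a minimal sketch of how one might check whether a corpus is parallel. The function name `is_parallel`, the speaker IDs, and the toy transcripts are all hypothetical and are not taken from the tutorial's datasets or code.

```python
from collections import defaultdict


def is_parallel(utterances):
    """Return True if every speaker has spoken exactly the same set of contents.

    `utterances` is an iterable of (speaker_id, transcript) pairs.
    This is an illustrative helper, not part of any voice conversion toolkit.
    """
    contents_by_speaker = defaultdict(set)
    for speaker_id, transcript in utterances:
        contents_by_speaker[speaker_id].add(transcript)

    content_sets = list(contents_by_speaker.values())
    # Parallel data: all speakers share one identical set of spoken contents.
    return all(s == content_sets[0] for s in content_sets)


# Toy corpora (hypothetical speakers and sentences).
parallel_corpus = [
    ("spk_a", "hello"), ("spk_b", "hello"),
    ("spk_a", "good morning"), ("spk_b", "good morning"),
]
unparallel_corpus = [
    ("spk_a", "hello"),
    ("spk_b", "good evening"),
]

print(is_parallel(parallel_corpus))
print(is_parallel(unparallel_corpus))
```

A model trained on the second kind of corpus cannot rely on aligned source and target utterances of the same sentence, which is the constraint the rest of this tutorial works under.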
Disentanglement Approach for Voice Conversion
Preventing Unwanted Information Leakage
Evaluation on Voice Conversion Systems
Voice Conversion with Electromyography (EMG) Signals
EMG-to-speech with Disentangled Representation
Reproducing
Synthesized on EMG (Gaddy et al.)
Synthesized on EMG + Any-to-one VC model
Synthesized on EMG + Any-to-any VC model
Sample 1
Source text: “he read and reread the paper fearing the worst had happened to me”