Search

Modeling Auditory Perception and Speech Development

태그
Speech Processing
2 more properties
1) Visually Grounded Speech Representation Learning
Objective: Self-supervised speech representation learning without text labels through audio-image pair data
Idea from language learning of human
Dataset: SpokenCOCO, PlacesAudio
Baseline model: ResDAVEnet-VQ
Evaluation: Audio-Image retrieval(Recall @ k), zero-speech challenge metrics
The numbers in below example mean similarity score
Gallery view
Search
Retrieval example
Topics
Architecture for hierarchical discrete unit discovery
Loss between audio feature sequences and image feature map
Language model using both word-level units and phonetic-level units
2) Speech Generation without Reconstruction Loss
Objective: Modeling speech development of human using physical vocal tract model
Dataset: LibriSpeech, LJ Speech, Google Speech Commands
Physical vocal tract model: VocalTractLab, gnuspeech
Evaluation: WER/PER using pretrained ASR model
Topics
Articulatory parameters estimation from speech
Voice conversion by controlling stationary articulatory parameters
Concept-to-speech