Audio-Language Multi-modal Learning

Tags

Multi-modal Learning


Topic 1. Text Query based Audio Retrieval

• Retrieving audio signals using textual descriptions of their sound content (i.e., audio captions).

• Text queries are composed of manually written audio captions.

• For each text query, the goal of this task is to retrieve audio files from a given dataset and sort them based on their match with the query.
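The retrieve-and-sort step above can be sketched as a ranking by similarity score. This is only a minimal illustration, assuming the text query and each audio file have already been mapped into a shared embedding space by some encoder (not shown). The function names, file names, and toy vectors are hypothetical, and cosine similarity is one assumed choice of matching score, not something these notes specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_audio_files(query_embedding, audio_embeddings):
    """Return (file name, score) pairs sorted from best to worst match."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in audio_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; a real system would obtain these from
# trained audio and text encoders.
audio_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.0, 0.2, 0.9],
    "car_engine.wav": [0.1, 0.9, 0.1],
}
query_embedding = [0.8, 0.2, 0.1]  # assumed embedding of a caption such as "a dog barks"

ranking = rank_audio_files(query_embedding, audio_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the file whose embedding points in nearly the same direction as the query is ranked first. In practice the quality of the ranking depends entirely on how well the two encoders align audio and text.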

Topic 2. Automated Audio Captioning

• The task of describing general audio content using free text.

• An inter-modal translation task (not speech-to-text), in which a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal.

• Captioning methods can model concepts (e.g., "muffled sound"), physical properties of objects and the environment (e.g., "the sound of a big car", "people talking in a small and empty room"), and high-level knowledge (e.g., "a clock rings three times").

Audio-Language Multi-modal Learning

Topic 1. Text Query based Audio Retrieval

Topic 2. Automated Audio Captioning

Research 1. Audio-Text Data Augmentation

Research 2. Audio Captioning

Research 3. Audio-Text Retrieval