Topic 1. Text-Query-Based Audio Retrieval
• Retrieving audio signals using textual descriptions of their sound content (i.e., audio captions).
• Text queries are manually written audio captions.
• For each text query, the goal is to retrieve audio files from a given dataset and rank them by how well they match the query.
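The ranking step can be illustrated with a minimal sketch. It assumes (this is not stated in the source) that the text query and every audio file have already been mapped into a shared embedding space, and that cosine similarity is used as the match score. The function names `cosine` and `rank_audio`, the file names, and the three-dimensional vectors are all hypothetical toy values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_audio(query_embedding, audio_embeddings):
    """Return audio file names sorted from best to worst match with the query."""
    scores = {
        name: cosine(query_embedding, embedding)
        for name, embedding in audio_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical, hand-made embeddings; a real system would obtain them
# from trained text and audio encoders.
audio_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.1, 0.9, 0.2],
    "car_engine.wav": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # stands in for a caption such as "a dog barks"

ranking = rank_audio(query_embedding, audio_embeddings)
print(ranking)
```

In this toy setup the file whose embedding points in nearly the same direction as the query is ranked first; other scoring functions (e.g. a dot product or a learned matching model) could be substituted without changing the overall retrieve-and-sort structure.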
Topic 2. Automated Audio Captioning
• The task of describing general audio content using free text.
• An inter-modal translation task (not speech-to-text), where a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal.
• Involves modeling concepts (e.g., "muffled sound"), physical properties of objects and environments (e.g., "the sound of a big car", "people talking in a small and empty room"), and high-level knowledge (e.g., "a clock rings three times").