Controllable Waveform-Domain Diffusion Model for Event-Guided Foley Sound Synthesis

Affiliation

MAC

Presenter

정윤진

Personal Link

https://yoonjinxd.github.io/

Subject

General audio generation

Site

A15

Time

Poster Session I - 11:10~12:30

Abstract

This study addresses the challenge of generating realistic, event-aligned Foley sound effects, which play a crucial role in making media more immersive. We propose a generative audio synthesis system that conditions waveform generation on a sound class category and on event timing. To preserve temporal information and improve synchronization with specific events, we introduce Block-FiLM, a block-wise feature-wise linear modulation method. Experiments and ablation studies show that this approach significantly improves both the quality and the temporal alignment of generated sounds, and evaluations with objective metrics and subjective listening tests confirm its effectiveness. Overall, this work advances Foley sound synthesis and indicates the potential of generative models for automating and streamlining sound production in various domains.
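
To make the idea of block-wise modulation concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify how Block-FiLM computes its parameters, so the function name `block_film`, the mean-pooling of the timing condition within each block, and the linear maps `w_gamma` and `w_beta` are all hypothetical choices. The sketch only shows the general pattern of applying a separate FiLM scale and shift to each time block, so that the conditioning keeps its temporal layout instead of being collapsed into one global vector.

```python
import numpy as np


def block_film(features, condition, w_gamma, w_beta, num_blocks):
    """Hypothetical block-wise FiLM (assumed design, not the paper's code).

    features : (C, T) feature map to modulate
    condition: (D, T) time-aligned condition, e.g. an event-timing envelope
    w_gamma, w_beta: (C, D) linear maps producing per-block scale and shift
    num_blocks: number of equal-length blocks along the time axis
    """
    _, length = features.shape
    if condition.shape[1] != length:
        raise ValueError("features and condition must share the time length")
    if length % num_blocks != 0:
        raise ValueError("time length must be divisible by num_blocks")
    block_len = length // num_blocks

    # Summarize the condition inside each block: (D, T) -> (D, num_blocks).
    pooled = condition.reshape(condition.shape[0], num_blocks, block_len).mean(axis=2)

    # One scale and one shift per channel and per block: (C, num_blocks).
    gamma = w_gamma @ pooled
    beta = w_beta @ pooled

    # Broadcast each block's parameters over the time steps of that block.
    gamma_t = np.repeat(gamma, block_len, axis=1)
    beta_t = np.repeat(beta, block_len, axis=1)
    return gamma_t * features + beta_t


# Usage: an event that is silent in the first half and active in the second.
features = np.ones((2, 8))
condition = np.array([[0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]])
w_gamma = np.array([[1.0], [2.0]])
w_beta = np.array([[0.0], [1.0]])
modulated = block_film(features, condition, w_gamma, w_beta, num_blocks=2)
```

With `num_blocks=1` this reduces to ordinary FiLM with a single global scale and shift, which is the case where timing information is lost; larger values of `num_blocks` keep finer temporal resolution at the cost of more modulation parameters per layer.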