
Transformation of Head-Related Transfer Functions (HRTF)

Tags
Signal Processing

Robust interpolation by leveraging hyper-conditioning techniques

Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features (under review)

Authors
Jin Woo Lee, Sungho Lee, and Kyogu Lee
Abstract
Background
What is HRTF?
Methods to obtain HRTF, and related issues
Proposed method
Motivation
We propose a method for combining HRTF datasets that were measured on different coordinate systems. The method predicts the HRTF at a desired position from neighboring known HRTFs, their corresponding positions, and the subject's anthropometric measurements.
Because it uses position-aware interpolation within a local region of interest, our model is more precise than linear interpolation and more robust to differences between coordinate systems. We also show that our conditioning method improves generalization when only limited anthropometric and positional data are available.
Method
Our network is built upon Feature-wise Linear Modulation (FiLM), an architecture designed for general-purpose conditioning. First, we used a pointwise convolution block that interpolates the input HRTFs as a linear combination with learned coefficients.
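As a rough illustration of the two building blocks named above, the following NumPy sketch shows a pointwise convolution written as a weighted sum over neighbors, and a FiLM operation that scales and shifts feature channels. The function names, array shapes, and fixed coefficients are assumptions made for this example; in the actual network the coefficients and the FiLM parameters are learned.

```python
import numpy as np


def pointwise_interpolate(hrtfs, coeffs):
    """Weighted sum of K neighboring HRTFs, applied independently per frequency bin.

    hrtfs:  (K, F) array, K neighbors with F frequency bins each.
    coeffs: (K,) combination weights; learned in a real model, fixed here.
    A bias-free pointwise (kernel size 1) convolution with one output channel
    over the neighbor axis computes this same sum.
    """
    return np.tensordot(coeffs, hrtfs, axes=(0, 0))  # shape (F,)


def film(features, gamma, beta):
    """FiLM: scale and shift each feature channel.

    features: (C, F); gamma, beta: (C,), normally predicted from a condition.
    """
    return gamma[:, None] * features + beta[:, None]


# Toy data (hypothetical values): two neighbors, three frequency bins.
hrtfs = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0]])
coeffs = np.array([0.5, 0.5])  # equal weights reduce to plain averaging
estimate = pointwise_interpolate(hrtfs, coeffs)

# Modulate two feature channels: double the first, shift the second by -1.
modulated = film(hrtfs, gamma=np.array([2.0, 1.0]), beta=np.array([0.0, -1.0]))
```

With equal coefficients the pointwise step reduces to a plain average of the neighbors; learned coefficients let the network weight neighbors unequally.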
However, an HRTF dataset contains only one set of anthropometric measurements per subject, so the amount of anthropometric data is much smaller than the number of coordinates or of HRTFs.
Since the number of subjects is often insufficient for data-driven methods, neural networks are likely to overfit to the anthropometric data. Therefore, instead of naive concatenation, we employed an additional conditioning method.
The architecture generates the conditions for the FiLM layer using a hyper-convolution layer (HyperConv). A HyperConv convolves its inputs using weight and bias tensors calculated from its conditions.
Conditioned on the anthropometric measurements and the target position, the HyperConv transforms the relative positions of the input HRTFs to generate the conditions for the FiLM layer.
We hypothesize that this conditioning framework helps the network generalize when predicting HRTFs.
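The conditioning path described above can be sketched in NumPy as follows. This is a toy illustration and not the authors' implementation: the class name `HyperConv1x1`, the single linear maps used to generate the weight and bias, the kernel size of 1, all dimensions, and the final averaging step are assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)


class HyperConv1x1:
    """Pointwise convolution whose weight and bias are computed from a condition."""

    def __init__(self, cond_dim, in_ch, out_ch):
        self.in_ch, self.out_ch = in_ch, out_ch
        # Linear "hypernetwork" parameters (assumption: one linear map each).
        self.w_gen = rng.normal(scale=0.1, size=(out_ch * in_ch, cond_dim))
        self.b_gen = rng.normal(scale=0.1, size=(out_ch, cond_dim))

    def __call__(self, x, cond):
        # x: (in_ch, K) input sequence; cond: (cond_dim,) condition vector.
        weight = (self.w_gen @ cond).reshape(self.out_ch, self.in_ch)
        bias = self.b_gen @ cond
        return weight @ x + bias[:, None]  # (out_ch, K)


K, F = 4, 8                               # neighbors, frequency bins
hrtfs = rng.normal(size=(K, F))           # known neighboring HRTFs (random stand-ins)
rel_pos = rng.normal(size=(3, K))         # neighbor positions relative to the target
anthro = rng.normal(size=5)               # anthropometric measurements
target = rng.normal(size=3)               # target position
cond = np.concatenate([anthro, target])   # condition for the HyperConv

hyper = HyperConv1x1(cond_dim=cond.size, in_ch=3, out_ch=2)
gamma, beta = hyper(rel_pos, cond)        # one (gamma, beta) pair per neighbor

# FiLM step: scale and shift each neighbor's HRTF, then combine the neighbors.
# The mean stands in for the learned pointwise combination.
modulated = gamma[:, None] * hrtfs + beta[:, None]  # (K, F)
prediction = modulated.mean(axis=0)                 # (F,)
```

The point of the sketch is the data flow: the subject and target information never touches the HRTFs directly, but instead shapes the convolution that turns relative positions into FiLM parameters.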
Ablation Study
We conducted an ablation study on our best model to understand how the network learns interpolation when conditioned on the physical features. The proposed conditioning method yielded the most accurate HRTF interpolation. We empirically observed that hyper-convolution is beneficial for conditioning on the physical features when predicting HRTFs.
Results
Conclusion
In this work, we introduced a neural network that interpolates given HRTFs by conditioning on source positions and anthropometric measurements.
We explored the use of two conditioning methods, FiLM and hyper-convolution, in a network that estimates the HRTF at a desired position from HRTFs sampled in its neighborhood.
Quantitative analysis showed that our method precisely interpolates the HRTFs, outperforming the baseline.
The ablation experiment showed that the proposed conditioning method performed better than the other methods we tested.
The model with the proposed conditioning scheme also generalized well to an unseen dataset with an unseen grid system. Furthermore, we demonstrated that our network is able to reconstruct downsampled HRTFs.
We expect future developments in personalized HRTF generation based on the proposed system.

Spatial Interpolation and Abnormal Detection of HRIRs using Neural Networks

Background
In spatial audio, it is known that rendering binaural sound with personalized head-related transfer functions (HRTFs) can improve spatial accuracy for the listener compared to using a standardized HRTF dataset. Despite these advantages, recording personalized HRTFs is very costly in terms of both time and money. Research on reducing this cost falls into two groups: work focused on reducing measurement time, and work focused on reducing the need for a controlled environment.
Previous studies on head-related impulse response (HRIR) interpolation mostly used deterministic interpolation algorithms, while more recent approaches use machine learning methods such as kNN or neural networks.
To the best of our knowledge, no study has yet attempted to detect abnormal HRIR recordings directly from the HRIR data using neural networks.
Proposed method
In this study, we propose an interpolation neural network that both interpolates HRIRs and detects abnormal HRIR recordings. Our input feature is the HRIR, which is the inverse Fourier transform of the HRTF. A typical auto-encoding neural network is trained to reproduce its input, whereas an interpolation neural network is trained to reproduce the data that lies between its inputs. The architecture is inspired by the network proposed by Suefusa et al. for detecting abnormal sounds [1].
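The two ideas in this paragraph can be made concrete with a small NumPy sketch. It first checks the HRIR/HRTF relationship with a round trip through the FFT, then scores a recording by how far it deviates from an interpolation of its neighbors. The helper names are hypothetical, `predict_center` is only a placeholder (a plain average) for a trained interpolation network, and using the mean squared interpolation error as the anomaly score is an assumption for illustration, not a detail stated here.

```python
import numpy as np


def hrtf_to_hrir(hrtf_onesided):
    """One-sided complex HRTF -> real HRIR via the inverse real FFT."""
    return np.fft.irfft(hrtf_onesided)


def predict_center(neighbors):
    """Placeholder for a trained interpolation network: average the neighbors."""
    return neighbors.mean(axis=0)


def anomaly_score(measured, neighbors):
    """Mean squared error between the measured HRIR and its interpolation."""
    return float(np.mean((measured - predict_center(neighbors)) ** 2))


# Toy HRIR (hypothetical): a unit impulse delayed by 3 samples, length 16.
hrir = np.zeros(16)
hrir[3] = 1.0
roundtrip = hrtf_to_hrir(np.fft.rfft(hrir))  # HRIR -> HRTF -> HRIR

neighbors = np.stack([hrir, hrir])           # two adjacent, clean measurements
clean = hrir.copy()
corrupted = hrir.copy()
corrupted[10] += 0.5                         # simulated recording artifact

clean_score = anomaly_score(clean, neighbors)
corrupted_score = anomaly_score(corrupted, neighbors)
```

In this toy setup the corrupted recording receives a higher score than the clean one, so a threshold on the score would flag it without any labeled abnormal examples.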
Contribution
1. We propose and evaluate an interpolation neural network that can increase the spatial resolution of an HRIR set.
2. We investigate the possibility of using the proposed interpolation neural network as a model to detect abnormal recordings of HRIRs in an unsupervised manner.
3. We propose a framework to evaluate abnormal HRIR detection models.
HRIR interpolation samples
Reference
[1] Suefusa, Kaori, et al. "Anomalous sound detection based on interpolation deep neural network." ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.