Category: Conferences

“The LOCATA Challenge Data Corpus for Acoustic Source Localization and Tracking”

Access:

Proceedings IEEE Sensor Array and Multichannel (SAM) Signal Processing Workshop 2018

Authors:

Heinrich W. Löllmann, Christine Evers, Alexander Schmidt, Heinrich Mellmann, Hendrik Barfuss, Patrick A. Naylor, and Walter Kellermann

Abstract:

Algorithms for acoustic source localization and tracking are essential for a wide range of applications such as personal assistants, smart homes, tele-conferencing systems, hearing aids, or autonomous systems. Numerous algorithms have been proposed for this purpose which, however, have so far not been evaluated and compared against each other using a common database. The IEEE-AASP Challenge on sound source localization and tracking (LOCATA) provides a novel, comprehensive data corpus for the objective benchmarking of state-of-the-art algorithms on sound source localization and tracking. The data corpus comprises six tasks ranging from the localization of a single static sound source with a static microphone array to the tracking of multiple moving speakers with a moving microphone array. It contains real-world multichannel audio recordings, obtained by hearing aids, microphones integrated in a robot head, a planar and a spherical microphone array in an enclosed acoustic environment, as well as positional information about the involved arrays and sound sources, represented by moving human talkers or static loudspeakers.

“Sparse parametric modeling of the early part of acoustic impulse responses”

IEEE Xplore Access:

Proc. European Signal Processing Conference (EUSIPCO)

Authors: 

Constantinos Papayiannis, Christine Evers and Patrick A. Naylor

Abstract: 

Acoustic channels are typically described by their Acoustic Impulse Response (AIR) as a Moving Average (MA) process. Such AIRs are often considered in terms of their early and late parts, describing discrete reflections and the diffuse reverberation tail, respectively. We propose an approach for constructing a sparse parametric model for the early part. The model aims at reducing the number of parameters needed to represent it, while allowing the MA coefficients that describe it to be reconstructed from the representation. It represents the reflections arriving at the receiver as delayed copies of an excitation signal. The Times-Of-Arrival of reflections are not restricted to integer sample instants, and a dynamically estimated model for the excitation sound is used. We also present a corresponding parameter estimation method, based on regularized regression and nonlinear optimization. The proposed method also serves as an analysis tool, since the estimated parameters can be used to estimate room geometry, the mixing time and other channel properties. Experiments involving simulated and measured AIRs are presented, in which the AIR coefficient reconstruction-error energy does not exceed 11.4% of the energy of the original AIR coefficients. The results also indicate dimensionality reduction figures exceeding 90% when compared to an MA process representation.
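As a toy illustration of the modelling idea (not the authors' estimator), the sketch below reconstructs an early AIR as a sum of scaled, delayed copies of an excitation signal, with fractional Times-Of-Arrival realised by band-limited sinc interpolation. The function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def reconstruct_early_air(toas, amplitudes, excitation, length):
    """Reconstruct the early AIR from a sparse parametric description:
    each reflection is a scaled copy of the excitation signal placed at a
    (possibly non-integer) time of arrival, in samples.

    Fractional delays are realised by convolving the excitation with a
    sinc pulse centred at the TOA (band-limited interpolation)."""
    air = np.zeros(length)
    n = np.arange(length)
    for toa, amp in zip(toas, amplitudes):
        # sinc(n - toa) is a unit impulse delayed by `toa` samples;
        # for integer toa it reduces to an exact Kronecker delta
        pulse = np.sinc(n - toa)
        air += amp * np.convolve(pulse, excitation)[:length]
    return air
```

For integer TOAs the reconstruction is exact; for fractional TOAs the sinc pulse spreads each reflection over neighbouring samples, which is what lets the model place reflections between sample instants.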

“Discriminative feature domains for reverberant acoustic environments”

IEEE Xplore Access:

Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

Authors: 

C. Papayiannis, C. Evers, and P. A. Naylor

Abstract:

Several speech processing and audio data-mining applications rely on a description of the acoustic environment as a feature vector for classification. The discriminative properties of the feature domain play a crucial role in the effectiveness of these methods. In this work, we consider three environment identification tasks and the task of acoustic model selection for speech recognition. A set of acoustic parameters and Machine Learning algorithms for feature selection are used and an analysis is performed on the resulting feature domains for each task. In our experiments, a classification accuracy of 100% is achieved for the majority of tasks and the Word Error Rate is reduced by 20.73 percentage points for Automatic Speech Recognition when using the resulting domains. Experimental results indicate a significant dissimilarity in the parameter choices for the composition of the domains, which highlights the importance of the feature selection process for individual applications.

“Source tracking using moving microphone arrays for robot audition”

IEEE Xplore Access:

Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

Authors:

C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor

Abstract:

Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization methods using expectation-maximization (EM) approaches as well as Bayesian approaches. In this paper, we propose to combine the EM and Bayesian approaches in one framework for improved robustness against reverberation and noise.
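The paper's combined EM/Bayesian framework is not reproduced here; as a minimal sketch of the sequential Bayesian backbone such trackers build on, here is a constant-velocity Kalman filter over noisy azimuth detections. All noise parameters are illustrative defaults, not values from the paper.

```python
import numpy as np

def kalman_doa_track(measurements, q=0.01, r=4.0):
    """Track a source azimuth (degrees) from noisy per-frame detections
    with a constant-velocity Kalman filter. State: [azimuth, azimuth rate]."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity motion model
    H = np.array([[1.0, 0.0]])               # only azimuth is observed
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise variance
    x = np.array([measurements[0], 0.0])     # initialise from first detection
    P = 10.0 * np.eye(2)
    est = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the current azimuth detection
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        est.append(x[0])
    return np.array(est)
```

The filter smooths the raw detections and coasts through outliers to some extent; robustness to reverberation-induced spurious detections is exactly what the EM weighting in the paper is meant to add on top of this backbone.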

“Audio-visual tracking by density approximation in a sequential Bayesian filtering framework”

IEEE Xplore Access:

Proc. Joint Workshop on Hands-Free Speech Communications and Microphone Arrays (HSCMA)

Authors:

Israel D. Gebru, Christine Evers, Patrick A. Naylor, Radu Horaud

Abstract:

This paper proposes a novel audio-visual tracking approach that constructively exploits audio and visual modalities in order to estimate trajectories of multiple people in a joint state space. The tracking problem is modeled using a sequential Bayesian filtering framework. Within this framework, we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that a GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform. A density interpolation technique is introduced to obtain a continuous representation of the observation likelihood, which is also a GMM. Furthermore, to prevent the number of mixtures from growing exponentially over time, a density approximation based on the Expectation Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. Recordings using a camcorder and microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance of the proposed audio-visual approach compared to two benchmark visual trackers.
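One building block named in the abstract is the Unscented Transform, used to propagate the Gaussian components of the predictive density through a nonlinearity while retaining a Gaussian (moment-matched) form. A generic, textbook-style implementation for a single component (not the paper's specific configuration) looks like this:

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1.0, kappa=0.0, beta=2.0):
    """Propagate a Gaussian N(mean, cov) through a nonlinear map f and
    return the moment-matched mean and covariance of the output,
    using the standard 2n+1 sigma-point construction."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)       # matrix square root
    sigma = np.vstack([mean, mean + L.T, mean - L.T])  # 2n+1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))   # mean weights
    wc = wm.copy()                                      # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigma])           # transformed sigma points
    y_mean = wm @ Y
    diff = Y - y_mean
    y_cov = (wc[:, None] * diff).T @ diff
    return y_mean, y_cov
```

A useful sanity check (exploited in the test) is that for a linear map f(x) = Ax + b the transform is exact: it returns A·mean + b and A·cov·Aᵀ.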

“Speaker tracking in reverberant environments using multiple directions of arrival”

IEEE Xplore Access:

Proc. Hands-free Speech Communications and Microphone Arrays (HSCMA)

Authors:

Christine Evers, Boaz Rafaely, and Patrick A. Naylor

Abstract:

Accurate estimation of the Direction of Arrival (DOA) of a sound source is an important prerequisite for a wide range of acoustic signal processing applications. However, in enclosed environments, early reflections and late reverberation often lead to localization errors. Recent work demonstrated that improved robustness against reverberation can be achieved by clustering only the DOAs from direct-path bins in the short-term Fourier transform of a speech signal of several seconds duration from a static talker. Nevertheless, for moving talkers, short blocks of at most several hundred milliseconds are required to capture the spatio-temporal variation of the source direction. Processing of short blocks of data in reverberant environments can lead to clusters whose centroids correspond to spurious DOAs away from the source direction. We therefore propose in this paper a novel multi-detection source tracking approach that estimates the smoothed trajectory of the source DOAs. Results for realistic room simulations validate the proposed approach and demonstrate significant improvements in estimation accuracy compared to single-detection tracking.
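To make the multi-detection setting concrete, here is a deliberately simple toy tracker (not the paper's method): each frame yields several candidate DOAs, some spurious; the tracker gates to the detection nearest its current estimate and smooths toward it, coasting when no detection falls inside the gate. Gate width and smoothing factor are illustrative.

```python
import numpy as np

def track_multi_detection(doa_batches, gate_deg=15.0, alpha=0.3):
    """Toy multi-detection DOA tracker. `doa_batches` is a list of
    per-frame detection lists (degrees). At each frame, the detection
    closest to the current estimate is used if it lies within the gate;
    otherwise the estimate coasts unchanged, ignoring spurious-only frames."""
    est = doa_batches[0][0]          # initialise from the first detection
    track = [est]
    for dets in doa_batches[1:]:
        d = np.asarray(dets, dtype=float)
        nearest = d[np.argmin(np.abs(d - est))]
        if abs(nearest - est) <= gate_deg:
            est = (1.0 - alpha) * est + alpha * nearest   # smooth toward it
        track.append(est)
    return np.array(track)
```

With a true source near 40° and persistent spurious detections near 120°, the gate keeps the track locked to the true direction; the paper's contribution is a principled probabilistic treatment of exactly this ambiguity.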

“Localization of moving microphone arrays from moving sound sources for robot audition”

IEEE Xplore Access:

Proc. European Signal Processing Conf. (EUSIPCO), Budapest, Hungary, Aug. 2016

Authors:

C. Evers, A. H. Moore, and P. A. Naylor

Abstract:

Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.
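The full a-SLAM system uses Random Finite Set filtering, which is beyond a short sketch; a minimal geometric ingredient of the problem, however, is localizing a (here assumed static) source from DOAs observed along a known moving-array trajectory. The least-squares bearings-only triangulation below illustrates why array motion provides the spatial diversity the paper exploits; the 2-D setup and names are assumptions for illustration.

```python
import numpy as np

def triangulate_bearings(positions, bearings):
    """Least-squares intersection of bearing rays: each array pose p_i and
    DOA theta_i (radians) defines a ray p_i + t * u_i. The source estimate
    minimises the summed squared perpendicular distance to all rays, via
    the normal equations sum_i(I - u_i u_i^T) x = sum_i(I - u_i u_i^T) p_i."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, th in zip(positions, bearings):
        u = np.array([np.cos(th), np.sin(th)])   # unit vector toward source
        M = np.eye(2) - np.outer(u, u)           # projector off the ray
        A += M
        b += M @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)
```

A single pose leaves the range unobservable (A is singular); with two or more poses giving distinct bearings, the source position is recovered, which is the geometric payoff of a moving array.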