
Christine Evers Posts

“Discriminative feature domains for reverberant acoustic environments”

IEEE Xplore Access:

in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

Authors: 

C. Papayiannis, C. Evers, and P. A. Naylor

Abstract:

Several speech processing and audio data-mining applications rely on a description of the acoustic environment as a feature vector for classification. The discriminative properties of the feature domain play a crucial role in the effectiveness of these methods. In this work, we consider three environment identification tasks and the task of acoustic model selection for speech recognition. A set of acoustic parameters and machine learning algorithms for feature selection are used, and the resulting feature domains are analysed for each task. In our experiments, a classification accuracy of 100% is achieved for the majority of tasks and the Word Error Rate is reduced by 20.73 percentage points for Automatic Speech Recognition when using the resulting domains. Experimental results indicate a significant dissimilarity in the parameter choices for the composition of the domains, which highlights the importance of the feature selection process for individual applications.

“Source tracking using moving microphone arrays for robot audition”

IEEE Xplore Access:

Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

Authors:

C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor

Abstract:

Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust both to diffuse noise due to late reverberation and to spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization methods based on expectation-maximization (EM) and on Bayesian approaches. In this paper, we propose to combine the EM and Bayesian approaches in a single framework for improved robustness against reverberation and noise.
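As a rough illustration of the Bayesian part of such a framework (generic notation, not taken from the paper), a sequential filter recursively propagates the posterior density of the source state \(\mathbf{x}_k\) given the acoustic measurements \(\mathbf{z}_{1:k}\):

\[
p(\mathbf{x}_k \mid \mathbf{z}_{1:k}) \propto p(\mathbf{z}_k \mid \mathbf{x}_k) \int p(\mathbf{x}_k \mid \mathbf{x}_{k-1}) \, p(\mathbf{x}_{k-1} \mid \mathbf{z}_{1:k-1}) \, \mathrm{d}\mathbf{x}_{k-1}.
\]

In this sketch, an EM stage could be used to refine the likelihood \(p(\mathbf{z}_k \mid \mathbf{x}_k)\), for example by associating time-frequency observations with either the direct path or reverberant clutter; the exact combination used here is described in the full paper.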

“Audio-visual tracking by density approximation in a sequential Bayesian filtering framework”

IEEE Xplore Access:

Proc. Joint Workshop on Hands-Free Speech Communications and Microphone Arrays (HSCMA)

Authors:

Israel D. Gebru, Christine Evers, Patrick A. Naylor, and Radu Horaud

Abstract:

This paper proposes a novel audio-visual tracking approach that constructively exploits the audio and visual modalities in order to estimate the trajectories of multiple people in a joint state space. The tracking problem is modeled using a sequential Bayesian filtering framework. Within this framework, we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that a GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform, and a density interpolation technique is introduced to obtain a continuous representation of the observation likelihood, which is also a GMM. Furthermore, to prevent the number of mixture components from growing exponentially over time, a density approximation based on the Expectation Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. Recordings using a camcorder and microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance of the proposed audio-visual approach compared to two benchmark visual trackers.
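To see why an EM-based compaction step is needed, consider a generic sketch (notation not taken from the paper). If the predicted density and the observation likelihood are both GMMs,

\[
p(\mathbf{x}_k \mid \mathbf{z}_{1:k-1}) = \sum_{i=1}^{N} w_i \, \mathcal{N}(\mathbf{x}_k; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad
p(\mathbf{z}_k \mid \mathbf{x}_k) = \sum_{j=1}^{M} v_j \, \mathcal{N}(\mathbf{z}_k; \mathbf{H}_j \mathbf{x}_k, \mathbf{R}_j),
\]

then the exact posterior is again a GMM, but with \(N \times M\) components, so the component count grows multiplicatively at every time step. Fitting a smaller GMM to the posterior with EM keeps the representation compact without abandoning the mixture form.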

“Speaker tracking in reverberant environments using multiple directions of arrival”

IEEE Xplore Access:

in Proc. Hands-free Speech Communications and Microphone Arrays (HSCMA)

Authors:

Christine Evers, Boaz Rafaely, and Patrick A. Naylor

Abstract:

Accurate estimation of the Direction of Arrival (DOA) of a sound source is an important prerequisite for a wide range of acoustic signal processing applications. However, in enclosed environments, early reflections and late reverberation often lead to localization errors. Recent work demonstrated that improved robustness against reverberation can be achieved by clustering only the DOAs from direct-path bins in the short-time Fourier transform of a speech signal of several seconds' duration from a static talker. Nevertheless, for moving talkers, short blocks of at most several hundred milliseconds are required to capture the spatio-temporal variation of the source direction. Processing of short blocks of data in reverberant environments can lead to clusters whose centroids correspond to spurious DOAs away from the source direction. We therefore propose in this paper a novel multi-detection source tracking approach that estimates the smoothed trajectory of the source DOAs. Results for realistic room simulations validate the proposed approach and demonstrate significant improvements in estimation accuracy compared to single-detection tracking.

“Localization of moving microphone arrays from moving sound sources for robot audition”

IEEE Xplore Access:

in Proc. European Signal Processing Conf. (EUSIPCO), Budapest, Hungary, Aug. 2016

Authors:

C. Evers, A. H. Moore, and P. A. Naylor

Abstract:

Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.

“Speaker localization with moving microphone arrays”

IEEE Xplore Access:

in Proc. European Signal Processing Conf. (EUSIPCO), Budapest, Hungary, Aug. 2016

Authors:

C. Evers, Y. Dorfan, S. Gannot, and P. A. Naylor

Abstract:

Speaker localization algorithms often assume a static location for all sensors. This assumption simplifies the models used, since all acoustic transfer functions are linear time invariant. In many applications this assumption is not valid. In this paper we address the localization challenge with moving microphone arrays. We propose two algorithms to find the speaker position. The first approach is a batch algorithm based on the maximum likelihood criterion, optimized via expectation-maximization iterations. The second approach is a particle filter for sequential Bayesian estimation. The performance of both approaches is evaluated and compared for simulated reverberant audio data from a microphone array with two sensors.
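As a rough sketch of the two estimation principles (generic notation, not from the paper): the batch approach seeks the source position that maximizes the likelihood of all observed frames, while the particle filter recursively reweights hypothesized source states as new frames arrive,

\[
\hat{\mathbf{x}}_{\mathrm{ML}} = \arg\max_{\mathbf{x}} \, \log p(\mathbf{z}_{1:T} \mid \mathbf{x}),
\qquad
w_k^{(i)} \propto w_{k-1}^{(i)} \, p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}),
\]

where the maximum likelihood problem is optimized with EM by introducing latent variables that associate observations with candidate source positions (a common choice in this family of methods; the exact model is given in the paper), and the particle weight update is shown for a bootstrap proposal.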

“2D direction of arrival estimation of multiple moving sources using a spherical microphone array”

IEEE Xplore Access:

in Proc. European Signal Processing Conf. (EUSIPCO), Budapest, Hungary, Aug. 2016

Authors:

A. H. Moore, C. Evers, and P. A. Naylor

Abstract:

Direction of arrival estimation using a spherical microphone array is an important and growing research area. One promising algorithm is the recently proposed Subspace Pseudo-Intensity Vector method. In this contribution, the Subspace Pseudo-Intensity Vector method is combined with a state-of-the-art method for robustly estimating the centres of mass in a 2D histogram based on matching pursuits. The performance of the improved Subspace Pseudo-Intensity Vector method is evaluated in the context of localising multiple moving sources, where it is shown to outperform competing methods in terms of clutter rate and the number of missed detections whilst remaining comparable in terms of localisation accuracy.

Parallel processing of mex files

What can you do if you want to speed up Matlab code in which a mex file is called inside for-loops? For example, you might want to simulate impulse responses for spherical microphone arrays using the SMIR Generator over a set of Monte Carlo experiments.

If your for-loops can be parallelized and you are working on a Linux machine, enabling compilation with OpenMP may be the answer. Make sure OpenMP is installed on your machine and that you are compiling with a Matlab-compatible version of gcc. Then compile your mex file in Matlab using the following command instead:

mex smir_generator_loop.cpp CFLAGS="\$CFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
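For reference, here is a minimal sketch of what the parallelised part of a mex file might look like. This is not the SMIR Generator source and the loop body is just a placeholder; the point is that independent iterations can be shared across threads with a single OpenMP pragma:

// toy_mex_loop.cpp -- illustrative only, not the SMIR Generator
#include "mex.h"
#include <omp.h>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs < 1)
        mexErrMsgTxt("One input vector required.");

    const mwSize n = mxGetNumberOfElements(prhs[0]);
    const double *in = mxGetPr(prhs[0]);

    plhs[0] = mxCreateDoubleMatrix(n, 1, mxREAL);
    double *out = mxGetPr(plhs[0]);

    // Iterations are independent, so the work is split across threads.
    #pragma omp parallel for
    for (mwSignedIndex i = 0; i < (mwSignedIndex)n; i++) {
        out[i] = in[i] * in[i];  // placeholder for the per-iteration work
    }
}

After compiling with the command above, the resulting mex function is called from Matlab like any other; the number of threads can be controlled via the OMP_NUM_THREADS environment variable.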

Good luck!

“Direction of Arrival Estimation in the Spherical Harmonic Domain Using Subspace Pseudointensity Vectors”

IEEE Xplore Access:

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Authors:

Alastair H. Moore, Christine Evers, and Patrick A. Naylor

Abstract:

Direction of arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented that operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses pseudointensity vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses subspace pseudointensity vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise, and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state of the art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated by using speech recordings in a real acoustic environment.
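As a brief reminder of the underlying quantity (a standard definition up to scaling and notation; the paper gives the exact formulation), a pseudointensity vector is formed from the zeroth-order and first-order eigenbeams of the spherical harmonic decomposition,

\[
\mathbf{I}(n, k) \propto \Re\!\left\{ p_0^{*}(n,k) \begin{bmatrix} p_x(n,k) \\ p_y(n,k) \\ p_z(n,k) \end{bmatrix} \right\},
\]

where \(p_0\) is the omnidirectional (zeroth-order) eigenbeam and \(p_x, p_y, p_z\) are dipole-like beams aligned with the Cartesian axes. The DOA estimate then follows from the direction of \(\mathbf{I}\) (up to a sign convention), typically after averaging or clustering over time-frequency bins.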