Auditory Scene Analysis

More than half of the world's population over the age of 75 develops age-related hearing loss. These listeners have difficulty understanding speech amidst background noise, such as when listening to a speaker in a noisy café. Colloquially this is known as the ‘cocktail party problem’, which most humans and animals solve effortlessly but computers cannot. However, how our brains solve this challenge is not well understood.

Monkey model

I explored whether monkeys are a good model of the human brain mechanisms underlying auditory segregation. Unlike humans, monkeys permit systematic invasive brain recordings that can characterise how single neurons achieve this feat. However, before one can record from a monkey brain and generalise the results to humans, it is essential to show that the underlying mechanisms are similar in both species.

Visual summary

Here is a visual summary of this project.

I employed synthetic auditory stimuli rather than speech, as they are free of semantic confounds and lend themselves to animal models. Our behavioural experiments showed that rhesus macaques can perform auditory segregation based on the simultaneous onset of spectral elements (temporal coherence). I then conducted functional magnetic resonance imaging (fMRI) in awake, behaving macaques to show that the underlying brain network is similar to that seen in humans. This study is the first to show such evidence in any animal model.

Video summary

Here is my 3-minute video explaining this work.


Here is my poster summarising this work.



Here is a presentation about this work.


Peer-reviewed publication

Here is the peer-reviewed paper about this work.


EEG responses

People with hearing loss commonly experience difficulty understanding speech in noisy situations such as cocktail parties, cafés, or restaurants. Earlier studies have established that the ability to segregate and group overlapping artificial sounds (not words in any language) is related to the ability to understand speech in social chatter.

In this study, I non-invasively recorded electrical activity from the brain using electroencephalography (EEG) while subjects performed either a relevant task (sound segregation and grouping) or an irrelevant task (a visual task). I compared the electrical signals evoked during segregation and grouping of non-linguistic artificial sounds against those evoked when trying to understand speech amidst babble noise.

I established that the electrical brain activity generated during real-world listening is similar to that generated during artificial sound segregation performed passively, without paying attention. Thus I showed that the brain's response to listening in noisy situations can be studied without using language, even while subjects are not performing a relevant task.

The brain's response to listening in noisy situations can be studied without using speech and in the absence of a relevant task!

Here is a visual summary of the results

Here is the peer-reviewed publication.

  • Xiaoxuan Guo, Pradeep D, Ester Benzaquen, William Sedley, Timothy D Griffiths, "EEG responses to auditory figure ground perception", Hearing Research, vol. 422, pp. 108524, 2022


Predictors of Speech Perception in Noise

I aimed to find the predictors of speech perception in noise that pertain to central auditory grouping mechanisms.

Native English speakers were required to complete an Oldenburg-sentence-based Speech-in-Noise (SiN) task with 3-talker babble noise (more akin to informational masking) as well as 16-talker babble noise (more akin to energetic masking), where the Target-to-Masker Ratio (TMR) was adapted on a 1-down-1-up staircase. They were also required to detect gaps in the Stochastic Figure-Ground (SFG) stimulus, where figure TMR, figure coherence, and figure-gap duration were adapted in three separate staircases.

I hypothesised that the SiN threshold TMR is significantly correlated with the SFG threshold TMR. The study is currently in progress.
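A 1-down-1-up staircase adapts difficulty toward a listener's threshold: the tracked parameter (here, TMR) is made harder after every correct response and easier after every incorrect one, and the threshold is usually estimated from the reversal points. Below is a minimal Python sketch of the procedure; the function and parameter names are illustrative, not the actual study code (which was in MATLAB).

```python
def one_down_one_up(initial_tmr_db, step_db, n_trials, respond):
    """Run a 1-down-1-up adaptive staircase on target-to-masker ratio (TMR).

    respond(tmr_db) -> True if the listener answered correctly.
    Returns the list of TMR values visited and the reversal points.
    """
    tmr = initial_tmr_db
    history, reversals = [], []
    last_direction = 0  # -1 = stepping down (harder), +1 = stepping up (easier)
    for _ in range(n_trials):
        history.append(tmr)
        correct = respond(tmr)
        direction = -1 if correct else +1
        if last_direction and direction != last_direction:
            reversals.append(tmr)  # track changed direction: a reversal
        last_direction = direction
        tmr += direction * step_db
    return history, reversals

# Toy deterministic listener: always correct above -6 dB TMR.
hist, revs = one_down_one_up(0.0, 2.0, 10, lambda t: t > -6.0)
# A threshold estimate is typically the mean of the last few reversals.
```

The staircase hovers around the level where performance flips, which for a 1-down-1-up rule converges on the 50%-correct point of the psychometric function.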

Source code

SFG stimulus

The MATLAB source code to generate the Stochastic Figure-Ground stimulus is shared below.

[1] Dheerendra, P., Kumar, S., & Griffiths, T. (2021, June 28). MATLAB source code for generating Stochastic figure-ground (SFG) acoustic stimulus.
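In outline, an SFG stimulus is a sequence of short chords of randomly drawn pure tones; during the "figure", a fixed set of frequency components (the coherence) repeats across consecutive chords, and this temporal coherence is what lets listeners pull the figure out of the ground. Below is a simplified Python sketch of that construction; the chord counts, frequency range, and parameter names are illustrative, not the published parameters (see the MATLAB source above for those).

```python
import math
import random

def make_sfg(n_chords=20, tones_per_chord=10, coherence=4,
             figure_start=8, figure_len=6, fmin=180.0, fmax=7000.0):
    """Return a list of chords; each chord is a list of tone frequencies (Hz).

    Background tones are drawn independently for every chord; during the
    figure window, `coherence` fixed frequencies repeat in every chord.
    """
    def rand_freq():
        # Draw log-uniformly so tones are spread evenly across octaves.
        return math.exp(random.uniform(math.log(fmin), math.log(fmax)))

    figure_freqs = [rand_freq() for _ in range(coherence)]
    chords = []
    for i in range(n_chords):
        in_figure = figure_start <= i < figure_start + figure_len
        n_background = tones_per_chord - (coherence if in_figure else 0)
        chord = [rand_freq() for _ in range(n_background)]
        if in_figure:
            chord += figure_freqs  # repeated components form the "figure"
        chords.append(chord)
    return chords, figure_freqs

chords, fig = make_sfg()
```

Each chord would then be synthesised as a sum of pure tones; raising `coherence` makes the figure easier to hear out against the random background.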

fMRI data acquisition

The NIH Cortex based source code to acquire sparse fMRI data in the Stochastic Figure-Ground experiment in macaques is shared below.

[2] Dheerendra, P. (2021, June 28). NIH Cortex software based source code for fMRI data acquisition.

SPM script for fMRI data processing

The MATLAB source code that employs the SPM software to process the sparse fMRI data acquired in macaques on the SFG experiment is shared below.

[3] Dheerendra, P. (2020, April 16). SPM script for fMRI data processing.

SFG-SIN M/EEG experiment

The auditory task was to detect the absence of an auditory object within two kinds of acoustic stimuli: either the absence of the "Figure" in Stochastic Figure-Ground (SFG) stimuli or the absence of "Speech" in Speech-in-Noise (SIN) stimuli.

Here is the source code for conducting the M/EEG experiment.

[4] Dheerendra, P. (2022, September 4). Figure-Ground Speech-in-Noise M/EEG experiment.

VCRDM stimulus and M/EEG experiment

The irrelevant visual task was to detect the absence of coherent motion of dots within the Variable Coherence Random Dot Motion (VCRDM) stimulus.

Below is a demonstration of the visual stimulus that employed Variable Coherence Random Dot Motion (VCRDM).


Here is the source code for generating the stimulus and conducting the M/EEG experiment.

[5] Dheerendra, P. (2022, September 4). MATLAB scripts for Variable Coherence Random Dot Motion paradigm.
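In a VCRDM display, on each frame a fraction of the dots (the coherence) steps in a common signal direction while the remainder move in random directions. Below is a minimal Python sketch of a single frame update; the function and parameter names are illustrative, not the actual experiment code (which, per reference [5], was in MATLAB).

```python
import math
import random

def update_dots(dots, coherence, direction_rad, speed, width, height):
    """Advance one frame of a random-dot-motion stimulus.

    dots: list of (x, y) positions. A `coherence` fraction of dots moves
    along `direction_rad`; the rest move in random directions. Dots wrap
    around at the display edges.
    """
    new_dots = []
    for x, y in dots:
        if random.random() < coherence:
            theta = direction_rad                    # signal dot
        else:
            theta = random.uniform(0, 2 * math.pi)   # noise dot
        x = (x + speed * math.cos(theta)) % width
        y = (y + speed * math.sin(theta)) % height
        new_dots.append((x, y))
    return new_dots

dots = [(random.uniform(0, 200), random.uniform(0, 200)) for _ in range(100)]
frame = update_dots(dots, coherence=0.5, direction_rad=0.0, speed=3.0,
                    width=200, height=200)
```

At coherence 1.0 every dot moves in the signal direction; at 0.0 the motion is pure noise, so sweeping the coherence parameter controls how detectable the coherent motion is.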