Auditory Scene Analysis

More than half of the world's population above the age of 75 develops age-related hearing loss. People with this condition have difficulty understanding speech amidst background noise, for example when listening to someone speak in a noisy cafe. Colloquially this is known as the ‘cocktail party problem’, which humans and most other animals are able to solve but computers cannot. However, how our brains solve this challenge is not well understood.

Monkey model

I explored whether monkeys are a good model of the human brain mechanisms underlying auditory segregation. Unlike work in humans, work in monkeys allows systematic invasive brain recordings to characterise how single neurons achieve this feat. However, before one can record from a monkey brain and generalise the results to humans, it is essential to show that the underlying mechanisms are similar in both species.

Here is a visual summary of this project.

I used synthetic auditory stimuli rather than speech because they have no semantic confounds and so lend themselves to developing animal models. Our behavioural experiments showed that rhesus macaques are able to perform auditory segregation based on the simultaneous onset of spectral elements (temporal coherence). I conducted functional magnetic resonance imaging (fMRI) in awake, behaving macaques to show that the underlying brain network is similar to that seen in humans. My study is the first investigation to show such evidence in any animal model.

Relevant publication:

Here is my 3-minute video explaining this work

Here is my poster summarising this work


Here is a presentation about this work


Here is the peer-reviewed paper about this work


Role of Attention

What is the role of attention in auditory segregation? Is attention necessary for segregation to occur? I used electroencephalography (EEG) in humans with normal hearing to address these questions, using Speech-In-Noise (SIN) as well as Stochastic Figure-Ground (SFG) stimuli.
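As a rough illustration of how an SFG stimulus is structured, the sketch below builds a sequence of chords in which each chord contains randomly chosen frequency channels (the ground), while a fixed set of channels repeats over a run of consecutive chords (the figure). It only lays out which channels are present in each chord and does not synthesise audio. The function name and all parameter values are hypothetical and are not the ones used in this study.

```python
import random


def make_sfg_chords(n_chords=20, n_channels=60, ground_per_chord=10,
                    figure_size=4, figure_start=8, figure_len=6, seed=0):
    """Return (chords, figure).

    chords is a list of sets; each set holds the frequency-channel indices
    present in one chord. figure is the set of channels that repeat.
    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    channels = range(n_channels)
    # Channels that stay fixed across consecutive chords form the figure.
    figure = set(rng.sample(channels, figure_size))
    chords = []
    for i in range(n_chords):
        # Ground: a fresh random draw of channels for every chord.
        chord = set(rng.sample(channels, ground_per_chord))
        # Figure: the same channels are added during the figure interval.
        if figure_start <= i < figure_start + figure_len:
            chord |= figure
        chords.append(chord)
    return chords, figure
```

Setting `figure_len=0` in this sketch gives a ground-only sequence, which corresponds to a stimulus in which the figure is absent.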

To examine the effect of top-down attention on auditory segregation, I manipulated attention between the relevant (auditory) and irrelevant (visual) modalities. The auditory task was to detect the absence of an auditory object within two kinds of acoustic stimuli, i.e. to detect the absence of either the "Figure" in SFG stimuli or the speech in SIN stimuli. The irrelevant visual task was to detect the absence of coherent motion of dots within a Variable Coherence Random Dot Motion (VCRDM) stimulus.
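The idea behind a random dot motion display with adjustable coherence can be sketched as a single frame update: each dot moves in a common signal direction with probability equal to the coherence, and in a random direction otherwise. This is a minimal sketch under assumed conventions (unit-square coordinates with wrap-around, a hypothetical `step_dots` helper, illustrative speed); how coherence was varied in the actual VCRDM stimulus is not specified here.

```python
import math
import random


def step_dots(dots, coherence, direction_rad=0.0, speed=0.01, rng=random):
    """Advance every dot by one frame and return the new positions.

    dots is a list of (x, y) tuples in the unit square.
    coherence is the probability (0 to 1) that a dot follows the signal
    direction; all other dots move in a uniformly random direction.
    Names and defaults are illustrative assumptions.
    """
    moved = []
    for x, y in dots:
        if rng.random() < coherence:
            angle = direction_rad                   # coherent (signal) dot
        else:
            angle = rng.uniform(0.0, 2 * math.pi)   # incoherent (noise) dot
        # Wrap around the edges so the dot count stays constant.
        moved.append(((x + speed * math.cos(angle)) % 1.0,
                      (y + speed * math.sin(angle)) % 1.0))
    return moved
```

With `coherence=0.0` no dot follows the signal direction, which is one way to think about a display in which coherent motion is absent.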

Here is a demonstration of the visual stimulus, which employed Variable Coherence Random Dot Motion (VCRDM).


Here is a visual summary of this project.

I observed a significant difference between the auditory object and the background scene (i.e. figure vs ground and speech vs noise) in the active condition, when subjects paid attention to the sounds. I did not find a significant difference in the distracted condition, when subjects paid attention to the images.

I therefore conclude that attention aids the separation of overlapping sounds. However, if attention is directed elsewhere to a demanding task, thus depleting computational resources, then automatic segregation of the auditory scene is compromised.

Predictors of Speech perception in Noise

I aimed to find predictors of speech perception in noise that pertain to the central auditory mechanism of sequence grouping.

Native English speakers are required to complete a Speech perception in Noise (SiN) task based on Oldenburg English sentences, with 3-talker babble noise (which is more akin to informational masking) as well as 16-talker babble noise (which is more akin to energetic masking). The Target to Masker Ratio (TMR) is adapted on a 2-up-1-down staircase. They are also required to complete a gap detection task in Stochastic Figure Ground (SFG) stimuli, in which the Figure TMR is adapted.
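An adaptive staircase of this kind can be sketched as follows. The text names a 2-up-1-down rule without spelling out whether the counts refer to correct or incorrect responses, so the sketch exposes both counts as parameters; the defaults (TMR lowered after two consecutive correct responses, raised after one incorrect response) are an assumption, as are the function name, step size, starting TMR, and trial count.

```python
def run_staircase(respond, start_tmr=0.0, step_db=2.0, n_trials=40,
                  n_correct_to_harder=2, n_wrong_to_easier=1):
    """Run an adaptive track and return (track, reversals).

    respond(tmr) must return True for a correct response at that TMR.
    track lists the TMR presented on each trial; reversals lists the TMRs
    at which the track changed direction. Defaults are assumptions.
    """
    tmr = start_tmr
    correct_run = wrong_run = 0
    last_direction = 0          # -1 = got harder, +1 = got easier, 0 = no move yet
    track, reversals = [], []
    for _ in range(n_trials):
        track.append(tmr)
        direction = 0
        if respond(tmr):
            correct_run, wrong_run = correct_run + 1, 0
            if correct_run >= n_correct_to_harder:
                direction, correct_run = -1, 0   # lower the TMR (harder)
        else:
            wrong_run, correct_run = wrong_run + 1, 0
            if wrong_run >= n_wrong_to_easier:
                direction, wrong_run = +1, 0     # raise the TMR (easier)
        if direction:
            if last_direction and direction != last_direction:
                reversals.append(tmr)            # track changed direction here
            last_direction = direction
            tmr += direction * step_db
    return track, reversals
```

A threshold TMR is commonly estimated by averaging the TMR over the last few reversals; how many reversals to average would be a design choice and is not stated here.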

I hypothesised that the SiN threshold TMR is significantly correlated with the SFG threshold TMR. The study is currently in progress.
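One way such a hypothesis could be tested is sketched below: a Pearson correlation between per-participant thresholds, with a two-sided permutation test for significance. This is only an illustration; the statistic and test actually planned for the study are not stated here, and the function names and parameters are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffling y gives |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # break the pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)
```

Here `x` would hold one SiN threshold TMR per participant and `y` the matching SFG threshold TMR, in the same participant order.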