Breakthrough AI tech utilises human perception to filter out background noise

A team of researchers at The Ohio State University has developed an artificial intelligence (AI) model that exploits the way humans perceive speech to enhance audio quality. By combining subjective ratings of sound quality with a speech enhancement model, the team has achieved superior speech clarity, as validated by objective metrics.

Published in the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing, the study reports that the new model outperforms conventional approaches at suppressing background noise. Such unwanted sounds, which often disrupt listeners’ ability to hear what they want to hear, have long been difficult to remove effectively.

Unlike previous methods that rely solely on objective algorithms to separate noise from the desired signal, the Ohio State University researchers drew on human perception: they trained the model to remove unwanted sounds using listeners’ judgments of sound quality.

According to Donald Williamson, an associate professor at The Ohio State University and co-author of the study, the team’s unique focus on using human perception to train the model sets their research apart. Williamson explains, “If people can perceive something about the signal’s quality, then the model can utilize that information to better remove noise.”

This study focused on monaural speech enhancement, that is, speech captured on a single audio channel, such as one microphone. The researchers trained their model on two datasets from previous studies that contained recordings of people in conversation, some of which were partly obscured by background noise such as a TV or music.

To assess the quality of each recording, listeners rated it on a scale of 1 to 100. The model’s performance stems from a joint-learning method that combines a specialized speech enhancement language module with a prediction model that anticipates the mean opinion score human listeners would assign to a noisy signal.
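The joint-learning idea described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the report does not give the loss functions or their weighting, so the squared-error terms, the `mos_weight` parameter, and all function names here are assumptions chosen only to show how an enhancement objective and a mean-opinion-score prediction objective might be trained together.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average individual listener ratings (1-100 scale) into one score."""
    return mean(ratings)


def mse(a, b):
    """Mean squared error between two equal-length sequences of samples."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(enhanced, clean, predicted_mos, human_mos, mos_weight=0.5):
    """Combine an enhancement error with a MOS-prediction error.

    Assumption: both terms are squared errors and are mixed with a fixed
    weight; the actual study may use different objectives.
    """
    # How far the enhanced signal is from the clean reference.
    enhancement_term = mse(enhanced, clean)
    # How far the predicted score is from the human score, rescaled from
    # the 1-100 rating scale so it is comparable to the signal error.
    mos_term = ((predicted_mos - human_mos) / 100.0) ** 2
    return enhancement_term + mos_weight * mos_term


# Hypothetical ratings from three listeners for one noisy recording.
ratings = [60, 70, 80]
human_mos = mean_opinion_score(ratings)
loss = joint_loss([0.1, 0.4], [0.0, 0.5], predicted_mos=65, human_mos=human_mos)
```

In a sketch like this, lowering the combined loss pushes the model both to clean up the signal and to predict how listeners would rate it, which is the sense in which human perception informs the noise removal.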

The results showed that the new approach outperformed other models on measures of perceptual quality, intelligibility, and human ratings. However, incorporating human perception of sound quality presents its own set of challenges, as Williamson highlights. Evaluating noisy audio is highly subjective, influenced by factors such as individual hearing capabilities and experiences. Additional considerations, such as the use of hearing aids or cochlear implants, further affect how individuals perceive their sound environment.

This research points toward audio enhancement technology that learns from how people actually hear. By building human perception into training, the model could improve listening experiences across a range of real-world scenarios.

(With inputs from PTI)