
Mixed Signals: Using SAS Viya to process and separate messy audio data


Introduction

The Cocktail Party Effect refers to the human brain’s ability to focus on and separate audio signals when there’s background noise. We do this unconsciously: if someone talks to you at a party, you can distinguish their voice from the rest of the background chatter. The same can be said when listening to songs; we can pick apart the various layers of a soundtrack to understand the components that make up the song. For a computer this is inherently difficult, since the audio signals arrive already mixed. Unsupervised machine learning can help us estimate the independent components in our mixed signals.

 

Music, stored digitally, is really just another dataset. In this short blog we look at how audio data can be demixed using SAS Viya. When audio is in a stereo format, there is a left-right balance describing how the data is panned. In this example there are two audio files, each with a left and a right track, in which two distinct beats are mixed.

 

The screenshot below shows an example of a track created using the GarageBand app on my iPhone. I have two main beats: a low bass track and a high ‘laser’ sound. Each sound is panned towards either the left or the right side of the audio. Both tracks are slightly mixed, i.e. there may be more of track one on the left side but some of track two is still present, and vice versa. These beats are then recorded as two sample stereo tracks: one where the bass is panned more to the left and the laser to the right, and one which is the near opposite. It’s important to note that the tracks are panned using a slider, so there is no perfect symmetry between the tracks.

 

[Screenshot: GarageBand project with the bass and ‘laser’ beats panned to opposite sides]
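For intuition, here is a hypothetical NumPy sketch of the linear mixing model that ICA assumes. The signals, panning weights and variable names are invented for illustration and are not the GarageBand recordings used in this article:

```python
# Hypothetical sketch of the linear mixing model ICA assumes (not the GarageBand workflow).
# Two "source" beats are combined with slightly asymmetric weights to mimic panning.
import numpy as np

n = 44100                                     # one second of samples at a 44.1 kHz sample rate
t = np.linspace(0, 1, n)

bass = np.sin(2 * np.pi * 60 * t)             # stand-in for the low bass beat
laser = np.sign(np.sin(2 * np.pi * 880 * t))  # stand-in for the high 'laser' sound

# The panning sliders are not perfectly symmetric, so the mixing matrix isn't either
A = np.array([[0.80, 0.30],    # track 1: mostly bass, some laser
              [0.25, 0.85]])   # track 2: mostly laser, some bass

sources = np.vstack([bass, laser])   # shape (2, n)
mixed = A @ sources                  # the two "recorded" tracks ICA will try to un-mix
```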

 

We’re going to separate the mixed data into two clean signals using Independent Component Analysis (ICA).

You can run the interactive Jupyter Notebook and access the source code here: https://gitlab.com/HarrySnart/mixedsignals/-/blob/main/SeparatingAudioSignals.ipynb

 

You can also listen to the two input tracks and the output tracks directly, attached to this article.

 

Overview of fastICA

 

The fastICA algorithm is an implementation of ICA that seeks components of the input signals which are as statistically independent as possible, in order to identify and separate latent factors (independent components) in the data.

fastICA does this by seeking an orthogonal rotation of pre-whitened data (whitened typically via eigenvalue decomposition) where a measure of non-Gaussianity acts as a proxy for statistical independence. We identify n components as linear combinations of the inputs that are maximally non-Gaussian. Components can be learned through symmetric decorrelation, where all components are estimated simultaneously, or via deflationary decorrelation, where components are learned one at a time.
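To make the mechanics concrete, below is a minimal NumPy sketch of the deflationary variant with a tanh contrast function. It is illustrative only (the function name and defaults are mine, and the Viya action’s internals may differ), but it shows the pre-whitening, fixed-point update and decorrelation steps described above:

```python
# Minimal NumPy sketch of deflationary fastICA with a tanh contrast function (illustrative only)
import numpy as np

def fastica_deflation(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """X has shape (n_signals, n_samples); assumes full-rank signals."""
    rng = np.random.default_rng(seed)

    # 1. Centre and pre-whiten via an eigenvalue decomposition of the covariance
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T        # whitening matrix
    Z = K @ X                                      # whitened signals

    # 2. Estimate components one at a time (deflation)
    W = np.zeros((n_components, Z.shape[0]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)
            # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w
            w_new = (Z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            # Decorrelate from components already found, then renormalize
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z    # estimated independent components, shape (n_components, n_samples)
```

Applied to the toy `mixed` array from the earlier sketch, `fastica_deflation(mixed, 2)` should recover signals that match the bass and laser sources up to scaling and ordering.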

 

The fastICA action in SAS Viya makes it very simple to perform blind-source separation. The action handles pre-whitening and dimension reduction for you, so you only need to specify the number of components to extract and, if necessary, any adjustments to the default settings. In this example I used the default settings and simply specified n=2 components (i.e. we have a latent left and right track). The action produces two output tables: one containing the independent components and another with the whitened variables. As an in-memory CAS action, fastICA runs as distributed compute, allowing you to exploit the available cores and concurrent threads when working with large datasets.

 

Using the SWAT package

 

For this demo we’re going to start by loading the data using Python via the SciPy package. The data comes in WAV files, which are uncompressed and easy to load into my notebook environment in their numeric format.
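A minimal sketch of that step is below; the file names are hypothetical placeholders for the two recordings attached to this article:

```python
# Read the two stereo WAV files into NumPy arrays (file names are hypothetical placeholders)
from scipy.io import wavfile

rate1, track1 = wavfile.read('mix1.wav')   # track1 has shape (n_samples, 2): left and right channels
rate2, track2 = wavfile.read('mix2.wav')

print(rate1, track1.shape, track1.dtype)   # e.g. 44100 (n, 2) int16
```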

 

Matplotlib can be used to visualize the audio signals, and we can listen to the track using IPython’s display function. For example, we can visualize the first track as below, where the colour of the series indicates the left or right side of the track:

[Figure: waveform of the first mixed track, with colour indicating the left and right channels]
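A sketch of the plotting and playback, reusing the hypothetical `track1` array from above:

```python
# Plot both channels of the first mixed track and play it back in the notebook
import matplotlib.pyplot as plt
from IPython.display import Audio, display

plt.figure(figsize=(12, 4))
plt.plot(track1[:, 0], label='left')
plt.plot(track1[:, 1], label='right')
plt.title('Mixed track 1')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.legend()
plt.show()

display(Audio(track1.T, rate=rate1))   # IPython's Audio widget expects stereo data as (channels, samples)
```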

 

Since fastICA is performing blind-source separation, it isn’t particularly important which of the series is the left or the right channel, since both tracks contain a mix of the latent factors we’re interested in.

 

Loading and processing data

 

Once we connect to the CAS environment using SWAT, we can upload our audio data. Since WAV files are uncompressed, we can read the NumPy arrays directly into pandas DataFrame objects.

 

fastICA expects all of the mixed signals in a single table. Since we are performing blind-source separation, it is not necessary to specify which audio track comes from which WAV file.

 

We have a total of four signals (a left and right track from each WAV file). Once we concatenate the dataframes into a single dataframe, we can upload it directly into CAS using the upload_frame() method on the SWAT connection object.
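A hedged sketch of that step is below; the host, port, table name and column names are all assumptions for illustration:

```python
# Combine the four channels (left/right from each WAV) into one table and load it into CAS
import pandas as pd
import swat

conn = swat.CAS('my-cas-host.example.com', 5570)   # connection details are site-specific

# Assumes the two recordings have the same length and sample rate
df1 = pd.DataFrame(track1, columns=['mix1_left', 'mix1_right'])
df2 = pd.DataFrame(track2, columns=['mix2_left', 'mix2_right'])
df = pd.concat([df1, df2], axis=1)

tbl = conn.upload_frame(df, casout={'name': 'mixed_signals', 'replace': True})
```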

 

[Screenshot: concatenating the dataframes and uploading them to CAS with upload_frame()]

 

The raw signals look like this:

 

[Figure: line plots of the four raw mixed signals]

 

Running the CAS Action

 

Once the data is in memory we can run the fastICA action. We run the action with default settings and can see that we get two output tables.
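A hedged sketch of the call via SWAT is below. The settings (defaults plus n=2) come from the text above, but the action-set name and exact parameter spellings are assumptions, so check the fastICA action documentation for your Viya release:

```python
# Load the action set and run fastICA with default settings and two components
conn.loadactionset('fastICA')              # action-set name assumed

result = conn.fastICA.fastICA(
    table='mixed_signals',                 # the uploaded table of four mixed channels
    n=2                                    # number of independent components to extract
)
print(result)                              # references the two output tables described below
```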

 

[Screenshot: running the fastICA action and its two output tables]

 

The ‘demix’ output table is our demixing matrix, which tells us how our tracks are demixed into two independent components.

 

[Screenshot: the ‘demix’ output table (demixing matrix)]

 

The other output table, ‘scores’, contains our two output independent components.

 

[Screenshot: the ‘scores’ output table containing the two independent components]

 

It’s important to notice that our output signals are on a very different scale from our input signals. If we play these raw output signals, they will be very faint, if not inaudible.

 

Reshaping the output signals

 

In order to make the output components more audible, we inflate them by a factor of 1,000 and write the values out to WAV files. We use the same sample rate that was returned when we initially read the source WAV files.
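A hedged sketch of that step; the output table name, column positions and file names are assumptions based on the tables shown above:

```python
# Fetch the component scores, scale them up, and write each one out as a mono WAV file
import numpy as np
from scipy.io import wavfile

scores_df = conn.CASTable('scores').to_frame()                        # output table name assumed

comp1 = (scores_df.iloc[:, 0].to_numpy() * 1000).astype(np.float32)   # inflate by a factor of 1,000
comp2 = (scores_df.iloc[:, 1].to_numpy() * 1000).astype(np.float32)

wavfile.write('component1.wav', rate1, comp1)                         # reuse the source sample rate
wavfile.write('component2.wav', rate1, comp2)
```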

 

[Screenshot: scaling the output components and writing them to WAV files]

 

Conclusion

 

We now have two separate mono tracks of our source signals. The fastICA action does a good job of separating the source signals, and if we wanted to we could now merge these left and right tracks to create a clean stereo track with each distinct sound panned to just one side (a sketch of this follows the plots below).

 

[Figures: waveforms of the two separated output tracks]
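A final hedged sketch of that optional merge, reusing the hypothetical `comp1` and `comp2` arrays from the previous snippet:

```python
# Combine the two separated mono components into one stereo track, each hard-panned to its own side
import numpy as np
from scipy.io import wavfile

stereo = np.column_stack([comp1, comp2])           # column 0 = left channel, column 1 = right channel
wavfile.write('separated_stereo.wav', rate1, stereo.astype(np.float32))
```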

 
