Introduction
The Cocktail Party Effect refers to the human brain’s ability to focus on and separate audio signals when there is background noise. We do this unconsciously: if someone talks to you at a party, you can distinguish their voice from the rest of the background chatter. The same is true when listening to music; we can pick apart the various layers of a track to understand the components that make up the song. For a computer this is inherently difficult, since the audio signals arrive already mixed. Unsupervised machine learning can help us estimate the independent components in our mixed signals.
Music, stored digitally, is really just another dataset. In this short blog we look at how audio data can be demixed using SAS Viya. When audio is in a stereo format, there is a left-right balance for how the data is panned. In this example there are two audio files, each with a left and a right track, in which two distinct beats are mixed.
The screenshot below shows an example of a track created using the GarageBand app on my iPhone. I have two main beats: one a low bass track and the other a high ‘laser’ sound. Each sound is panned towards either the left or the right side of the audio. Both tracks are slightly mixed, i.e. there may be more of track one on the left side but some of track two is still present, and vice versa. These beats are then recorded as two sample stereo tracks: one where the bass is panned more to the left and the laser to the right, and one which is the near opposite. It’s important to note that the tracks are panned using a slider, so there is not perfect symmetry between them.
We’re going to separate the mixed data using Independent Component Analysis (ICA) into two clean signals.
You can run the interactive Jupyter Notebook and access the source code here: https://gitlab.com/HarrySnart/mixedsignals/-/blob/main/SeparatingAudioSignals.ipynb
You can also listen to the two input tracks and the output tracks directly, attached to this article.
Overview of fastICA
The fastICA algorithm is an implementation of ICA which seeks components that are as statistically independent as possible, in order to identify and separate latent factors (independent components) in the data.
fastICA does this by seeking an orthogonal rotation of pre-whitened data (whitening is typically done via eigenvalue decomposition), where a measure of non-Gaussianity acts as a proxy for statistical independence. We identify n independent components as linear combinations of the whitened signals that maximize non-Gaussianity. Components can be learned through symmetric decorrelation, where all components are estimated simultaneously, or via deflationary decorrelation, where components are estimated one at a time.
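To make the setup concrete, here is a toy sketch of the underlying model (the notation and numbers are illustrative, not taken from this example): the observed signals are a linear mixture of independent, non-Gaussian sources, and ICA estimates a demixing matrix that recovers the sources up to scale and ordering.

```python
# Toy illustration of the ICA mixing model (values are made up for illustration)
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))           # two independent, non-Gaussian source signals
A = np.array([[0.8, 0.3],                 # unknown mixing matrix (think: left/right panning)
              [0.2, 0.7]])
x = A @ s                                 # the mixed signals we actually observe
# ICA's job is to estimate a demixing matrix W ≈ inv(A) (up to scale and permutation)
# from x alone, so that W @ x recovers the original sources.
```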
The fastICA action in SAS Viya makes it very simple to perform blind-source separation. The action handles pre-whitening and dimension reduction for you, so you only need to specify the number of components to extract and any adjustments to the default settings if necessary. In this example I used the default settings and simply specified n=2 components (i.e. we have a latent left and right track). The action produces two output tables: one containing the independent components and another containing the whitened variables. As an in-memory CAS action, fastICA runs as distributed compute, allowing you to exploit the available cores and concurrent threads when working with large datasets.
Using the SWAT package
For this demo we’re going to start by loading the data using Python via the SciPy package. The data comes as WAV files, which are uncompressed and easy to load into my notebook environment in numeric form.
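A minimal sketch of the load step, assuming two stereo WAV files (the file names below are placeholders; see the linked notebook for the actual ones):

```python
# Read the two stereo WAV files into NumPy arrays (file names are placeholders)
from scipy.io import wavfile

rate1, track1 = wavfile.read('mix1.wav')   # track1 has shape (n_samples, 2): left and right channels
rate2, track2 = wavfile.read('mix2.wav')
print(rate1, track1.shape, track2.shape)
```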
Matplotlib can be used to visualize the audio signals and we can listen to the track using IPython’s display function. For example, we can visualize the first track like below, where the colour of the series indicates left or right side of the track:
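Something along these lines, continuing from the arrays read above (variable names are mine, not necessarily those in the notebook):

```python
# Plot the left and right channels of the first track and embed an audio player
import matplotlib.pyplot as plt
from IPython.display import Audio, display

plt.plot(track1[:, 0], label='left')
plt.plot(track1[:, 1], label='right')
plt.legend()
plt.title('Track 1: mixed stereo signal')
plt.show()

display(Audio(track1.T, rate=rate1))   # IPython plays a (channels, samples) array as stereo
```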
Since fastICA is performing blind-source separation it isn’t particularly important which of the series is right or left, since both tracks contain a mix of the latent factors we’re interested in.
Loading and processing data
Once we connect to the CAS environment using SWAT we can upload our audio data. Since WAV files are uncompressed, we can read the NumPy arrays directly into pandas DataFrame objects.
fastICA expects all of the mixed signals in a single table. Since we are performing blind-source separation, it is not necessary to specify which audio track comes from which WAV file.
We have a total of four signals (a left and right track from each WAV file). Once we concatenate the dataframes into a single dataframe, we can upload it directly into CAS using the upload_frame() method on the SWAT connection object.
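A sketch of that step, with placeholder connection details and column names:

```python
# Combine the four mixed channels into one table and load it into CAS
import pandas as pd
import swat

conn = swat.CAS('my-cas-host.example.com', 5570)   # connection details are placeholders

mixed = pd.DataFrame({
    'track1_left':  track1[:, 0],      # assumes both recordings have the same length
    'track1_right': track1[:, 1],
    'track2_left':  track2[:, 0],
    'track2_right': track2[:, 1],
})

castbl = conn.upload_frame(mixed, casout=dict(name='mixed_signals', replace=True))
```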
The raw signals look like the below:
Running the CAS Action
Once the data is in-memory we can run the fastICA action. We run the action with default settings and can see we get two output tables.
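As a rough sketch of what that call looks like from SWAT (the action set name and the output-table parameter names below are my assumptions; refer to the linked notebook and the SAS Viya documentation for the exact syntax):

```python
# Load the fastICA action set and run the action with default settings, extracting 2 components
# NOTE: the action set name and the casout/demixout parameter names are assumptions to verify
conn.loadactionset('fastICA')

result = conn.fastICA.fastICA(
    table='mixed_signals',
    n=2,                                           # number of independent components to extract
    casout=dict(name='scores', replace=True),      # output table of independent components
    demixout=dict(name='demix', replace=True)      # output table with the demixing matrix
)
print(result)
```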
The ‘demix’ output table is our demixing matrix, which tells us how our mixed tracks are transformed into two independent components.
The other output table, ‘scores’, contains our two output independent components.
It’s important to notice that our output signals are on a very different scale to our input signals. If we play these raw output signals they will be very faint, if not inaudible.
Reshaping the output signals
In order to make the output components more audible, we inflate them by a factor of 1,000 and write the values to an output WAV file. We use the same sample rate that was obtained during the initial read of the source WAV file.
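A sketch of that step, reusing the placeholder table and variable names from above:

```python
# Pull the independent components back to the client, scale them up, and write mono WAV files
import numpy as np
from scipy.io import wavfile

scores = conn.CASTable('scores').to_frame()      # output components from the fastICA action

for i, col in enumerate(scores.columns):
    signal = (scores[col].to_numpy() * 1000).astype(np.int16)   # inflate by a factor of 1,000
    wavfile.write(f'component_{i + 1}.wav', rate1, signal)      # reuse the source sample rate
```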
Conclusion
We now have two separate mono tracks of our source signals. The fastICA action does a good job of separating the source signals, and if we wanted to we could now merge these left and right tracks to create a clean stereo track with each distinct sound panned to just one side.