
Analyzing Spotify’s Most Streamed Songs: Insights from Data Exploration and Modeling in SAS Studio

Started ‎02-26-2026 by
Modified ‎02-26-2026 by

Today, artists reach listeners primarily through music streaming services such as Spotify, Apple Music, and Deezer. In this analysis, we examine the most streamed songs and explore key factors that may contribute to an artist’s success across these platforms. The dataset includes 952 songs and approximately 24 variables, including track and artist names, release date information, stream counts, playlist placements, and chart rankings.

 

 

Loading Data into the SAS Environment

 

First, we use LIBNAME statements to point to the folder that holds the source files and to the folder that stores output tables, and then import the CSV file. After loading the data, we perform a brief exploratory analysis to identify any missing values before proceeding with the main analysis.

 

/*******  Import the Spotify 2023 Songs Dataset  *******/

libname SP '/cisviya-export/cisviya/homes/Dee.McKoy@sas.com/Blog2026';

libname out '/cisviya-export/cisviya/homes/Dee.McKoy@sas.com/Blog2026/output';
/* Start CAS Session */
cas;
caslib _all_ assign;

proc import
      datafile="/cisviya-export/cisviya/homes/Dee.McKoy@sas.com/Blog2026/Spotify_2023.csv"
      out=out.Spotify_data dbms=csv replace;
      getnames=yes;
run;
proc print data=out.spotify_data (obs=5);
run;
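
Because PROC IMPORT infers column types from the CSV file, it can help to confirm which variables were read as numeric before running numeric summaries. The following is a minimal sketch, not part of the original program, that lists the imported variables in file order:

```sas
/* Sketch: list variable names, types, and lengths for the imported table. */
/* VARNUM prints the variables in their original column order.             */
proc contents data=out.spotify_data varnum;
run;
```

Any stream or playlist column that appears as character here would need to be converted before it can be used in PROC MEANS.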

 

01_DMcK_feb1.png

 

 

Data Exploration

 

We use PROC MEANS to calculate descriptive statistics (mean, median, minimum, and maximum) and the number of missing values for the streaming, playlist, and chart variables.

 

proc means data=out.spotify_data mean median min max nmiss;

    var streams in_spotify_playlists in_spotify_charts

        in_apple_playlists in_apple_charts

        in_deezer_playlists in_deezer_charts;

run;

 

02_DMcK_feb2.png

 

The PROC MEANS output summarizes key streaming and platform metrics. The maximum stream count for a single song exceeds 3.7 billion, while the minimum is 2,762 streams. The playlist and chart counts differ across platforms, with Spotify showing the highest average playlist presence, while Deezer exhibits the lowest overall chart participation and the highest number of missing playlist values. Overall, the results highlight substantial differences in exposure and reach across streaming platforms.

 

03_DMcK_feb3.png

 

/****** Streams by Release Year (uses out.spotify_clean, created in the Feature Engineering step) *******/

proc sgplot data=out.spotify_clean;

    vbar released_year / response=streams stat=sum

    datalabel seglabel;

    title "Total Streams by Release Year";

run;

 

This code creates a bar chart showing total Spotify streams by release year using summed stream counts. The chart includes data labels for clarity, making it easy to compare how streaming popularity varies across different release years.

 

04_DMcK_feb4.png

 

The table of the top 25 streamed songs comes from a PROC SORT step that orders the songs by streams in descending order (the code appears in the Exploratory Visualization Analysis section). The bar chart of total streams by release year, covering 1930 to 2023, is produced with PROC SGPLOT.

 

proc freq data=out.spotify_data;

    tables released_year released_month released_day;

run;

 

05_DMcK_feb5.png

 

This figure shows the PROC FREQ output: the number of songs by release year, month, and day, along with the percent, cumulative frequency, and cumulative percent.

 

 

Feature Engineering

 

After the data exploration, we move on to feature engineering by handling missing data and inconsistent values. We use a DATA step to create a new table before producing some simple visualizations of different variables.

 

/*********CLEAN THE DATA AND HANDLE MISSING VALUES **********/

data out.spotify_clean;

    set out.spotify_data;

    /***** Change the columns for comma delimiter ****/

    format in_spotify_playlists comma16. streams comma16.;

    /*****DROPPING ROWS WITH MISSING DATA****/

    if nmiss(of _numeric_) > 0 then delete;

    /*********Set Numeric Misses to 0 when missing the value*********/

    array numvars[*] streams in_spotify_playlists in_spotify_charts

        in_apple_playlists in_apple_charts in_deezer_playlists in_deezer_charts;

    do i=1 to dim(numvars);

        if missing(numvars[i]) then numvars[i]=0;

    end;

    drop i;

run;

 

The DATA step creates a cleaned dataset, out.spotify_clean, from the source table out.spotify_data. It applies a comma format to the in_spotify_playlists and streams variables, then deletes any observation that has a missing value in any numeric variable. An array and an iterative DO loop then replace missing values in the listed streaming, playlist, and chart variables with zero; because rows with missing numeric values have already been deleted at that point, the loop acts only as a safeguard as written. The code shown does not build a release date. A release_date variable can be derived from the separate month, day, and year columns with the MDY function and displayed with the DATE11. format for standardized chronological reporting.
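
A combined release date built with the MDY function and a DATE11. format could be sketched as follows. This is an illustration rather than the original program: it assumes released_year, released_month, and released_day were imported as numeric, and the output table name work.spotify_dates is hypothetical.

```sas
/* Sketch: derive a single release date from the separate date parts.    */
/* Assumes released_year, released_month, and released_day are numeric.  */
data work.spotify_dates;              /* hypothetical output table name  */
    set out.spotify_clean;
    /* MDY takes month, day, year and returns a SAS date value */
    release_date = mdy(released_month, released_day, released_year);
    format release_date date11.;      /* displays as dd-MMM-yyyy         */
run;
```

MDY returns a missing value when any of its arguments is missing or the combination is not a valid date, so rows with incomplete date parts would show a missing release_date.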

 

 

Exploratory Visualization Analysis

 

In the exploratory analysis, let’s first look at the top 25 streamed songs. This gives us a list of the most listened-to songs by number of streams.

 

proc sort data=out.spotify_clean out=out.top_streams;

    by descending streams;

run;

/*******  Top 25 Streams   *******/

proc print data=out.top_streams (obs=25);

    var track_name artist_name streams;

    title "Top 25 Most Streamed Spotify Songs of 2023";

run;

 

This code sorts the cleaned Spotify dataset in descending order by stream count and outputs the top 25 most streamed songs. It then displays each track’s name, artist, and total streams, providing a clear summary of the most popular Spotify songs of 2023.

 

06-DMcK_feb6.png

 

The above table lists the top 25 most-streamed songs on Spotify as of 2023, showcasing a mix of contemporary hits and enduring classics. The Weeknd holds the number one spot with "Blinding Lights," with over 3.7 billion streams, while Ed Sheeran appears four times on the list, including "Shape of You" at rank two. The data reveals a high concentration of pop and hip-hop artists such as Harry Styles, Post Malone, and Dua Lipa, alongside the notable inclusion of Queen’s "Bohemian Rhapsody," which remains a streaming powerhouse decades after its release. Overall, the list reflects a diverse array of global chart-toppers, with every song in the top 25 surpassing the 2.1 billion stream threshold.

 

Artist with Most Streams

 

From the previous illustration, we saw that the majority of the streamed songs were released between 2010 and 2023. Let’s see which artists had the most total streams in our dataset.

 

proc print data=out.artist_streams(obs=10);

    var artist_name total_streams;

    format total_streams comma16.;

    title "Top 10 Artists by Aggregate Streams";

run;
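
The table out.artist_streams printed above is not built in the code shown in this article. One way it might be created, and it would need to run before the PROC PRINT step, is a PROC SQL aggregation. This is a sketch under the assumption that artist_name and streams are the relevant columns in out.spotify_clean:

```sas
/* Sketch: aggregate streams per artist and sort from highest to lowest. */
/* Assumes artist_name and a numeric streams column exist in the input.  */
proc sql;
    create table out.artist_streams as
    select artist_name,
           sum(streams) as total_streams
    from out.spotify_clean
    group by artist_name
    order by total_streams desc;
quit;
```

Because the table is sorted by total_streams in descending order, the obs=10 option in PROC PRINT returns the ten artists with the largest totals.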

 

07_DMcK_feb7.png

 

The above figure shows the top 10 artists by total streams in the dataset. The most-streamed artist was The Weeknd with 14,185,552,870 total streams (about 14.2 billion), followed by Taylor Swift and Ed Sheeran. Beyond stream totals, some listeners care about audio characteristics of a song, such as danceability and energy.

 

08_DMcK_feb8.png

 

The scatter plot indicates a weak positive relationship between energy and streams, with most songs concentrated in the mid-to-high energy range (50–80). While many highly streamed songs exhibit moderate to high energy, the wide spread of stream counts suggests that energy alone does not determine streaming success.
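
The code for this scatter plot is not shown in the article. A plot of this kind could be produced with PROC SGPLOT along the following lines; this is a sketch that assumes the variable is named energy, as in the PROC CORR step, and the title text is illustrative.

```sas
/* Sketch: scatter plot of streams against energy with a fitted line. */
/* Assumes the numeric variables energy and streams exist.            */
proc sgplot data=out.spotify_clean;
    scatter x=energy y=streams / transparency=0.5;
    /* Overlay a linear regression line without redrawing the points */
    reg x=energy y=streams / nomarkers;
    title "Streams by Energy";   /* illustrative title */
run;
title;   /* clear the title so it does not carry over to later output */
```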

 

Because the relationship between the input variables and the target (streams) appears weak, it is good practice to compute a correlation matrix across the input variables.

 

proc corr data=out.spotify_clean;

    var streams energy danceability

        in_spotify_playlists in_spotify_charts

        in_apple_playlists in_apple_charts

        in_deezer_playlists in_deezer_charts

        valence acousticness instrumentalness liveness speechiness;

run;

 

09_DMcK_feb9.png

The Pearson correlation coefficient table above gives us useful information for refining our modeling approach. Songs that perform well on one service, such as in Spotify playlists, typically perform well on other services like Apple Music and Deezer, indicating a significant overlap in cross-platform popularity. Playlist placements and chart rankings on all platforms are positively correlated with overall streams. Among the audio features, energy has a mildly positive correlation with valence and a large negative correlation with acousticness, which suggests that songs with higher energy tend to be less acoustic and have a somewhat more upbeat mood. Danceability shows a moderate positive relationship with valence and a negative relationship with acousticness, implying that danceable tracks are generally more upbeat and less acoustic.
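
To read the correlations with streams more directly, the full matrix can be narrowed to a single row. The following is a sketch, not part of the original analysis, that uses the WITH statement so only the correlations between streams and each listed variable are reported; the variable list reuses names from the PROC CORR step above.

```sas
/* Sketch: report only the correlations of streams with each input.   */
/* NOSIMPLE suppresses the table of simple descriptive statistics.    */
proc corr data=out.spotify_clean nosimple;
    var in_spotify_playlists in_spotify_charts
        in_apple_playlists in_apple_charts
        in_deezer_playlists in_deezer_charts
        energy danceability valence acousticness;
    with streams;
run;
```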

 

 

Conclusion

 

We can conclude that most highly streamed songs were released between 2010 and 2023 (see the figure Total Streams by Release Year), indicating the continued popularity of modern hits alongside newer releases, while established global artists such as The Weeknd, Taylor Swift, and Ed Sheeran dominated total streams. We showed how SAS can be applied to music analytics, from data ingestion and feature engineering to visualization. However, further examination of the data showed that audio features have only weak correlations with the number of streams. This indicates that while platform exposure strongly aligns across services, individual musical characteristics such as energy and danceability have relatively small direct linear associations with streaming and chart performance. Adding variables that are more strongly correlated with streams could provide more meaningful information and more useful insights.

 


 

 

 

Find more articles from SAS Global Enablement and Learning here.
