Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
If, like me, you have spent a large part of the Christmas holidays binge-watching the new Netflix series, The Witcher, you will probably be aware that before the TV series came the Witcher video games and before the games came the books written by Polish author Andrzej Sapkowski. These sort of crossover products have become increasingly common with books, comics, games and film or TV titles all feeding interest in each other.
In this edition of Free Data Friday, we will use data from Steamdb – a web site providing data downloaded from Steam, the popular video game platform, to examine what factors influence the number of active players of the most recent game in the sequence, The Witcher 3: Wild Hunt.
There are three pieces of data we will be using from the Witcher 3 page on Steamdb – a CSV file of maximum number of active players each day, a CSV of the price of the game every time it changes (Steam hold frequent sales, sometimes at quite deep discounts) and details of items of downloadable content (DLC) – this is not available as a downloadable file, but that isn’t a problem for our analysis.
I used two simple PROC Imports to import the CSV files into SAS like so:
/* Files downloaded from https://steamdb.info/app/292030/graphs/ */ /* data to 29/12/2019 */ filename userfile '/folders/myshortcuts/Dropbox/chart.csv'; proc import datafile=userfile dbms=csv out=witcherstats replace; getnames=yes; run; filename costfile '/folders/myshortcuts/Dropbox/price-history-for-292030.csv'; proc import datafile=costfile dbms=csv out=witcherprice replace; getnames=yes; run;
The files imported perfectly, but I decided to extract the date portion from the datetime field in both files as I wasn’t interested in the time portion (although that could form an interesting analysis in itself). There were also a handful of records in the player statistics file earlier than 19 May 2015, the games official release date. Apart from one day these showed a player number of one. I’m guessing this was either an error or some form of pre-release testing, so I deleted these earlier records.
data witcherstats(drop=datetime flags); format date yymmdd10.; set witcherstats; date=input(substr(datetime,1,10),yymmdd10.); run; data witcherprice(drop=datetime); format date yymmdd10.; set witcherprice; date=input(substr(datetime,1,10),yymmdd10.); run; proc sql; delete from witcherstats where date < '19May2015'd; quit;
Finally, I appended the two files together and extracted the day of the week from the date field.
data alldata; length weekday $9; set witcherstats witcherprice; weekday=left(put(date,downame.)); run;
I decided to plot the maximum number of players per day against the price in order to see if price reductions encouraged more players (purchase data isn’t available from steamdb but this feels like a reasonable proxy). The problem here is that we have such a long timeline that plotting it all on one chart would make it difficult to see the level of detail we need. I, therefore, decided to plot one year at a time. Also, I used a series plot for player data but a step chart for price data. You don’t see step charts used very often but they can be extremely useful in cases where values move in “jumps” without passing through intermediate points such as interest rates or, as in our case, prices.
Here’s the PROC SGPlot code for 2015 – to produce charts for other years simply amend the dates in the where clause:
ods graphics / reset imagemap; title1 'The Witcher 3 - Player Numbers & Price'; title2 'Data for 2015'; footnote1 j=r 'Data from https://steamdb.info/app/292030/graphs/'; proc sgplot data=alldata(where=(date>='01Jan2015'd and date<='31Dec2015'd)); series x=date y=players / legendlabel="Maximum Number of Active Players"; step x=date y=final_price / y2axis lineattrs=(color=red) legendlabel="Price"; yaxis label="Maximum Number of Active Players" valuesformat=comma6.; y2axis label= "Price"; xaxis label="Date"; run;
This is what the code produces for each year from 2015 to 2019:
There are a few points to note from these charts:
There appears to be a pattern of multiple peaks and troughs during each month which we should investigate; and
It does appear that temporary price reductions spike an increase in players, but only in 2017 and beyond. This indicates that there may be little point in discounting prices early in a games lifespan; and
There are two spikes (a small one in October 2015 and a large one in May 2016) uncorrelated with price reductions. The data from the DLC page tells us that these were the two dates for releases of the major expansion packs “Hearts of Stone” and “Blood and Wine”. Other minor pieces of downloadable contact (new armour sets, appearances etc) have no significant effect on the number of players; and
In late 2019 there is a massive spike in players, no doubt triggered by the release of the Netflix series based on the books.
In order to confirm some of these findings I used some SQL to firstly check the monthly pattern of peaks and troughs. I suspected that there may a difference in player numbers on different days of the week, so I averaged the number of players per day.
title1 "Average Number of Players on each Day of the Week"; proc sql; select distinct weekday as Day_Of_the_Week, avg(players) as Average_Number_of_Players format=comma9. from alldata group by weekday order by Average_Number_of_Players desc; quit;
This, not surprisingly, shows a very strong “weekend effect” with the number of active players peaking on Saturday and Sunday.
I also used SQL to check the number of active players during each price point for each year. In order to do this I had to merge the witcherstats file with the witcherprice file and fill in the missing values for the price where it didn't change from the previous value.
data merged(drop=pre_price); merge witcherstats(in=a) witcherprice(in=b); by date; if a and not b then final_price=pre_price; pre_price=final_price; retain pre_price; run; title1 "Number of Players at each Price Point"; title2; proc sql; select distinct final_price as final_price, year(date) as year, avg(players) as avg_players format=comma9. from merged group by year, final_price order by year desc, final_price desc; quit;
This shows that only in 2015, the year of the game’s release, was the highest number of players seen during the games highest price point. In 2016 the record number was at the third lowest price point and since then the highest number of players has always been seen during the lowest price point. This raises a question mark in my mind as to whether placing the game on sale in its first year to year and a half is justified as it appears to not increase the popularity of the game in terms of player numbers.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
SAS is headed back to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team.
Interested in speaking? Content from our attendees is one of the reasons that makes SAS Innovate such a special event!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.