BookmarkSubscribeRSS Feed

Exploring Video Game Popularity with The Witcher and SAS

Started ‎01-10-2020 by
Modified ‎08-03-2021 by
Views 3,228

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:

 

Access Now

person-holding-black-game-controller-1563796.jpg

If, like me, you have spent a large part of the Christmas holidays binge-watching the new Netflix series, The Witcher, you will probably be aware that before the TV series came the Witcher video games and before the games came the books written by Polish author Andrzej Sapkowski. These sort of crossover products have become increasingly common with books, comics, games and film or TV titles all feeding interest in each other.

 

In this edition of Free Data Friday, we will use data from Steamdb – a web site providing data downloaded from Steam, the popular video game platform, to examine what factors influence the number of active players of the most recent game in the sequence, The Witcher 3: Wild Hunt.

 

Get the Data

 

FreeDataFriday_graphic.jpgThere are three pieces of data we will be using from the Witcher 3 page on Steamdb – a CSV file of maximum number of active players each day, a CSV of the price of the game every time it changes (Steam hold frequent sales, sometimes at quite deep discounts) and details of items of downloadable content (DLC) – this is not available as a downloadable file, but that isn’t a problem for our analysis.

 

I used two simple PROC Imports to import the CSV files into SAS like so:

 

 

/* Files downloaded from https://steamdb.info/app/292030/graphs/  */
/* data to 29/12/2019 */

filename userfile '/folders/myshortcuts/Dropbox/chart.csv';

proc import datafile=userfile
	dbms=csv
	out=witcherstats
	replace;
	getnames=yes;
run;

filename costfile '/folders/myshortcuts/Dropbox/price-history-for-292030.csv';

proc import datafile=costfile
	dbms=csv
	out=witcherprice
	replace;
	getnames=yes;
run;

 

Get Started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:
 

Get Started

 

Getting the Data Ready

 

The files imported perfectly, but I decided to extract the date portion from the datetime field in both files as I wasn’t interested in the time portion (although that could form an interesting analysis in itself). There were also a handful of records in the player statistics file earlier than 19 May 2015, the games official release date. Apart from one day these showed a player number of one. I’m guessing this was either an error or some form of pre-release testing, so I deleted these earlier records.

 

 

data witcherstats(drop=datetime flags);
	format date yymmdd10.;
	set witcherstats;
	date=input(substr(datetime,1,10),yymmdd10.);	
run;

data witcherprice(drop=datetime);
	format date yymmdd10.;
	set witcherprice;
	date=input(substr(datetime,1,10),yymmdd10.);	
run;

proc sql;
	delete
	from witcherstats
	where date < '19May2015'd;
quit;

 

Finally, I appended the two files together and extracted the day of the week from the date field.

 

 

data alldata;
	length weekday $9;
	set witcherstats witcherprice;
	weekday=left(put(date,downame.));
run;

 

The Results

 

I decided to plot the maximum number of players per day against the price in order to see if price reductions encouraged more players (purchase data isn’t available from steamdb but this feels like a reasonable proxy). The problem here is that we have such a long timeline that plotting it all on one chart would make it difficult to see the level of detail we need. I, therefore, decided to plot one year at a time. Also, I used a series plot for player data but a step chart for price data. You don’t see step charts used very often but they can be extremely useful in cases where values move in “jumps” without passing through intermediate points such as interest rates or, as in our case, prices.

 

Here’s the PROC SGPlot code for 2015 – to produce charts for other years simply amend the dates in the where clause:

 

 

 

ods graphics / reset imagemap;
title1 'The Witcher 3 - Player Numbers & Price';
title2 'Data for 2015';
footnote1 j=r 'Data from https://steamdb.info/app/292030/graphs/';
proc sgplot data=alldata(where=(date>='01Jan2015'd and date<='31Dec2015'd));
	series x=date y=players / legendlabel="Maximum Number of Active Players";
	step x=date y=final_price / y2axis lineattrs=(color=red) legendlabel="Price";
	yaxis label="Maximum Number of Active Players" valuesformat=comma6.;
	y2axis label= "Price";
	xaxis label="Date";
run;

 

This is what the code produces for each year from 2015 to 2019:

 

chart1.png

 

chart2.png

 

chart3.png

 

chart4.png

 

chart5.png

 

There are a few points to note from these charts:

 

  1. There appears to be a pattern of multiple peaks and troughs during each month which we should investigate; and

  2. It does appear that temporary price reductions spike an increase in players, but only in 2017 and beyond. This indicates that there may be little point in discounting prices early in a games lifespan; and

  3. There are two spikes (a small one in October 2015 and a large one in May 2016) uncorrelated with price reductions. The data from the DLC page tells us that these were the two dates for releases of the major expansion packs “Hearts of Stone” and “Blood and Wine”. Other minor pieces of downloadable contact (new armour sets, appearances etc) have no significant effect on the number of players; and

  4. In late 2019 there is a massive spike in players, no doubt triggered by the release of the Netflix series based on the books.

 

In order to confirm some of these findings I used some SQL to firstly check the monthly pattern of peaks and troughs. I suspected that there may a difference in player numbers on different days of the week, so I averaged the number of players per day.

 

 

title1 "Average Number of Players on each Day of the Week";
proc sql;
	select distinct weekday as Day_Of_the_Week,
		avg(players) as Average_Number_of_Players format=comma9.
	from alldata
	group by weekday
	order by  Average_Number_of_Players desc;
quit;

 

sqlprint1.png

 

This, not surprisingly, shows a very strong “weekend effect” with the number of active players peaking on Saturday and Sunday.

 

I also used SQL to check the number of active players during each price point for each year. In order to do this I had to merge the witcherstats file with the witcherprice file and fill in the missing values for the price where it didn't change from the previous value.

 

 

data merged(drop=pre_price);
	merge witcherstats(in=a) witcherprice(in=b);
	by date;
	if a and not b then final_price=pre_price;
	pre_price=final_price;
	retain pre_price;
run;

title1 "Number of Players at each Price Point";
title2;
proc sql;
	select 
		distinct final_price as final_price,
		year(date) as year,
		avg(players) as avg_players format=comma9.
	from merged
	group by year, final_price
	order by year desc, final_price desc;
quit;

 

sqlprint2.png

 

This shows that only in 2015, the year of the game’s release, was the highest number of players seen during the games highest price point. In 2016 the record number was at the third lowest price point and since then the highest number of players has always been seen during the lowest price point. This raises a question mark in my mind as to whether placing the game on sale in its first year to year and a half is justified as it appears to not increase the popularity of the game in terms of player numbers.

 

Now it's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit [[this link]] to see all the Free Data Friday articles.

Comments

Might there be a lag between purchase date and play date?  I know that when I bought Witcher (or any other game) I didn't get to it right away and when I did, I spent some time reading the dox first.

Hi @tomrvincent - yes there certainly will be a lag in some cases between purchase date and play date. Unfortunately I don't think Steam release sales figures for individual games (certainly not at the time granularity which we'd need for this) so I used relative change in active players as what I think is a reasonable proxy. Of course at least some of the increase in active players following an expansion pack release will be people returning to the game after a period of inactivity but in any case an active player base seems a good pointer to continuing sales.

@ChrisBrooks yet all your graphs have price and player counts.  I think other factors are more correlated than price. 

 

Some that come to mind are other popular games and how many players are out there playing a game on a particular day. 

 

In other words, what is Witcher's share of the gaming population by day? 

 

I did something similar to this years ago for Mattel Toys, showing them percentage of sales by toy category across age of recipient child...very informative.

Version history
Last update:
‎08-03-2021 01:49 PM
Updated by:

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags