BookmarkSubscribeRSS Feed

Scare yourself silly this Halloween with SAS

Started ‎10-22-2021 by
Modified ‎10-18-2021 by
Views 5,053
SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
 

 

With Halloween fast approaching it’s time to start making plans for the big day. One long standing tradition for manypexels-pixabay-35888.jpg people is that of watching a scary movie on Halloween night. One problem nowadays though is that with such easy availability of movies on streaming services it might be difficult to find a scary one that you haven’t already seen. In this edition of Free Data Friday, we will be using SAS OnDemand for Academics to look at data from https://wheresthejump.com/ which scores movie and TV shows by the number and scariness of “jump scares” which they contain.

 

Get the data

 

Unfortunately, the data is not available as a downloadable file but it isn’t huge so I simply copy and pasted it to an Excel spreadsheet. I then saved the file as a CSV and uploaded it to SAS OnDemand for Academics.

 

Get started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:

Get Started

 

Getting the data ready

 

I used PROC Import to get the data into SAS. I normally like to use a large value for the guessingrows statement in order to forestall problems with data type and length – I used a value of 1000 in this case. One thing to note is that some of the generated variable names contain spaces so you should ensure that the validvarname option is set to “any” and that when you subsequently use a variable name which does contain spaces enclose it in double quotes with a trailing “n” as you will see later on.

 

filename reffile '/home/chris52brooks/JumpScares/jumpscares.csv';

option validvarname=any;

proc import datafile=reffile
	dbms=csv
	out=jumpscares
	replace;
	getnames=yes;
	guessingrows=1000;
run;

 

This is what the first few lines of the imported file look like:

 

Scary Movies Table 1.png

 

The results

 

I wanted to use the data to see if it would help me choose a scary movie which I hadn’t seen before so decided on a few parameters for my search:

 

  1. The director must have directed a reasonable number of movies to ensure my list of possible movies wasn’t dominated by “one hit wonders”; and

  2. The average jump scare rating per movies must be high; and

  3. There must be a large number of scares overall in the director’s portfolio.

I created the metrics I intended to us with a simple piece of PROC SQL

 

proc sql;
	create table directors
	as select
		director,
		avg("Jump Scare Rating"n) as avgrating,
		count("Movie Name"n) as nummovies,
		sum("Jump Count"n) as numscares
	from jumpscares
	group by director
	order by nummovies desc, avgrating desc;
quit;

 

I decided to try to create an attractive output with some Halloween themed colors so I consulted https://www.color-hex.com/color-palette/98417 - this site offers suggestions for color themes and gives both hex and RGB codes for you to use.

 

Then I used PROC Template to design a three-chart layout showing the key metrics described above. There’s quite a lot going on here, but this illustrates the advantage of good, consistent indentation for code blocks and a methodical approach. You will note that I have used a lattice layout to generate three charts on the same page, removed borders and coordinated the chart colors which demonstrates that each chart can be individually designed within an overall consistent framework.

 

%let scarybkgrnd=cxF98A00;
%let scarybar=cx8700D4;

proc template;
	define statgraph nestedlattice;
		begingraph / designwidth=8in designheight=7in;
			entrytitle textattrs=(size=30pt) "Who Makes the Scariest Movies?";
			entryfootnote halign=right textattrs=(weight=bold) "Data From: https://wheresthejump.com" ;
			layout lattice / rows=2 columns=1 rowgutter=50 rowdatarange=data
					opaque=true backgroundcolor=&scarybkgrnd;
				layout overlay / xaxisopts=(labelattrs=(weight=bold) label="Average Jump Scare Rating")
						walldisplay=none ;
					
					barchart category=director y=avgrating / orient=horizontal barlabel=true fillattrs=(color=&scarybar);				
				endlayout;			
				layout lattice / rows=1 columns=2 columngutter=10  ;
					layout overlay / yaxisopts=(label="No. of Movies")
							walldisplay=none xaxisopts=(labelattrs=(weight=bold) label="Number of Scary Movies for Each Director");
						barchart category=director y=nummovies / barlabel=true fillattrs=(color=&scarybar);
							
					endlayout;
					layout overlay / yaxisopts=(label="Total Scares")
							 walldisplay=none xaxisopts=(labelattrs=(weight=bold) label="Total Jump Scares for Each Director");
						barchart category=director y=numscares / barlabel=true fillattrs=(color=&scarybar);
							
				
					endlayout;
				endlayout;
			endlayout;
		endgraph;
	end;
run; 

 

Of course, on its own the PROC Template doesn’t produce anything, so I used PROC SGRender to generate my output using the template with the Obs=10 condition to limit my view to the ten most prolific directors.

 

ods pdf file="/home/chris52brooks/JumpScares/JumpReport.pdf";

proc sgrender data=directors(obs=10) template=nestedlattice;
run;

ods pdf close;

This gave me the following output:

 

Scary Movies Results1.png

 

This helps us judge that the most likely candidate movies come from James Wan and Wes Craven (with Sam Raimi coming a close third). I can now use another simple piece of SQL to generate a list of the scariest movies by these two directors.

 

proc sql;
	select *
	from jumpscares
	where director in ("James Wan", "Wes Craven")
	order by "Jump Scare Rating"n desc;
quit;

Scary Movies Results 2.png

 

The clear winners here are Insidious and The Conjuring 2 although you can look down the list to find a movie you haven't seen. For US readers the bad news is that neither movie is available on US Netflix according to the data (the highest ranked movie available is the first movie in The Conjuring series) – for UK readers however Insidious is available (I’ve checked…) – so if you decide to follow these recommendations check your local streaming platforms and don’t have nightmares!

 

Now it's your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Hit the orange button below to see all the Free Data Friday articles.

 
Version history
Last update:
‎10-18-2021 09:58 AM
Updated by:

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags