With Halloween fast approaching it’s time to start making plans for the big day. One long standing tradition for many people is that of watching a scary movie on Halloween night. One problem nowadays though is that with such easy availability of movies on streaming services it might be difficult to find a scary one that you haven’t already seen. In this edition of Free Data Friday, we will be using SAS OnDemand for Academics to look at data from https://wheresthejump.com/ which scores movie and TV shows by the number and scariness of “jump scares” which they contain.
Unfortunately, the data is not available as a downloadable file but it isn’t huge so I simply copy and pasted it to an Excel spreadsheet. I then saved the file as a CSV and uploaded it to SAS OnDemand for Academics.
I used PROC Import to get the data into SAS. I normally like to use a large value for the guessingrows statement in order to forestall problems with data type and length – I used a value of 1000 in this case. One thing to note is that some of the generated variable names contain spaces so you should ensure that the validvarname option is set to “any” and that when you subsequently use a variable name which does contain spaces enclose it in double quotes with a trailing “n” as you will see later on.
filename reffile '/home/chris52brooks/JumpScares/jumpscares.csv';
option validvarname=any;
proc import datafile=reffile
dbms=csv
out=jumpscares
replace;
getnames=yes;
guessingrows=1000;
run;
This is what the first few lines of the imported file look like:
I wanted to use the data to see if it would help me choose a scary movie which I hadn’t seen before so decided on a few parameters for my search:
The director must have directed a reasonable number of movies to ensure my list of possible movies wasn’t dominated by “one hit wonders”; and
The average jump scare rating per movies must be high; and
There must be a large number of scares overall in the director’s portfolio.
I created the metrics I intended to us with a simple piece of PROC SQL
proc sql;
create table directors
as select
director,
avg("Jump Scare Rating"n) as avgrating,
count("Movie Name"n) as nummovies,
sum("Jump Count"n) as numscares
from jumpscares
group by director
order by nummovies desc, avgrating desc;
quit;
I decided to try to create an attractive output with some Halloween themed colors so I consulted https://www.color-hex.com/color-palette/98417 - this site offers suggestions for color themes and gives both hex and RGB codes for you to use.
Then I used PROC Template to design a three-chart layout showing the key metrics described above. There’s quite a lot going on here, but this illustrates the advantage of good, consistent indentation for code blocks and a methodical approach. You will note that I have used a lattice layout to generate three charts on the same page, removed borders and coordinated the chart colors which demonstrates that each chart can be individually designed within an overall consistent framework.
%let scarybkgrnd=cxF98A00;
%let scarybar=cx8700D4;
proc template;
define statgraph nestedlattice;
begingraph / designwidth=8in designheight=7in;
entrytitle textattrs=(size=30pt) "Who Makes the Scariest Movies?";
entryfootnote halign=right textattrs=(weight=bold) "Data From: https://wheresthejump.com" ;
layout lattice / rows=2 columns=1 rowgutter=50 rowdatarange=data
opaque=true backgroundcolor=&scarybkgrnd;
layout overlay / xaxisopts=(labelattrs=(weight=bold) label="Average Jump Scare Rating")
walldisplay=none ;
barchart category=director y=avgrating / orient=horizontal barlabel=true fillattrs=(color=&scarybar);
endlayout;
layout lattice / rows=1 columns=2 columngutter=10 ;
layout overlay / yaxisopts=(label="No. of Movies")
walldisplay=none xaxisopts=(labelattrs=(weight=bold) label="Number of Scary Movies for Each Director");
barchart category=director y=nummovies / barlabel=true fillattrs=(color=&scarybar);
endlayout;
layout overlay / yaxisopts=(label="Total Scares")
walldisplay=none xaxisopts=(labelattrs=(weight=bold) label="Total Jump Scares for Each Director");
barchart category=director y=numscares / barlabel=true fillattrs=(color=&scarybar);
endlayout;
endlayout;
endlayout;
endgraph;
end;
run;
Of course, on its own the PROC Template doesn’t produce anything, so I used PROC SGRender to generate my output using the template with the Obs=10 condition to limit my view to the ten most prolific directors.
ods pdf file="/home/chris52brooks/JumpScares/JumpReport.pdf";
proc sgrender data=directors(obs=10) template=nestedlattice;
run;
ods pdf close;
This gave me the following output:
This helps us judge that the most likely candidate movies come from James Wan and Wes Craven (with Sam Raimi coming a close third). I can now use another simple piece of SQL to generate a list of the scariest movies by these two directors.
proc sql;
select *
from jumpscares
where director in ("James Wan", "Wes Craven")
order by "Jump Scare Rating"n desc;
quit;
The clear winners here are Insidious and The Conjuring 2 although you can look down the list to find a movie you haven't seen. For US readers the bad news is that neither movie is available on US Netflix according to the data (the highest ranked movie available is the first movie in The Conjuring series) – for UK readers however Insidious is available (I’ve checked…) – so if you decide to follow these recommendations check your local streaming platforms and don’t have nightmares!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Hit the orange button below to see all the Free Data Friday articles.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.