
The Visual Analytics Alphabet Series – The Data


Because the Visual Analytics Alphabet Series was inspired at the screening of a horror movie (see A is for Aggregated Data), most of the data used in this series is horror-based.

 

HORROR_MOVIES

HORROR_MOVIES from Kaggle, a data set of horror films from 1950 – 2022, was extracted from The Movie Database (TMDB) using the TMDB API. The Movie Database is a community-built movie and TV database that contains information about movies, TV shows, cast, and community reviews.

If you would like to use the TMDB API for your own data adventures, check out the documentation.

 

| Name | Label | Description | Unique Count | Range |
| --- | --- | --- | --- | --- |
| id | Movie ID | Unique ID for TMDB, used to construct link to movie page | 32,540 | |
| original_title | Original Title | Original movie title | 30,294 | |
| title | Movie Title | Movie title | 29,563 | |
| original_language | Original Language | Language in which the movie was made (for example, en for English, no for Norwegian, de for German) | 97 | |
| overview | Description | Description of movie | 31,021 | |
| tagline | Tagline | Tagline of movie | 12,514 | |
| release_date | Release Date | Release date (mm/dd/yyyy) | 10,999 | |
| poster_path | Poster Image | Unique name of the movie poster. This can be used to generate a link to the poster image. | 28,049 | |
| popularity | Popularity Score | Lifetime popularity score generated by the community. For movies, this is based on daily metrics (like number of votes, number of views, times favorited, times watchlisted), release date, total votes, and the previous day’s score. | | 0 - 5,088.584 |
| vote_count | # User Ratings | Number of user ratings for movie | | 0 - 16,900 |
| vote_average | User Score (1-10) | Average rating for movie. Ratings range from 1 to 10 stars. | | 0.5 - 10 |
| budget | Budget (in $) | Budget of movie in US dollars | | $1 - $200,000,000 |
| revenue | Revenue (in $) | Revenue made from movie in US dollars | | $1 - $701,842,551 |
| runtime | Movie Runtime (min) | Official runtime of movie (in minutes) | | 1 - 683 |
| status | Movie Status | Status of movie (Released, In Production, Post Production, Planned) at time of extract (late 2022) | 4 | |
| adult | <not used> | <not used> | <not used> | |
| backdrop_path | Backdrop Image | Unique name of movie backdrop image. This can be used to generate a link to the backdrop image. | 13,537 | |
| genre_name | Genre(s) | List of movie genres | 772 | |
| collection | Collection ID | ID of the collection, used to construct link to collection page. | 816 | |
| collection_name | Collection Name | Name of collection | 816 | |
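
The table notes that id, collection, poster_path, and backdrop_path can be used to construct links. A minimal Python sketch of that idea follows. The URL patterns are assumptions based on how TMDB pages and images are commonly addressed, and the helper names and sample values are hypothetical, so check the TMDB documentation before relying on them.

```python
# Assumed URL patterns (not taken from the article); verify against the TMDB docs.
MOVIE_PAGE = "https://www.themoviedb.org/movie/{id}"
COLLECTION_PAGE = "https://www.themoviedb.org/collection/{id}"
IMAGE_BASE = "https://image.tmdb.org/t/p/{size}{path}"


def movie_url(movie_id):
    """Build a link to the movie page from the id column."""
    return MOVIE_PAGE.format(id=movie_id)


def collection_url(collection_id):
    """Build a link to the collection page from the collection column."""
    return COLLECTION_PAGE.format(id=collection_id)


def image_url(path, size="w500"):
    """Build an image link from poster_path or backdrop_path.

    The path is assumed to start with '/'. Returns None when the path
    is missing, since not every movie has a poster or backdrop.
    """
    if not path:
        return None
    return IMAGE_BASE.format(size=size, path=path)


# Made-up values for illustration only; they are not rows from HORROR_MOVIES.
print(movie_url(12345))
print(image_url("/abc123.jpg"))
```

The size segment (w500 here) is an assumed default width; other sizes may be available.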

 

A few data cleansing techniques needed to be applied to the HORROR_MOVIES table to get the data report-ready:

  • Several measures were set to zero when data was not available for those movies (for example, vote_average, budget, revenue, and runtime). These zeros were replaced with missing values.
  • Year had to be created from release_date to join the table with KILLCOUNTS.
  • Some of the titles had to be modified to match the titles in KILLCOUNTS.
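
The cleansing steps above were done in SAS Visual Analytics. As a rough illustration only, the same logic might look like this in plain Python. The function names are hypothetical, the sample rows are made up, and the case-insensitive title match is an assumption standing in for the manual title edits described above.

```python
from datetime import datetime

# Measures where a zero means "not available" rather than a real value.
MEASURES = ("vote_average", "budget", "revenue", "runtime")


def clean_movie(row):
    """Return a copy of row with zero measures set to None and a year added."""
    cleaned = dict(row)
    for name in MEASURES:
        if cleaned.get(name) == 0:
            cleaned[name] = None
    release = cleaned.get("release_date")
    # release_date is documented as mm/dd/yyyy.
    cleaned["year"] = (
        datetime.strptime(release, "%m/%d/%Y").year if release else None
    )
    return cleaned


def join_killcounts(movies, killcounts):
    """Inner join on (lowercased title, year); adds a kill_count field."""
    lookup = {(k["title"].lower(), k["year"]): k["count"] for k in killcounts}
    joined = []
    for movie in movies:
        key = (movie["title"].lower(), movie["year"])
        if key in lookup:
            joined.append({**movie, "kill_count": lookup[key]})
    return joined


# Made-up rows for illustration; they are not taken from either data set.
movies = [clean_movie({
    "title": "Example Film", "release_date": "10/31/1978",
    "vote_average": 0, "budget": 0, "revenue": 100, "runtime": 91,
})]
killcounts = [{"title": "example film", "year": 1978, "count": 5}]
joined = join_killcounts(movies, killcounts)
```

Rows without a matching title and year are dropped here; a left join would keep them with a missing kill count instead.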

 

KILLCOUNTS

KILLCOUNTS from GitHub, a data set of horror films from 1922-2025, was sourced from community projects (like Dead Meat, MovieBodyCounts, List of Deaths Wiki, and work done by Randal Olson).

 

| Name | Label | Description | Unique Count | Range |
| --- | --- | --- | --- | --- |
| title | Movie Title | Movie Title | 469 | |
| year | Release Year | Release year | 63 | |
| count | Kill Count | Total confirmed kills | | 1 - 4,295 |
| tmdb_id | TMDB ID | The Movie Database (TMDB) unique ID | 482 | |

 

A few data cleansing techniques needed to be applied to the KILLCOUNTS table to get the data report-ready:

  • In the original table, year is stored as a category. It was converted to a date in Visual Analytics using the DateFromMDY and TreatAs functions.
    [Image: Data_YearCalculation.png]
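
The exact Visual Analytics expression is shown only in the screenshot, so it is not reproduced here. As a loose Python analogue of the year conversion, the sketch below turns a year stored as text into a date. Anchoring on January 1 is an assumption, and year_to_date is a hypothetical helper, not a Visual Analytics function.

```python
from datetime import date


def year_to_date(year_text, month=1, day=1):
    """Convert a year stored as text (a category) into a date.

    The month and day default to January 1, which is an assumption:
    the article does not state which month and day were used.
    """
    return date(int(year_text.strip()), month, day)


print(year_to_date("1978").isoformat())
```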

 

HAUNTED_PLACES

HAUNTED_PLACES from Kaggle, a data set of haunted places in the United States, was compiled by Tim Renner using The Shadowlands Haunted Places Index.

 

| Name | Label | Description | Unique Count | Range |
| --- | --- | --- | --- | --- |
| city | City | City where the haunted place is located | 4,285 | |
| country | Country | Country where the haunted place is located (all United States) | 1 | |
| description | Description | Description of the haunted place | 10,979 | |
| location | Location | Name of the haunted place | 9,691 | |
| state | State | US state where the haunted place is located | 51 | |
| state_abbrev | state_abbrev | US two-letter state abbreviation where the haunted place is located | 51 | |
| longitude | Location Longitude | Longitude of the haunted place | | -164.7224104 to -66.6667528 |
| latitude | Location Latitude | Latitude of the haunted place | | 19.632069 to 66.8925886 |
| city_longitude | City Longitude | Longitude of the city center | | -164.7238888 to -67.8402316 |
| city_latitude | City Latitude | Latitude of the city center | | 19.5756191 to 66.8983333 |

 

A few data cleansing techniques needed to be applied to the HAUNTED_PLACES table to get the data report-ready:

  • Location values were modified to ensure consistency. For example, Saint Peters Catholic Church was standardized to St Peters Catholic Church.
  • Unique ID was created when importing the Microsoft Excel file by selecting Create unique ID column in the Import Data window. This creates a column that has a unique value for each row, which can be used for text analytics.
  • AI was used to group locations into 9 distinct Haunting Locations:
    • Cemeteries & Graveyards
    • Homes & Residences
    • Hospitals & Asylums
    • Hotels & Lodging
    • Parks & Natural Areas
    • Roads, Bridges & Paths
    • Schools & Universities
    • Theatres & Entertainment
    • Other Buildings/Places
  • AI was used to group descriptions of hauntings into 8 distinct Haunting Categories:
    • Auditory Phenomena
    • Entity-Based
    • Environmental Effects
    • Experiential States
    • Narrative/Historical
    • Physical Interactions
    • Visual Manifestations
    • Mixed/Unclassified
  • AI was used to group descriptions of hauntings into 14 distinct Haunting Types:
    • Animal Spirit
    • Apparitions & Full-Body Ghosts
    • Child Spirit
    • Demonic or Malevolent Entity
    • Disembodied Sounds & Voices
    • Environmental & Emotional Sensations
    • Lights, Orbs & Shadow Phenomena
    • Location-Bound Entity
    • Poltergeist/Object Movement
    • Residual/Repeating Event
    • Sleep & Bedroom Encounters
    • Tragic Death Residual Haunting
    • Other/Mixed Phenomena
    • Unspecified
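
The first two cleansing steps in the list above (standardizing location names and adding a unique ID) can be sketched in Python as follows. This is an illustration under assumptions, not the method used in Visual Analytics: the single Saint-to-St rule is the only example the article gives, the function names are hypothetical, and the second sample row is made up. The AI-based groupings are not reproduced here.

```python
import re

# Only the "Saint" -> "St" rule comes from the article's example;
# any further rules would be assumptions.
SAINT = re.compile(r"\bSaint\b")


def standardize_location(name):
    """Replace the whole word 'Saint' with 'St' and trim whitespace."""
    return SAINT.sub("St", name).strip()


def add_unique_id(rows, start=1):
    """Return new rows with a sequential unique_id, one value per row."""
    return [{"unique_id": i, **row} for i, row in enumerate(rows, start=start)]


# Illustrative rows; the second location name is made up.
rows = add_unique_id([
    {"location": standardize_location("Saint Peters Catholic Church")},
    {"location": standardize_location("Example Old Hotel")},
])
```

A sequential integer is one simple way to get a value that is unique per row; the Import Data window may generate its IDs differently.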

 

This product uses the TMDB API but is not endorsed or certified by TMDB.

