BookmarkSubscribeRSS Feed

UFOs, SAS and Data Analysis - Oh my!

Started ‎12-08-2016 by
Modified ‎08-04-2021 by
Views 1,563

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:

 

Access Now

42-22106413.jpg

Anyone who has met me probably knows that I love a good conspiracy theory, and one of my favourites are UFOs.  I joke that The X-Files was a documentary rather than a fiction TV show; I love the idea that we’re not alone. 

 

Get the Data

I found this UFO dataset on Kaggle and is from the National UFO Research Center, is each sighting reported up to 2014.  The fact that there are 88,875 sightings in this dataset with a great collection of variables (shape, year, comments, etc.) makes it a lot of fun to play with.

 

 

Get Started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:

Get Started

 

 

Getting the data ready

The data was already in a format that I could use, and there were no missing or clearly incorrect data.  I also imported a list of all the states for use in my first example.

 

The Results

Here’s a sample from the list of states that I’ve imported:

Screen Shot 2016-11-24 at 7.32.32 PM.png

 

This is a small sample from the UFOs dataset:

Screen Shot 2016-11-24 at 7.32.45 PM.png

 

When I run my query

 

 

proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
            on a.State =b.state
where country='us'
group by a.State
order by a.State;
quit;

 

I get the message:

Screen Shot 2016-11-24 at 7.34.14 PM.png

 

The issue is that SAS sees “tx” and “TX” as two different values, so we need to make SAS think that they are the same.

 

 

 

proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
            on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;

 

 

This allows us to get a result set:

Screen Shot 2016-11-24 at 7.34.39 PM.png

 

If you’ve been reading this blog for a while, you know I’m a huge fan of data visualization, so I’ll put this data into a graph.  First, we have to do the counts by State, and put it into a new table:

 

 

 

proc sql;
create table work.State_Count as
select a.State, count(*)
from work.import a left join work.ufos b
            on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;

 

 

We’ll do a Bar Chart, and using the Task, I have set it up as so:

Screen Shot 2016-11-24 at 7.38.18 PM.png

 

I have also selected the Show Values in Data Order on the OPTIONS tab to put the data in lowest to highest. 

Screen Shot 2016-11-24 at 7.43.30 PM.png

 

California has had more than twice as many sightings as the next highest state.  That’s pretty impressive (and makes me want to go)!

 

The next thing I wanted to see is when the first and last sightings were for each of the states. 

 

 

proc sql;
select
state, min(datetime) as First, max(datetime) as Last
from work.UFOs
group by state
order by state;
quit;

 

Here is the output:

Screen Shot 2016-11-24 at 7.47.38 PM.png

 

Uh oh – the dates are in a nonsensical format – we need to fix that in order to use the data. 

 

So – first thing we need to do is go to the raw data, and we see that the column has date and time; we need to strip out the time and put it into a more recognisable format.  I do this using FORMAT=dtdate9. for each variable. 

 

*Many thanks to @ChrisHemedinger for helping out with the date formats 🙂

 

proc sql;
select
state,
min(datetime) as First format=dtdate9.,
max(datetime) as Last format=dtdate9.,
count(datetime) as Incidents format=comma12.
from work.UFOs
group by state
order by state;
quit;

 

 

That's better - here are the results:

image66.png

The last query I want to run is finding out which state has the most recent first sighting.  As in my last post, I need to use a nested query – one to get the Maximum value of the “First” variable we just created, and then merging that value back to the full dataset.

 

proc sql;
create table work.UFOs3 as
select
state,
min(datetime) as First format=dtdate9.,
max(datetime) as Last format=dtdate9.,
count(datetime) as Incidents format=comma12.
from work.UFOs
group by state
order by state;
quit;

proc sql;
select *
from work.UFOs3
where First = (select max(First)
from work.UFOs3);
quit;

Here's the output:

image69.png

 

It appears that the state of VI – which, I must admit, I assumed was a typo (I’m unfamiliar with all the states’ short forms).  I looked it up and found out that VI is in fact the Virgin Islands.  From the look of the Min / Max data, the Island has only had 1 sighting.

 

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

 

Comments

I would be tempted to normalize the data to the state populations. You might find that as percent of population (or per 100,000 or similar) that something interesting pops up. No promise but wouldn't it be interesting if Montanans saw UFO at 10 times the rate of Californians?

 

Of course getting the annual populations for the period is going to be an interesting exercise all by itself.

@ballardw oh gee thanks, now you're making me think...I'll see if I can do this for next week's post 🙂  thanks so much for reading!

Chris

Version history
Last update:
‎08-04-2021 07:25 AM
Updated by:

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags