## UFOs, SAS and Data Analysis - Oh my!

Started ‎12-08-2016 by
Modified ‎08-04-2021 by
Views 1,521

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:

Anyone who has met me probably knows that I love a good conspiracy theory, and one of my favourites are UFOs.  I joke that The X-Files was a documentary rather than a fiction TV show; I love the idea that we’re not alone.

Get the Data

I found this UFO dataset on Kaggle and is from the National UFO Research Center, is each sighting reported up to 2014.  The fact that there are 88,875 sightings in this dataset with a great collection of variables (shape, year, comments, etc.) makes it a lot of fun to play with.

## Get Started with SAS OnDemand for Academics

In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:

The data was already in a format that I could use, and there were no missing or clearly incorrect data.  I also imported a list of all the states for use in my first example.

The Results

Here’s a sample from the list of states that I’ve imported:

This is a small sample from the UFOs dataset:

When I run my query

``````proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
on a.State =b.state
where country='us'
group by a.State
order by a.State;
quit;``````

I get the message:

The issue is that SAS sees “tx” and “TX” as two different values, so we need to make SAS think that they are the same.

``````proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;``````

This allows us to get a result set:

If you’ve been reading this blog for a while, you know I’m a huge fan of data visualization, so I’ll put this data into a graph.  First, we have to do the counts by State, and put it into a new table:

``````proc sql;
create table work.State_Count as
select a.State, count(*)
from work.import a left join work.ufos b
on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;``````

We’ll do a Bar Chart, and using the Task, I have set it up as so:

I have also selected the Show Values in Data Order on the OPTIONS tab to put the data in lowest to highest.

California has had more than twice as many sightings as the next highest state.  That’s pretty impressive (and makes me want to go)!

The next thing I wanted to see is when the first and last sightings were for each of the states.

``````proc sql;
select
state, min(datetime) as First, max(datetime) as Last
from work.UFOs
group by state
order by state;
quit;``````

Here is the output:

Uh oh – the dates are in a nonsensical format – we need to fix that in order to use the data.

So – first thing we need to do is go to the raw data, and we see that the column has date and time; we need to strip out the time and put it into a more recognisable format.  I do this using FORMAT=dtdate9. for each variable.

*Many thanks to @ChrisHemedinger for helping out with the date formats 🙂

`proc sql;selectstate, min(datetime) as First format=dtdate9., max(datetime) as Last format=dtdate9., count(datetime) as Incidents format=comma12.from work.UFOsgroup by stateorder by state;quit;`

That's better - here are the results:

The last query I want to run is finding out which state has the most recent first sighting.  As in my last post, I need to use a nested query – one to get the Maximum value of the “First” variable we just created, and then merging that value back to the full dataset.

`proc sql;create table work.UFOs3 as selectstate, min(datetime) as First format=dtdate9., max(datetime) as Last format=dtdate9., count(datetime) as Incidents format=comma12.from work.UFOsgroup by stateorder by state;quit;`proc sql; select * from work.UFOs3 where First = (select max(First) from work.UFOs3);quit;``

Here's the output:

It appears that the state of VI – which, I must admit, I assumed was a typo (I’m unfamiliar with all the states’ short forms).  I looked it up and found out that VI is in fact the Virgin Islands.  From the look of the Min / Max data, the Island has only had 1 sighting.

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

I would be tempted to normalize the data to the state populations. You might find that as percent of population (or per 100,000 or similar) that something interesting pops up. No promise but wouldn't it be interesting if Montanans saw UFO at 10 times the rate of Californians?

Of course getting the annual populations for the period is going to be an interesting exercise all by itself.

@ballardw oh gee thanks, now you're making me think...I'll see if I can do this for next week's post 🙂  thanks so much for reading!

Chris

Version history
Last update:
‎08-04-2021 07:25 AM
Updated by:
Contributors
Article Labels
Article Tags