We’re smarter together. Learn from this collection of community knowledge and add your expertise.

UFOs, SAS and Data Analysis - Oh my!

by Super Contributor on ‎12-08-2016 06:12 PM (756 Views)

42-22106413.jpgAnyone who has met me probably knows that I love a good conspiracy theory, and one of my favourites are UFOs.  I joke that The X-Files was a documentary rather than a fiction TV show; I love the idea that we’re not alone. 

 

Get the Data

I found this UFO dataset on Kaggle and is from the National UFO Research Center, is each sighting reported up to 2014.  The fact that there are 88,875 sightings in this dataset with a great collection of variables (shape, year, comments, etc.) makes it a lot of fun to play with.

 

How to go about getting SAS University Edition

If you don’t already have University Edition, get it here and follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.

 

Getting the data ready

The data was already in a format that I could use, and there were no missing or clearly incorrect data.  I also imported a list of all the states for use in my first example.

 

The Results

Here’s a sample from the list of states that I’ve imported:

Screen Shot 2016-11-24 at 7.32.32 PM.png

 

This is a small sample from the UFOs dataset:

Screen Shot 2016-11-24 at 7.32.45 PM.png

 

When I run my query

 

 

proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
            on a.State =b.state
where country='us'
group by a.State
order by a.State;
quit;

 

I get the message:

Screen Shot 2016-11-24 at 7.34.14 PM.png

 

The issue is that SAS sees “tx” and “TX” as two different values, so we need to make SAS think that they are the same.

 

 

 

proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
            on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;

 

 

This allows us to get a result set:

Screen Shot 2016-11-24 at 7.34.39 PM.png

 

If you’ve been reading this blog for a while, you know I’m a huge fan of data visualization, so I’ll put this data into a graph.  First, we have to do the counts by State, and put it into a new table:

 

 

 

proc sql;
create table work.State_Count as
select a.State, count(*)
from work.import a left join work.ufos b
            on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;

 

 

We’ll do a Bar Chart, and using the Task, I have set it up as so:

Screen Shot 2016-11-24 at 7.38.18 PM.png

 

I have also selected the Show Values in Data Order on the OPTIONS tab to put the data in lowest to highest. 

Screen Shot 2016-11-24 at 7.43.30 PM.png

 

California has had more than twice as many sightings as the next highest state.  That’s pretty impressive (and makes me want to go)!

 

The next thing I wanted to see is when the first and last sightings were for each of the states. 

 

 

proc sql;
select
state, min(datetime) as First, max(datetime) as Last
from work.UFOs
group by state
order by state;
quit;

 

Here is the output:

Screen Shot 2016-11-24 at 7.47.38 PM.png

 

Uh oh – the dates are in a nonsensical format – we need to fix that in order to use the data. 

 

So – first thing we need to do is go to the raw data, and we see that the column has date and time; we need to strip out the time and put it into a more recognisable format.  I do this using FORMAT=dtdate9. for each variable. 

 

*Many thanks to @ChrisHemedinger for helping out with the date formats :-)

 

proc sql;
select
state,
min(datetime) as First format=dtdate9.,
max(datetime) as Last format=dtdate9.,
count(datetime) as Incidents format=comma12.
from work.UFOs
group by state
order by state;
quit;

 

 

That's better - here are the results:

image66.png

The last query I want to run is finding out which state has the most recent first sighting.  As in my last post, I need to use a nested query – one to get the Maximum value of the “First” variable we just created, and then merging that value back to the full dataset.

 

proc sql;
create table work.UFOs3 as
select
state,
min(datetime) as First format=dtdate9.,
max(datetime) as Last format=dtdate9.,
count(datetime) as Incidents format=comma12.
from work.UFOs
group by state
order by state;
quit;

proc sql;
select *
from work.UFOs3
where First = (select max(First)
from work.UFOs3);
quit;

Here's the output:

image69.png

 

It appears that the state of VI – which, I must admit, I assumed was a typo (I’m unfamiliar with all the states’ short forms).  I looked it up and found out that VI is in fact the Virgin Islands.  From the look of the Min / Max data, the Island has only had 1 sighting.

 

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Need data for learning?

 

The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:

 

Image11.png

 

We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:

 

IMAGE12.png

 

Click Analytics U, then select "Subscribe" from the Options menu.

 

Happy Learning!

 

 

Comments
by Super User
on ‎12-08-2016 06:48 PM

I would be tempted to normalize the data to the state populations. You might find that as percent of population (or per 100,000 or similar) that something interesting pops up. No promise but wouldn't it be interesting if Montanans saw UFO at 10 times the rate of Californians?

 

Of course getting the annual populations for the period is going to be an interesting exercise all by itself.

by Super Contributor
on ‎12-08-2016 08:13 PM

@ballardw oh gee thanks, now you're making me think...I'll see if I can do this for next week's post :-)  thanks so much for reading!

Chris

Your turn
Sign In!

Want to write an article? Sign in with your profile.


Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.