Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Anyone who has met me probably knows that I love a good conspiracy theory, and one of my favourites are UFOs. I joke that The X-Files was a documentary rather than a fiction TV show; I love the idea that we’re not alone.
Get the Data
I found this UFO dataset on Kaggle and is from the National UFO Research Center, is each sighting reported up to 2014. The fact that there are 88,875 sightings in this dataset with a great collection of variables (shape, year, comments, etc.) makes it a lot of fun to play with.
Getting the data ready
The data was already in a format that I could use, and there were no missing or clearly incorrect data. I also imported a list of all the states for use in my first example.
The Results
Here’s a sample from the list of states that I’ve imported:
This is a small sample from the UFOs dataset:
When I run my query
proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
on a.State =b.state
where country='us'
group by a.State
order by a.State;
quit;
I get the message:
The issue is that SAS sees “tx” and “TX” as two different values, so we need to make SAS think that they are the same.
proc sql;
select a.State, count(*)
from work.import a left join work.ufos b
on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;
This allows us to get a result set:
If you’ve been reading this blog for a while, you know I’m a huge fan of data visualization, so I’ll put this data into a graph. First, we have to do the counts by State, and put it into a new table:
proc sql;
create table work.State_Count as
select a.State, count(*)
from work.import a left join work.ufos b
on a.State = upper(b.state)
where country='us'
group by a.State
order by a.State;
quit;
We’ll do a Bar Chart, and using the Task, I have set it up as so:
I have also selected the Show Values in Data Order on the OPTIONS tab to put the data in lowest to highest.
California has had more than twice as many sightings as the next highest state. That’s pretty impressive (and makes me want to go)!
The next thing I wanted to see is when the first and last sightings were for each of the states.
proc sql;
select
state, min(datetime) as First, max(datetime) as Last
from work.UFOs
group by state
order by state;
quit;
Here is the output:
Uh oh – the dates are in a nonsensical format – we need to fix that in order to use the data.
So – first thing we need to do is go to the raw data, and we see that the column has date and time; we need to strip out the time and put it into a more recognisable format. I do this using FORMAT=dtdate9. for each variable.
*Many thanks to @ChrisHemedinger for helping out with the date formats 🙂
proc sql;
select
state,
min(datetime) as First format=dtdate9.,
max(datetime) as Last format=dtdate9.,
count(datetime) as Incidents format=comma12.
from work.UFOs
group by state
order by state;
quit;
That's better - here are the results:
The last query I want to run is finding out which state has the most recent first sighting. As in my last post, I need to use a nested query – one to get the Maximum value of the “First” variable we just created, and then merging that value back to the full dataset.
proc sql;
create table work.UFOs3 as
select
state,
min(datetime) as First format=dtdate9.,
max(datetime) as Last format=dtdate9.,
count(datetime) as Incidents format=comma12.
from work.UFOs
group by state
order by state;
quit;proc sql;
select *
from work.UFOs3
where First = (select max(First)
from work.UFOs3);
quit;
Here's the output:
It appears that the state of VI – which, I must admit, I assumed was a typo (I’m unfamiliar with all the states’ short forms). I looked it up and found out that VI is in fact the Virgin Islands. From the look of the Min / Max data, the Island has only had 1 sighting.
Now it’s your turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
I would be tempted to normalize the data to the state populations. You might find that as percent of population (or per 100,000 or similar) that something interesting pops up. No promise but wouldn't it be interesting if Montanans saw UFO at 10 times the rate of Californians?
Of course getting the annual populations for the period is going to be an interesting exercise all by itself.
@ballardw oh gee thanks, now you're making me think...I'll see if I can do this for next week's post 🙂 thanks so much for reading!
Chris
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.