Get the Data
I love space stuff – I spend hours going through space simulations, looking at images from the Hubble and other telescopes, and have read tons of books on a variety of topics. I was really excited to see NASA offers a huge variety of open data; the one I’m using is on meteorite landings. You can get the data from https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh where it’s available in multiple formats; I used the CSV format and then imported all my data into SAS University Edition.
How to go about getting SAS University Edition
If you don’t already have University Edition, get it here and follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.
Getting the data ready
I’ve imported the data into a library I’ve called “space”. This data is very clean and so there’s not much you have to do to prepare it; you may want to do some data exploration to become familiar with the table however.
The table is good, but it doesn’t tell us much – you can find out some basic information, but to get to the really good stuff, we’re going to have to perform some descriptive statistics to get the real story.
OK – let’s get started. First off, I want to know how many total Meteorite landings there are in the dataset.
Apparently there are about 25,000, which is way more than I expected. But this is good news because we have a decent sized database to play with.
Next, let’s take a look at the RECCLASS (recorded class of the meteorite) and see how to do a simple count; you’ll notice a Group By and Order By statement at the end. You’ll need to include these because they tell SAS how to sort the data, and how you want the data displayed. If the variable is in the SELECT statement and not in a calculation, it has to be in both. Play around with the order and see what happens. For now, here’s my code and the output:
As with the Restaurant data (see last week’s post, How clear are your favorite New York City and Toronto restaurants?), this doesn’t tell us much – for the sake of this post however, I will just say if you wanted to sort by the count, put the output into a temp table (CREATE TABLE WORK.<your table name>) then do a select on your temp table with an ORDER BY the count column.
Next, let’s take the Average across the different classes. Average and other functions in PROC SQL need a variable specified to do their job. You’ll notice I don’t have MASS in the Group By or Order By statements – that’s because it’s only being used in a calculation.
Just in the first few rows, there are some masses (in grams) that are pretty huge.
Next, I want to categorize the mass into buckets. This uses a CASE statement, allowing me to specify parameters for each calculation. You’ll notice that in the previous COUNTs, I’ve used COUNT(*) – count everything. In this case, because we’re specifying a little more detail, we have to tell SAS how many of each unit to count, so we use the THEN 1 END to tell SAS for each unit that meets our criteria, count 1. It’s not often you’ll change this, but I have needed to when I am counting by a weight or a factor (1.3, 1. 6, 1.9 for each Unit).
I find it interesting that the Aubrite, which had one of the highest averages in the previous example (85,957.46 grams) has one of the highest counts of meteorites less than 1000 grams. Very common, with a few over 5000 grams.
The last example I have is sort of a nonsensical one, but I wanted to show one more function within PROC SQL – how to SUM using the CASE statement. Unlike the COUNT…THEN 1 END statements, this function requires us to tell us which variable to sum by – so I’m telling SAS for every meteorite that meets my criteria, THEN MASS END.
Now it’s your turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Need data for learning?
The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:
We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:
Click Analytics U, then select "Subscribe" from the Options menu.