What’s this data?
Today, we’re going to check out some data on earthquakes and see what measures can help us most in figuring out where future earthquakes may happen.
How to download
If you don’t already have University Edition, get it here and follow the instructions in the PDF carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials.
Get the data from here: http://earthquake.usgs.gov/earthquakes/search/
You will need to enter the times below as your start and end times to get data as close to mine as possible.
Do note that your data will not exactly match mine, because the website updates and revises its information regularly. For example, when I went back to check my data, I noticed that a location had been moved one kilometer. They had also added a couple of earthquakes that I didn't have before.
How to get the data and prep it for analysis
The main challenge with this batch of data is removing the Z’s from the date-time values so you can format the variables correctly. The compress function takes care of that in the first data step. In the second data step, instead of removing the T’s, I replace them with slashes using the tranwrd function, just to keep the format symmetrical. Then in the last data step I use the input function on the same variables to convert them to datetime values, now that they are laid out in a form that can be converted.
filename quake "/folders/myfolders/my_data/Earthquake data.csv";

/* Step 1: read the CSV (skipping the header row) and strip the Z's. */
/* Numeric fields use plain list input: an informat such as 6.2 would */
/* divide a value with no decimal point (a depth of 10) by 100. */
data earthquakes;
  infile quake dlm="," dsd firstobs=2;
  input Time :$25. Latitude Longitude Depth Magnitude MagType :$3.
        NST Gap Dmin RMS Net :$2. ID :$10. Updated :$25.
        Place :$100. Type :$10.;
  Time=compress(time, 'Z');
  Updated=compress(updated, 'Z');
run;

/* Step 2: replace the T between the date and the time with a slash. */
data earthquakes2;
  set earthquakes;
  Time2=tranwrd(time, 'T', '/');
  Updated2=tranwrd(updated, 'T', '/');
  drop time updated;
run;

/* Step 3: convert the cleaned text to numeric datetime values. */
data earthquakes3;
  set earthquakes2;
  Time=input(trim(time2), YMDDTTM23.3);
  Updated=input(trim(updated2), YMDDTTM23.3);
  format time datetime. updated datetime.;
  drop updated2 time2;
run;

/* Correlations among the numeric variables. */
proc corr data=earthquakes3;
  var depth latitude dmin gap longitude magnitude rms;
run;
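If you want to see what each of those steps does to a single value before running it on the whole file, here is a minimal sketch. The timestamp is a made-up sample in the same layout as the download, and the variable names raw, noz, slashed and dt are only for illustration.

```sas
/* Walk one hypothetical timestamp through the same three steps */
data _null_;
  length raw noz slashed $25;
  raw     = "2015-06-01T12:34:56.789Z";          /* made-up sample value */
  noz     = compress(raw, 'Z');                  /* drop the trailing Z */
  slashed = tranwrd(noz, 'T', '/');              /* swap the T for a slash */
  dt      = input(trim(slashed), ymddttm23.3);   /* text to numeric datetime */
  put raw= / noz= / slashed= / dt= datetime22.3; /* write each stage to the log */
run;
```

Check the log after it runs. If dt shows up as missing, the text is not in a layout the informat can read, which is worth knowing before you process every row.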
What does this output mean?
Now we can go in and see if there are any correlations among the variables. The first thing I wanted to explore was the longitude and latitude variables, seeing as they are the location variables. I wanted to check whether any locations were getting a particularly large number of earthquakes. They didn’t correlate too well with each other, so the earthquakes do not fall along any one straight line. I threw in all the other descriptive variables, and there were no strong correlations there either. That makes sense: the Pearson correlation that proc corr reports by default only measures linear association, earthquakes occur along tectonic plate boundaries, and plate boundaries do not follow a linear path.
From the latitude and longitude scatter plot you can see that some areas have a lot of activity, but there is no clear pattern. Now look at the map: every earthquake is plotted on it, and you can see clear plate boundaries. Comparing it to our scatter plot, it is clear that our plot follows the same pattern. So there isn’t much of a mathematical pattern, but there is definitely a geologic one. From this, I’d say the areas bordering the Pacific Ocean are the most likely to endure the next big earthquake. Do note that the maximum latitude shown on the scatter plot is not the same as on the world map, so the scatter plot is more of a “zoomed in” look.
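The code above stops at proc corr, so here is one way you could draw a latitude and longitude scatter plot yourself. This is a rough sketch using proc sgplot on the earthquakes3 data set, not necessarily the code behind the plot in this article. The marker size and transparency are just my guesses at readable settings.

```sas
/* Plot each earthquake by location; semi-transparent markers make */
/* the busy areas show up darker where points overlap.             */
proc sgplot data=earthquakes3;
  scatter x=longitude y=latitude /
          markerattrs=(symbol=circlefilled size=4)
          transparency=0.6;
  xaxis label="Longitude";
  yaxis label="Latitude";
run;
```

Putting longitude on the x-axis and latitude on the y-axis keeps the plot oriented the same way as a world map, which makes the comparison with the plate boundaries easier.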
Now it’s your turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Need data for learning?
The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:
We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:
Click Analytics U, then select "Subscribe" from the Options menu.