Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Today's Free Data Friday post revisits PROC SQL and we'll delve into museum data.
Museums have always fascinated me. I'm intrigued by places dedicated to preserving artifacts about people I'll never meet and places I'll probably not get to visit. Canada has a large number of museums, ranging from agriculture to space and everything in between. I've visited many, but definitely want to see more, especially the Canada Science and Technology Museum!
I found a data set that includes all artifacts from the Canada Agriculture and Food Museum, Canada Aviation and Space Museum, and the Canada Science and Technology Museum. You can get the data from here.
Because the data uses pipes (|) as the way to separate the columns, the regular Import task won't work, so I used PROC IMPORT:
PROC IMPORT OUT= WORK.WANT DATAFILE= '/folders/myfolders/musuems.csv' DBMS=DLM REPLACE; DELIMITER='|'; GETNAMES=YES; DATAROW=2; RUN;
I've found myself needing to do some "acrobatics" with some data over the past week and PROC SQL really saved me. I used a lot of these to save time, and I realised that because some of them are undocumented or lesser known, I should post about them.
The first is an easy way to catch typical typos; it's not as good as PERL, but for basic checks it's easy to use.
PROC SQL; SELECT artifactNumber, GeneralDescription FROM work.want WHERE GeneralDescription =* 'fiber'; QUIT;
You'll see that instead of the regular = 'fiber', I have put an * after the equals sign. This allows SAS to search for anything that's similar to your criteria.
You'll notice that not only does it recognise "fibre," but also all other capitalizations of the word.
We're all familiar with the basic math functions, but sometimes it's more efficient and easier to just use a function.
PROC SQL; SELECT artifactNumber, begindate, enddate, RANGE(begindate, enddate) AS Years FROM work.want WHERE group1='Communications'; QUIT;
Instead of doing EndDate - BeginDate, I used RANGE(), which I use if I have a lot of calculations to perform on a number of variables. (I find it easier to type).
The last two tips I wanted to share get more into the debugging / troubleshooting side of PROC SQL. While very handy, they can get you started tracking down pesky issues. I highly recommend reading this paper for more information.
proc sql _method outobs=10; select artifactNumber, begindate, enddate from work.want; quit;
There are no surprises to the results; what's interesting is what is returned in the log.
SAS is telling me that it's used a SELECT statement or clause on the WORK.WANT table, which we knew, but I wanted a simple example.
If you use the _METHOD on more complex queries, you'll see a wide range of other descriptors; here's the full list (copied from the paper above):
/* Code Description SQXCRTA Create table as Select. SQXSLCT Select statement or clause. SQXJSL Step loop join (Cartesian). SQXJM Merge join operation. SQXJNDX Index join operation. SQXJHSH Hash join operation. SQXSORT Sort operation. SQXSRC Source rows from table. SQXFIL Rows filtration. SQXSUMG Summary stats (aggregates) with GROUP BY clause. SQXSUMN Summary stats with no GROUP BY clause. */
The final trick I wanted to show you is more just to spark your interest. I want to do a more involved post on this, as it can get fairly complex. Still, it's a very useful tool to have in your toolbox.
One of the features of programming languages is how they actually do what they do. I have spent many hours looking at Execution Plans in SQL Server, network packet analysis, etc. In PROC SQL, we have _TREE:
PROC SQL _TREE; SELECT artifactNumber, begindate, enddate FROM work.want WHERE objectname = 'Clock'; QUIT;
This is what is written to the Log:
Basically, you start at the left (SELECT) and it goes to the Source, then to the Object (table) which has the variables. You can see my criteria in the bottom. Having a map of how SAS thinks allows me as a programmer to make my code more efficent, troubleshoot more effectively, and have really cool things to hang on my wall 🙂
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.