BookmarkSubscribeRSS Feed

The sequel to PROC SQL (or, More Advanced Tips and Tricks)

Started ‎06-30-2017 by
Modified ‎08-04-2021 by
Views 2,505

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:

 

Access Now

 

Today's Free Data Friday post revisits PROC SQL and we'll delve into museum data.

 

Museums have always fascinated me. I'm intrigued by places dedicated to preserving artifacts about people I'll never meet and places I'll probably not get to visit. Canada has a large number of museums, ranging from agriculture to space and everything in between. I've visited many, but definitely want to see more, especially the Canada Science and Technology Museum! 

 

Get the data

I found a data set that includes all artifacts from the Canada Agriculture and Food Museum, Canada Aviation and Space Museum, and the Canada Science and Technology Museum. You can get the data from hereFreeDataFriday_graphic.jpg

 

Get started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:
 

Get Started

 

 

Get the data ready

Because the data uses pipes (|) as the way to separate the columns, the regular Import task won't work, so I used PROC IMPORT:

 

 

PROC IMPORT OUT= WORK.WANT
            DATAFILE= '/folders/myfolders/musuems.csv'
            DBMS=DLM REPLACE;
     DELIMITER='|';
     GETNAMES=YES;
     DATAROW=2;
RUN;

 

The results

I've found myself needing to do some "acrobatics" with some data over the past week and PROC SQL really saved me. I used a lot of these to save time, and I realised that because some of them are undocumented or lesser known, I should post about them. 

 

The first is an easy way to catch typical typos; it's not as good as PERL, but for basic checks it's easy to use.

 

PROC SQL;
SELECT artifactNumber, GeneralDescription
FROM work.want
WHERE GeneralDescription =* 'fiber';
QUIT;

 

You'll see that instead of the regular = 'fiber', I have put an * after the equals sign. This allows SAS to search for anything that's similar to your criteria.

 

1.PNG

 

You'll notice that not only does it recognise "fibre," but also all other capitalizations of the word.  

 

We're all familiar with the basic math functions, but sometimes it's more efficient and easier to just use a function.

 

 

PROC SQL;
SELECT artifactNumber, begindate, enddate,
RANGE(begindate,
enddate) AS
Years
FROM work.want
WHERE group1='Communications';
QUIT;

 

Instead of doing EndDate - BeginDate, I used RANGE(), which I use if I have a lot of calculations to perform on a number of variables. (I find it easier to type).

 

 

2.PNG

 

The last two tips I wanted to share get more into the debugging / troubleshooting side of PROC SQL. While very handy, they can get you started tracking down pesky issues. I highly recommend reading this paper for more information.  

 

 

proc sql _method outobs=10;
select artifactNumber, begindate, enddate
from work.want;
quit;

There are no surprises to the results; what's interesting is what is returned in the log. 

 

3.png

 

 

SAS is telling me that it's used a SELECT statement or clause on the WORK.WANT table, which we knew, but I wanted a simple example.

 

If you use the _METHOD on more complex queries, you'll see a wide range of other descriptors; here's the full list (copied from the paper above):

 

/* 
Code Description
SQXCRTA Create table as Select.
SQXSLCT Select statement or clause.
SQXJSL Step loop join (Cartesian).
SQXJM Merge join operation.
SQXJNDX Index join operation.
SQXJHSH Hash join operation.
SQXSORT Sort operation.
SQXSRC Source rows from table.
SQXFIL Rows filtration.
SQXSUMG Summary stats (aggregates) with GROUP BY clause.
SQXSUMN Summary stats with no GROUP BY clause.
*/

 

The final trick I wanted to show you is more just to spark your interest. I want to do a more involved post on this, as it can get fairly complex. Still, it's a very useful tool to have in your toolbox.  

 

One of the features of programming languages is how they actually do what they do. I have spent many hours looking at Execution Plans in SQL Server, network packet analysis, etc.  In PROC SQL, we have _TREE:

 

PROC SQL _TREE;
SELECT artifactNumber, begindate, enddate
FROM work.want
WHERE objectname = 'Clock';
QUIT;

This is what is written to the Log:

 

4.png

 

Basically, you start at the left (SELECT) and it goes to the Source, then to the Object (table) which has the variables. You can see my criteria in the bottom. Having a map of how SAS thinks allows me as a programmer to make my code more efficent, troubleshoot more effectively, and have really cool things to hang on my wall 🙂

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

 

Version history
Last update:
‎08-04-2021 09:51 AM
Updated by:

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags