
How to import SPSS data files into SAS

by Community Manager on 12-09-2015 04:34 PM (edited on 12-09-2015 04:49 PM by Community Manager)

If you are coming to SAS after using IBM SPSS Statistics, you probably have a few SPSS data files that you want to keep using with SAS. SAS provides a simple way to convert an SPSS data file (a SAV file) to a SAS data set.


Note: This article shows you how to import SPSS data files in a SAS program.  If you're using SAS Enterprise Guide, there is a built-in task that makes this process easy, no programming required.  Check out this blog post for details.


The ability to read an SPSS data file is part of PROC IMPORT when you have SAS/ACCESS to PC Files installed. Good news: this feature is included in the free SAS University Edition, so you can practice this skill in that environment. I've attached an example SAV file to this article so that you can follow along. If you're using SAS University Edition, you can download the file and drop it into your shared myfolders location so that SAS can see it.


Here's the basic structure of the program:


proc import out=WORK.SURVEY
  datafile = "/folders/myfolders/my_data/survey.sav"
  dbms = SAV replace;
  fmtlib = WORK.FORMATS;  /* where to build formats from SPSS value labels */
run;

The key parts of the program are:

- the DBMS=SAV option, which tells SAS that we're expecting a file in the SPSS data file format

- the FMTLIB= statement, which tells SAS where to build custom SAS formats if the data uses SPSS value labels. More on that in just a bit.


After importing the SPSS data for the first time, it's a good idea to take inventory of what you've captured.  PROC DATASETS can show you what you've got.


proc datasets lib=work nolist nodetails;
 contents data=survey;
quit;


When we run this on my sample SURVEY data, we see the basic data attributes, including the number of observations (which SPSS calls cases) and the variable names and types.




Next, let's preview the first 5 observations (rows, or cases in SPSS terms) with a simple PROC PRINT step:


proc print data=work.survey (obs=5);
run;



Now, here's an interesting result from my sample SURVEY data. According to the PROC DATASETS output, the SEX variable is numeric, but the values displayed in the PROC PRINT output are text: "female" or "male". If we look more closely at the PROC DATASETS output, we can see that a SAS format named SEXA. is assigned to SEX. Where did that come from?


In SPSS data, you can use a feature called value labels to map "friendly" names to coded variables. In this example, the coded values are 1 and 2, and the value labels are "female" and "male". In SAS, the natural analogue of SPSS value labels is SAS formats. When we ran the PROC IMPORT step, SAS automatically created SAS format rules for any value labels that it found. These were stored in the WORK.FORMATS catalog, because that's where we told SAS to put them with the FMTLIB= statement.
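To see the format at work, you can compare a frequency table that displays SEX through its SEXA. format with one that shows the raw codes. This is a sketch that assumes the WORK.SURVEY data set and WORK.FORMATS catalog created by the PROC IMPORT step above.

```sas
/* Default: SEX is displayed through its assigned SEXA. format,
   so the table rows show the value labels ("female", "male") */
proc freq data=work.survey;
  tables sex;
run;

/* A FORMAT statement that names the variable but no format
   removes the association for this step only, so the table
   shows the underlying numeric codes instead */
proc freq data=work.survey;
  tables sex;
  format sex;
run;
```

The second step does not change the data set: the SEXA. format stays attached to SEX for later steps.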


We can use PROC FORMAT to dig into the rules:


proc format lib=work fmtlib;  /* FMTLIB prints the format definitions */
 select sexa;
run;


You can learn more about SAS formats from this blog post and in this free training tutorial.


As you can see, the small PROC IMPORT step performed a lot of work for us, and it made it easy to use this data, which was created in another statistical package, in SAS.

by Super User on 12-09-2015 06:20 PM

I might suggest running PROC FORMAT with the CNTLOUT= option to create a permanent data set (or at least one as permanent as the analysis data) that can be used to re-create the formats if needed.



by Community Manager on 12-10-2015 08:04 AM (edited 12-10-2015 08:30 AM)

Good point, @ballardw.  With the SEXA format as a "user-defined" format, we need to keep the format definition accessible whenever we use the data set, otherwise the coded value labels will get lost.  We can save the format rules in an external data set with the CNTLOUT= option.


proc format lib=work
  cntlout=work.sexa_rules;  /* save the format rules; WORK.SEXA_RULES is an illustrative name */
 select sexa;
run;
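Because the WORK library is cleared at the end of a session, the saved rules are most useful in a permanent library. The sketch below assumes the CNTLOUT= data set was written to a permanent library; the libref MYLIB and the data set name SEXA_RULES are hypothetical. PROC FORMAT with the CNTLIN= option reads the rules back and rebuilds the format in WORK.FORMATS, which SAS searches for formats by default.

```sas
/* Hypothetical permanent library that holds the saved format rules */
libname mylib "/folders/myfolders/my_data";

/* Rebuild every format described in MYLIB.SEXA_RULES (a hypothetical
   data set created earlier with CNTLOUT=) into WORK.FORMATS */
proc format cntlin=mylib.sexa_rules lib=work;
run;
```

After this step, procedures that use the SURVEY data can display SEX with the SEXA. format again.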

And if you want to remove the formats from the mix, you can "flatten" the data set so that the formatted values become the actual values within the data.  The data might lose some fidelity in the process (as the formatted values might be less precise than the underlying raw values), but the data set is then more portable.  I'll share a technique for that in a different article.
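As one possible approach (not necessarily the technique promised for that later article), the PUT function can copy each formatted value into a new character variable. This sketch assumes WORK.SURVEY and the SEXA. format from above; WORK.SURVEY_FLAT and SEX_LABEL are hypothetical names.

```sas
/* Flatten: store the formatted text as plain character data.
   SURVEY_FLAT and SEX_LABEL are hypothetical names. */
data work.survey_flat;
  set work.survey;
  length sex_label $ 20;        /* assumed long enough for the labels */
  sex_label = put(sex, sexa.);  /* formatted text, such as "female" */
  drop sex;                     /* optional: drop the coded original */
run;
```

The resulting data set no longer needs the SEXA. format to show readable values, at the cost of losing the numeric codes if you drop SEX.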
