
How to import SPSS data files into SAS

Started ‎12-09-2015 by
Modified ‎08-04-2021 by
Views 46,250

If you are coming to SAS after using IBM SPSS Statistics, then you probably have a few SPSS data files that you want to continue to use with SAS.  SAS provides a simple method for converting an SPSS data file (a SAV file) to a SAS data set.

 

Note: This article shows you how to import SPSS data files in a SAS program.  If you're using SAS Enterprise Guide, there is a built-in task that makes this process easy, no programming required.  Check out this blog post for details.

 

The ability to read an SPSS data file is a part of PROC IMPORT when you have SAS/ACCESS to PC Files installed.  Good news – this feature is part of the free SAS University Edition, so you can practice this skill from that environment.  I've attached an example SAV file to this article so that you can follow along.  If you're using SAS University Edition, you can download this file and drop it into your shared myfolders location so that SAS can see it.

 

Here's the basic structure of the program:

 

proc import out=WORK.SURVEY
  datafile = "/folders/myfolders/my_data/survey.sav"
  dbms = SAV replace;
  fmtlib = WORK.FORMATS;
run;

The key parts of the program are:

- the DBMS=SAV option, which tells SAS that we're expecting a file in the SPSS data file format

- the FMTLIB= statement, which tells SAS where to build custom SAS formats if the data makes use of SPSS value labels.  More on that in just a bit.
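A hedged variation on the program above: the WORK library is cleared when the session ends, so formats built in WORK.FORMATS go away with it. The sketch below assumes a hypothetical permanent libref, MYLIB, pointing at the same folder as the sample file. It stores both the data set and the format catalog there, then adds the library to the format search path.

```sas
/* MYLIB and its path are assumptions for illustration */
libname mylib "/folders/myfolders/my_data";

proc import out=mylib.survey
  datafile = "/folders/myfolders/my_data/survey.sav"
  dbms = SAV replace;
  /* build the value-label formats in a permanent catalog */
  fmtlib = mylib.formats;
run;

/* let SAS find the formats in MYLIB.FORMATS in this session */
options append=(fmtsearch=(mylib));
```

When only a libref is listed in FMTSEARCH, SAS looks in the catalog named FORMATS within that library, which is why the catalog is named MYLIB.FORMATS here.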

 

After importing the SPSS data for the first time, it's a good idea to take inventory of what you've captured.  PROC DATASETS can show you what you've got.

 

proc datasets lib=work nolist nodetails;
 contents data=survey;
quit;

 

When we run this on my sample SURVEY data, we see the basic data attributes, including the number of observations (which SPSS calls cases) and the variable names and types.

 

[Screenshot: PROC DATASETS output for WORK.SURVEY (spssattrs.png)]

 

Next, let's preview the first 5 observations (or rows or cases) with a simple PROC PRINT step:

 

proc print data=work.survey (obs=5);
run;
 

 

[Screenshot: PROC PRINT output, first 5 observations of WORK.SURVEY (spssrows.png)]

 

Preserving your SPSS labels as SAS formats

Now, here's an interesting result from my sample SURVEY data.  According to the PROC DATASETS output, the SEX variable is numeric.  But the displayed values in the PROC PRINT output are text: "female" or "male".  If we look more closely at the PROC DATASETS output, we can see that there is a SAS format assigned to SEX – it's called "SEXA."  Where did that come from?
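One quick way to confirm that the stored values really are numeric is to print the same rows with the format switched off. This is a small sketch using the SURVEY data set and SEX variable from this example. A FORMAT statement that names a variable but no format removes the format for that step only, so the data set itself is unchanged.

```sas
proc print data=work.survey (obs=5);
  /* no format name after SEX: show the raw stored codes in this step */
  format sex;
run;
```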

 

In SPSS data you can use a feature called value labels to map "friendly" names to coded variables.  In this example, the coded values are 1 and 2, but the value labels are "female" and "male".  In SAS, the natural analogy to SPSS value labels is SAS formats.  When we ran the PROC IMPORT step, SAS automatically created SAS format rules for any value labels that it finds.  These were stored in the WORK.FORMATS catalog – because that's what we told SAS to do with the FMTLIB option.
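To see at a glance which imported variables picked up a format, one option is to query the DICTIONARY.COLUMNS table. This is a sketch that assumes the data set is WORK.SURVEY, as in the import step above. Library and member names are stored in uppercase in the dictionary tables.

```sas
proc sql;
  /* list only the variables that have a format attached */
  select name, type, format, label
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'SURVEY'
      and format ne '';
quit;
```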

 

We can use PROC FORMAT to dig in to the rules:

 

proc format lib=work;
 select sexa;
run;

[Screenshot: PROC FORMAT output for the SEXA format (spssformat.png)]

You can learn more about SAS formats from this blog post and in this free training tutorial.

 

As you can see, the small PROC IMPORT step performed a lot of work for us, and it made it very easy to use this data, created in another statistical package, in SAS.

 

Saving the SPSS labels as data values

With the SEXA format as a "user-defined" format, we need to keep the format definition accessible whenever we use the data set, otherwise the coded value labels will get lost.  We can save the format rules in an external data set with the CNTLOUT= option.

 

proc format lib=work
  /* save the format rules */ 
  cntlout=work.sexa;
 select sexa;
run;

And if you want to remove the formats from the mix, you can "flatten" the data set so that the formatted values become the actual values within the data.  The data might lose some fidelity in the process (as the formatted values might be less precise than the underlying raw values), but the data set is then more portable.
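As an illustration only, and not necessarily the technique the author has in mind, one way to flatten a single variable is the VVALUE function, which returns the formatted value of a variable as text. The output data set name SURVEY_FLAT, the new variable SEX_TEXT, and its length are hypothetical choices. The sketch assumes the SEXA format is still available in the format search path when the step runs.

```sas
data work.survey_flat;
  set work.survey;
  length sex_text $ 20;      /* hypothetical width; size it to your longest label */
  sex_text = vvalue(sex);    /* formatted value, e.g. the label rather than the code */
  drop sex;                  /* keep only the text version */
run;
```

Once the text is stored as a data value, the data set no longer depends on the SEXA format for that variable.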

Comments

I might suggest running Proc format with the cntlout option to create a permanent dataset, or at least as permanent as the analysis data, that can be used to recreate the formats if needed.

 

 

Good point, @ballardw.  With the SEXA format as a "user-defined" format, we need to keep the format definition accessible whenever we use the data set, otherwise the coded value labels will get lost.  We can save the format rules in an external data set with the CNTLOUT= option.

 

proc format lib=work
  /* save the format rules */ 
  cntlout=work.sexa;
 select sexa;
run;

And if you want to remove the formats from the mix, you can "flatten" the data set so that the formatted values become the actual values within the data.  The data might lose some fidelity in the process (as the formatted values might be less precise than the underlying raw values), but the data set is then more portable.  I'll share a technique for that in a different article.

Hi Chris,

 

Many thanks for the informative post. It helped me a lot. I have a further question, though. It is great that the PROC IMPORT procedure can save SPSS data value labels as SAS formats, so we don't have to reconstruct the formats using PROC FORMAT. It automatically creates the SAS data file and the corresponding formats catalog. However, I would like to have the SAS program with PROC FORMAT showing how the formats are created and how the formats are assigned to the data variables using PROC DATASETS; MODIFY; FORMAT; etc. This must be done in the background? Is there any way I can get this SAS program automatically created through the IMPORT procedure? Is there a way I can recreate this SAS program file from the SAS data set and the formats?

 

I know that we can save an SPSS file in SPSS as a SAS data set and specify that a SAS 'proc format' program file be created. But I would like to know whether there is a way to do this through the SAS PROC IMPORT procedure, or another procedure once the data is imported.

 

Thanks,     

 

Hi @lijunchen - you can turn your SPSS-labels-turned-SAS-formats into SAS data sets.  Use PROC FORMAT with the CNTLOUT= option as I showed in a previous comment.  That data set can then be used in another PROC FORMAT step with the CNTLIN= option if you ever need to rebuild the formats.  That's the standard way to keep SAS format definitions portable across systems and different operating systems.  Open or PROC PRINT the data set to see the format definition rules.
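A minimal sketch of that round trip, assuming a hypothetical permanent libref MYLIB and a hypothetical data set name SURVEY_FMTS:

```sas
/* MYLIB and its path are assumptions for illustration */
libname mylib "/folders/myfolders/my_data";

/* 1. Write every format in WORK.FORMATS out as rows in a data set */
proc format lib=work cntlout=mylib.survey_fmts;
run;

/* 2. Inspect the rules: one row per value or range */
proc print data=mylib.survey_fmts;
  var fmtname start end label type;
run;

/* 3. In a later session, rebuild the formats from the data set */
proc format lib=work cntlin=mylib.survey_fmts;
run;
```

Because the definitions live in an ordinary data set, they can be moved between systems along with the imported data.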

I've managed to import an SPSS survey data set (*.sav format) using code similar to what you have posted above. I can see it fine and do basic summary statistics with it; however, when I attempt to manipulate or filter the data, I get errors regarding formatting. The data has numerous user-defined formats carried over from SPSS (primarily text codes for numeric binary or Likert responses). While I can strip out the formats completely, I'd rather not as they are useful. I believe the error is related to the user-defined formats being too long or out of range for SAS, but I do not know for sure. Is there a way to get SAS to reconfigure the formats to be compatible? I do not have access to SPSS.

 

Here is an example of the error I get (it is always the same no matter what variables I work on or what logical I use.):

 

data survey;
    set survey;
    if gender=1;
run;

ERROR: Width specified for format F is invalid.

 

The variable gender in this case has seven levels from which I only need two 1 and 2 (formatted 'Female', 'Male').

 

Thank you for any ideas here. Google has given me some hints, but no solution.

Hi @pdiff,

 

For an immediate fix, submit the following statement before your code:

options nofmterr;

This will prevent SAS from throwing the error when the format definition is missing.  However, it won't help you to recover your Label for display purposes.

 

If you used the FMTLIB= statement in your PROC IMPORT step, make sure that the location you specified is in your format search path (the list of places where SAS will look for formats).  That's in the FMTSEARCH option.  Check PROC OPTIONS output to see what the current value is.  You can append new search paths like so:

 

options append=(FMTSEARCH=(WORK));

This adds WORK library to the places where SAS will look for formats.
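For reference, here is a short sketch of two ways to display the current FMTSEARCH value, which is handy to run before and after changing it:

```sas
/* write the current setting of FMTSEARCH to the log */
proc options option=fmtsearch;
run;

/* same information on a single log line, via the GETOPTION function */
%put FMTSEARCH is: %sysfunc(getoption(fmtsearch));
```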

 

 

Thank you for the reply. Unfortunately, I can't seem to get either of these to work. By default, PROC IMPORT is creating a format catalog in the Work library called Formats. If I specify FMTLIB=surveyf, it creates a catalog entry "surveyf" in Work with the formats in it, but does not use them even with the "options append ..." statement you specify. I can import the data, strip the formats, and then reapply the desired ones with no problems, but this seems inefficient, e.g. this works:

 

PROC IMPORT OUT= WORK.SURVEY
            DATAFILE= "C:\My.data.sav"
            DBMS=SPSS REPLACE;
RUN;

PROC DATASETS lib=work;
MODIFY survey;
FORMAT _all_;
INFORMAT _all_;
RUN;


options append=(FMTSEARCH=(WORK));

data survey;
    set survey;
    format gender gendera.;
    if gender=1;
run;

 

I can live with this, but it is tedious 🙂

Again, thank you for your help.

 

Try this statement and see if it eliminates the need to strip/reapply formats:

 

options append=(FMTSEARCH=(WORK.surveyf));

Do we know how SAS decides the name of the format?  Why is the name "sexa" and not just "sex"?

 

In DI-studio I have job importing a sav-file and then I run a proc format with cntlout=xxx.fmtvalues

In this dataset fmtvalues I can see I have a format called "AFB30AH"

 

If I run the job again I get a new format "AFB30AHA"

And because I have the same session with the same work-folder with the format-catalog,  I now have both formats.

 

My metadata for the imported data is updated with format name AFB30AH from the first run.

So now I get a warning when I run my job the second time, because the format names don't match.

 

Why does SAS put an "A" at the end of the format name when the job is run twice?

If I run my job again I DON'T get a format with "B" at the end 🙂

@Torben_Pedersen, I'm pretty sure that PROC IMPORT for SPSS will not create a format with the same name as an existing format -- whether SAS-supplied or user-defined.  On first pass, the import step creates a format whose name is based on the variable name.  On a second pass, if a format of that name already exists, it won't write over it and also it won't just "skip it" and rely on the existing format of that name.  To truly clear this out before re-importing, you would need to delete the existing format first.
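One possible way to delete the existing formats before re-importing is PROC CATALOG. This is a sketch that assumes the formats live in WORK.FORMATS and uses the format names from the question above. The entry type FORMAT applies to numeric formats. A character format would use FORMATC instead.

```sas
proc catalog catalog=work.formats;
  /* remove the formats left over from earlier runs */
  delete afb30ah afb30aha / entrytype=format;
quit;
```

Running this before the import step in the same session should let PROC IMPORT create the format under its original name again.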

Thank you for the info. 
