ckx
Quartz | Level 8

I'm trying to convert an older SPSS dataset from 2008 to SAS. This works for the most part but some labels and formats have a � symbol for special characters such as "mu" or a curly apostrophe. Is there any way I can properly convert these datasets?

 

I'm using SAS 9.4M8. I'm running SAS on LSAF (a special computing environment for pharma) so changing the character encoding on the SAS side isn't an option. Any suggestions?

 

3 REPLIES
ballardw
Super User

If you have an SPSS install and export the data to SAS with the correct options it should create a SAS program with Proc Format statements. 

Then you may be able to modify the formats to use the special characters by way of Unicode. You might still have trouble seeing the results properly if the SAS session encoding doesn't support those Unicode characters.

 

If you have to rely on the SAS import for SAV files, then I suspect the only option will be to write data step code to change the values to something your system can handle, and possibly write custom formats to get the special characters to display using an acceptable font and/or Unicode. That may be a long task if there are many of these special values.

 

Without knowing the required system encoding and the font used, I can't suggest anything better at this time.

 

Tom
Super User

Is the symptom only that the characters are not displaying (stored using wrong encoding)?

Or is it causing the import step to fail and not create the DATASET and/or the format CATALOG?

 

If the former then perhaps you can fix the data after the import by using the ENCODING=ANY dataset option.

 

For example, say the problem is only in the UNIT variable in the LAB dataset. Let's assume that the issue is that your SPSS file was made using UTF-8 encoding and your SAS session is running using WLATIN1 encoding. Then you might use a step like this to convert the characters.

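data lab_fixed;
  /* ENCODING=ANY reads the stored bytes as-is, with no transcoding */
  set lab(encoding=any);
  /* Treat the bytes as UTF-8 and convert them to WLATIN1 */
  UNIT = kcvt(UNIT,'UTF-8','WLATIN1');
run;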

If it is the reverse then you might need to make the UNIT variable longer since some characters that use only one byte in WLATIN1 could use more than one byte in UTF-8.
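
For that reverse case, a minimal sketch might look like the step below. It assumes the data were stored as WLATIN1 and the session runs UTF-8, and the length of 40 is only a placeholder, so pick a value that fits your own data.

```sas
/* Sketch only: reverse direction, WLATIN1 bytes into a UTF-8 session.   */
/* The length $40 is an assumed value, not something taken from the data. */
data lab_fixed;
  length UNIT $40;          /* declared before SET so the longer length is used */
  set lab(encoding=any);    /* read the stored bytes without transcoding */
  UNIT = kcvt(UNIT, 'wlatin1', 'utf-8');
run;
```

Declaring the length before the SET statement moves UNIT to the front of the variable order, which is usually harmless but worth knowing if you compare the result with the original.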

 

If the issue is in the FORMAT definition then use PROC FORMAT to make a DATASET from the format catalog.  Then use similar data step to fix the characters.  And then use another PROC FORMAT call to make a catalog from the fixed dataset.
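
Roughly, that round trip could look like the sketch below. The catalog name MYLIB.MYSPSS_FORMATS, the work dataset names, and the output catalog name are placeholders, and it again assumes UTF-8 bytes being read in a WLATIN1 session.

```sas
/* Sketch only: all library, catalog, and dataset names are placeholders. */

/* 1. Dump the format catalog to a dataset. */
proc format lib=mylib.myspss_formats cntlout=work.fmt_raw;
run;

/* 2. Transcode the decoded text, which CNTLOUT stores in LABEL. */
data work.fmt_fixed;
  set work.fmt_raw;
  label = kcvt(label, 'utf-8', 'wlatin1');
run;

/* 3. Build a new catalog from the repaired dataset. */
proc format lib=mylib.myspss_formats_fixed cntlin=work.fmt_fixed;
run;
```

Writing to a separate catalog keeps the original around for comparison. If any character formats have the problem characters in the coded values too, START and END may need the same treatment.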

Tom
Super User

It sounds like you ran something like:

libname mylib 'some permanent directory name';
proc import dbms=sav
   out=mylib.myspss replace
   datafile='myfile.sav'
;
   fmtlib=mylib.myspss_formats;
run;

Which says to convert the 'myfile.sav' SPSS file into the SAS dataset named MYLIB.MYSPSS and save the "value labels" into the format catalog named MYLIB.MYSPSS_FORMATS.

 

Did that work? 

Does the dataset have the right number of variables and observations?

Run proc contents on the generated dataset to check.

proc contents data=MYLIB.MYSPSS varnum ;
run;

 

You mentioned issues with LABEL values.  Are you talking about the labels attached to the variables in the SAS dataset?  For example you might have a variable named UNIT with a label of "Lab test units" attached to it.   

 

Or do you mean the SPSS "value labels"?  SPSS uses the word "label" there for the kind of code/decode pairs that SAS stores in format definitions.  To see how the formats are defined you can use the FMTLIB option of PROC FORMAT.

proc format lib=mylib.myspss_formats fmtlib;
run;

You can also use the CNTLOUT= option to convert the format catalog into a SAS dataset that can later be used to recreate the format catalog by using the CNTLIN= option.

proc format noprint lib=mylib.myspss_formats cntlout=mylib.myspss_formats;
run;

 

What is the setting for the ENCODING system option being used in your SAS session?  To check run something like this:

%put encoding=%sysfunc(getoption(encoding));
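
It can also help to compare that with the encoding recorded in the header of the imported dataset. A small sketch, assuming the dataset is MYLIB.MYSPSS as in the example above (PROC CONTENTS reports the same attribute):

```sas
/* Sketch only: MYLIB.MYSPSS is the example dataset name used earlier. */
%let dsid = %sysfunc(open(mylib.myspss));
%put dataset encoding=%sysfunc(attrc(&dsid, encoding));
%let rc = %sysfunc(close(&dsid));
```

If the two values agree but the characters still look wrong, the bytes were probably stored under a different encoding than the one the dataset header claims.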

 

So if your SAS session is using UTF-8 (or some other multibyte encoding) and your SPSS file was using WLATIN1 (or some other single byte encoding) to store the mu character and PROC IMPORT did not convert the values then you will potentially see invalid values.

 

And if PROC IMPORT did convert the values, but did not make sure the variables were defined long enough to hold the extra bytes then you might get a truncated multibyte character code, which would be invalid (and also hard to fix).

 

To check the hex code for how the Greek character mu is stored in a SAS character variable you can use the $HEX. format.  First identify an example value of a variable that has the problem character.  Say the variable is named UNIT and the problem character can be found in observation number 123; then you could run something like this to see the hex codes in the SAS log.

data _null_;
  set mylib.myspss(firstobs=123);
  put unit = / unit=:$hex. ;
  stop;
run;
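
If you don't already know which observations are affected, one way to find candidates is to flag every value that contains a byte above 127, since plain ASCII never does. This is only a sketch: UNIT and MYLIB.MYSPSS are the example names from above, and WORK.SUSPECT and ROW are made-up names.

```sas
/* Sketch only: lists observations whose UNIT value has a non-ASCII byte. */
/* Byte patterns you might then see in the $HEX. output:                  */
/*   micro sign U+00B5        B5 in WLATIN1, C2B5 in UTF-8                */
/*   Greek mu U+03BC          CEBC in UTF-8                               */
/*   curly apostrophe U+2019  92 in WLATIN1, E28099 in UTF-8              */
/*   replacement char U+FFFD  EFBFBD in UTF-8                             */
data work.suspect(keep=row unit);
  set mylib.myspss(encoding=any);   /* read the stored bytes as-is */
  row = _n_;                        /* observation number in the source */
  do i = 1 to length(unit);
    if rank(char(unit, i)) > 127 then do;
      output;                       /* keep the row once, then stop scanning */
      leave;
    end;
  end;
run;
```

The ROW values can then be plugged into FIRSTOBS= in the step above. If the hex shows EFBFBD, the original character was already replaced before or during the import and cannot be recovered by transcoding.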

 

