Corinthian94
Obsidian | Level 7

Hi there,

 

I'm having some trouble with concatenating two datasets. We have two datasets with the same variables throughout, one in English and one in Spanish. The questions and answer choices are the same but in different languages. For questions where people can only select one option, when I concatenate the datasets the Spanish formats overwrite the English formats, even though I have removed the formats from the Spanish dataset before concatenating. I'd like to be able to concatenate them without going in and applying formats to all these questions - is there a way to do this? Thanks in advance!

 

proc import datafile="folder\spss.sav" 
out=screener
dbms=sav replace;
run;
 
proc import datafile="folder\spss.sav" 
out=screener_es
dbms=sav replace;
run;

 

data scr_es; set screener_es; format _all_; run; *Remove formats;

data scr_all; set screener scr_es;  run; *Concatenate two datasets;

 

ACCEPTED SOLUTION

Tom
Super User

So you are reading in SPSS files.  See this post: https://communities.sas.com/t5/SAS-Communities-Library/How-to-import-SPSS-data-files-into-SAS/ta-p/2...

 

SPSS does not do codelists the same way as SAS does.  In SPSS each variable has its own decode list.   So if you have 10 variables that all use the same 1/2 codes that are decoded as YES/NO you end up with that information stored 10 times.

 

In SAS the decode lists are created as independent FORMATs that can be attached to any number of variables.   The formats are stored independently in catalogs.  Only the NAME of the format is stored in the dataset. So you can define one format that decodes 1 to YES and 2 to NO and attach the format to all 10 variables or other variables in other datasets.
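
A minimal sketch of that idea, assuming a hypothetical format named YESNO and made-up variables q1-q3 (none of these names come from the SPSS files discussed here): one format is defined once and attached to several variables, and only its name is stored in the dataset header.

```sas
/* Hypothetical format and variable names, for illustration only */
proc format;
  value yesno 1 = 'YES'
              2 = 'NO';
run;

data example;
  input q1 q2 q3;
  /* Only the format NAME (YESNO) is stored with each variable */
  format q1 q2 q3 yesno.;
  datalines;
1 2 1
2 2 1
;
run;

/* The Format column of the output shows YESNO. for all three variables */
proc contents data=example;
run;
```

Because the definition lives in a catalog rather than in the dataset, redefining YESNO changes how all three variables are displayed without touching the data.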

 

PROC IMPORT will generate a format for each codelist it sees in the SPSS file.  It will make up a name for the format based on the name of the variable.  So if you import two SPSS files with the same variable names, the same format names will be created, which means the formats created from the second SPSS file will replace the formats created from the first SPSS file.

 

If you want the ENGLISH version of the format to persist then import that ENGLISH version of the SPSS file last.

 

Or, better still, use the FMTLIB= statement of PROC IMPORT to tell SAS where to write the formats it creates from the SPSS file's codelists.  You can then use the FMTSEARCH system option to tell SAS which format catalog to search first to find the formats.

proc import datafile="folder_english\spss.sav" 
  out=screener replace
  dbms=sav
;
  fmtlib=work.english;
run;
 
proc import datafile="folder_spanish\spss.sav" 
  out=screener_es replace
  dbms=sav
;
  fmtlib=work.spanish;
run;

options insert=(fmtsearch=(work.english work.spanish));
data want;
  set screener: ;
run;
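
If the Spanish labels are wanted later, a possible follow-up (a sketch, assuming both catalogs hold formats with the same names, as described above) is to change the search order instead of re-importing. The PROC PRINT step is only there to show the effect.

```sas
/* Search the Spanish catalog first; the data in WANT is unchanged */
options fmtsearch=(work.spanish work.english);

proc print data=want(obs=5);
run;

/* Switch back to the English labels */
options fmtsearch=(work.english work.spanish);
```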

 


4 REPLIES
PaigeMiller
Diamond | Level 26

I'd like to be able to concatenate them without going in and applying formats to all these questions - is there a way to do this? 

 

I'm not sure I can figure out what the desired output is from your question above. However, if you concatenate the two data sets, a given variable will have the same format for all the rows in the data set, regardless of whether the row was previously in the Spanish data set or the English data set. So each variable will have either the Spanish format or the English format, or no format if you choose to do that.

--
Paige Miller

Corinthian94
Obsidian | Level 7

Thank you! This was super helpful.

ballardw
Super User

I would strongly suggest using the FMTLIB option when using Proc Import to read files. Send them to different catalogs if needed.

This would place the formats from the two files into two different catalogs in a libname named A created for that purpose. If your project already has a permanent library assigned, I would use that.

libname A '<some location>';

proc import datafile="folder\spss.sav" 
out=screener
dbms=sav replace;
   fmtlib=a.formats_eng;
run;
 
proc import datafile="folder\spss.sav" 
out=screener_es
dbms=sav replace;
   fmtlib=a.formats_es;
run;

Why might you do this? Once you have a format catalog you can list the contents to see the definitions of the formats or create a data set to rebuild them if needed (CNTLOUT). Adding one of these catalogs to the FMTSEARCH system path means that SAS would use only that version, given that the variable names are the same.
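
A short sketch of those steps, assuming the libref A and the catalog name formats_eng from the import code above (the output data set name work.eng_formats is made up for illustration):

```sas
/* List the definitions of every format stored in the English catalog */
proc format library=a.formats_eng fmtlib;
run;

/* Write the definitions to a data set; CNTLIN= can rebuild the formats from it */
proc format library=a.formats_eng cntlout=work.eng_formats;
run;

/* Add the English catalog to the front of the format search path */
options insert=(fmtsearch=(a.formats_eng));
```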

 

You will find that in many places the properties of the LAST used variable or data set determine the result.

If your Screener_Es is the Spanish version then I think you may want:

data scr_all; 
   set scr_es  /* actually screener_es likely as well*/
        screener ;  
run; 

If you still get Spanish, then likely the English version did not have value labels set but the Spanish version did.

 

You may want to investigate use of Proc Datasets and the MODIFY statement to remove formats. Doing such in a data step requires that all the observations be read as if more complex manipulation was needed. Datasets modifies the headers of the files so can change variable formats, informats and labels in place.
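
For example, something along these lines should strip the formats from the Spanish data set without reading any observations (a sketch, assuming the data set is WORK.SCREENER_ES as in the original question):

```sas
proc datasets library=work nolist;
  modify screener_es;
    /* FORMAT with a variable list and no format name removes the formats */
    format _all_;
  run;
quit;
```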

 

FWIW, I hated getting undocumented SPSS files when I worked in a shop that used SPSS because of things like 'filters' and 'value labels' not being very obvious and sets with many variables could be a pain to decipher.
