Hi there,
I'm having some trouble with concatenating two datasets. We have two datasets with the same variables throughout, one in English and one in Spanish. The questions and answer choices are the same but in different languages. For questions where people can only select one option, when I concatenate the datasets the Spanish formats overwrite the English formats, even though I have removed the formats from the Spanish dataset before concatenating. I'd like to be able to concatenate them without going in and applying formats to all these questions - is there a way to do this? Thanks in advance!
data scr_es;
  set screener_es;
  format _all_;
run; *Remove formats;

data scr_all;
  set screener scr_es;
run; *Concatenate two datasets;
Accepted Solutions
So you are reading in SPSS files. See this post: https://communities.sas.com/t5/SAS-Communities-Library/How-to-import-SPSS-data-files-into-SAS/ta-p/2...
SPSS does not do codelists the same way as SAS does. In SPSS each variable has its own decode list. So if you have 10 variables that all use the same 1/2 codes that are decoded as YES/NO you end up with that information stored 10 times.
In SAS the decode lists are created as independent FORMATs that can be attached to any number of variables. The formats are stored independently in catalogs. Only the NAME of the format is stored in the dataset. So you can define one format that decodes 1 to YES and 2 to NO and attach the format to all 10 variables or other variables in other datasets.
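To make that concrete, here is a minimal sketch of one format shared by several variables. The format name YESNO, the dataset DEMO, and the variables Q1-Q3 are made up for illustration and do not come from the SPSS files in this thread.

```sas
/* Hypothetical names: YESNO, DEMO, Q1-Q3 are for illustration only */
proc format;
  value yesno
    1 = 'Yes'
    2 = 'No';
run;

data demo;
  input q1 q2 q3;
  /* Only the format NAME is stored with each variable */
  format q1-q3 yesno.;
  datalines;
1 2 1
2 2 1
;
run;

/* The Format column shows YESNO. for all three variables */
proc contents data=demo;
run;
```

The decode list itself lives in the catalog WORK.FORMATS here, so redefining YESNO changes how all three variables display.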
PROC IMPORT will generate a format for each codelist it sees in the SPSS file. It will make up a name for the format based on the name of the variable. So if you import two SPSS files with the same variable names then formats with the same names will be created, which means the formats created from the second SPSS file will replace the formats created from the first SPSS file.
If you want the ENGLISH version of the format to persist then import that ENGLISH version of the SPSS file last.
Or better still, use the FMTLIB= option of PROC IMPORT to tell SAS where to write the formats it creates from the SPSS file's codelists. You can then use the FMTSEARCH system option to tell SAS which format catalog to search first to find the formats.
proc import datafile="folder_english\spss.sav"
  out=screener replace
  dbms=sav fmtlib=work.english
;
run;

proc import datafile="folder_spanish\spss.sav"
  out=screener_es replace
  dbms=sav fmtlib=work.spanish
;
run;

options insert=(fmtsearch=(work.english work.spanish));

data want;
  set screener: ;
run;
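Because the dataset only stores the format names, the search order decides which language you see whenever the data is displayed. As a rough sketch, assuming the WORK.ENGLISH and WORK.SPANISH catalogs from the code above exist, you could flip the order to get the Spanish labels for a particular report:

```sas
/* Search the Spanish catalog first; this replaces the current FMTSEARCH list */
options fmtsearch=(work.spanish work.english);

proc print data=want(obs=5);
run;

/* Switch back to English first */
options fmtsearch=(work.english work.spanish);

proc print data=want(obs=5);
run;
```

Either way each variable shows a single language for all rows, since the lookup is by format name and not by which file the row came from.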
I'd like to be able to concatenate them without going in and applying formats to all these questions - is there a way to do this?
I'm not sure I can figure out what the desired output is from your question above. However, if you concatenate the two data sets, a given variable will have the same format for all the rows in the data set, regardless of whether the row was previously in the Spanish data set or the English data set. So each variable will either have the Spanish format or the English format, or no format if you choose to do that.
Paige Miller
Thank you! This was super helpful.
I would strongly suggest using the FMTLIB option when using Proc Import to read files. Send them to different catalogs if needed.
The code below would place the formats from the two files into two different catalogs in a library named A created for that purpose. If your project already has a permanent library assigned I would use that.
libname A '<some location>';

proc import datafile="folder\spss.sav" out=screener dbms=sav replace;
  fmtlib=a.formats_eng;
run;

proc import datafile="folder\spss.sav" out=screener_es dbms=sav replace;
  fmtlib=a.formats_es;
run;
Why might you do this? Once you have a format catalog you can list the contents to see the definitions of the formats, or create a data set to rebuild them if needed (CNTLOUT). Adding one of these catalogs to the FMTSEARCH system path means that SAS would use only that version, given that the variable names (and so the format names) are the same.
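A minimal sketch of both ideas, assuming the library A and the catalog A.FORMATS_ENG from the import code above. The data set name WORK.ENG_FMTS and the catalog WORK.REBUILT are made up for the example.

```sas
/* List the definitions of every format in the catalog */
proc format library=a.formats_eng fmtlib;
run;

/* Write the definitions to a data set (hypothetical name) */
proc format library=a.formats_eng cntlout=work.eng_fmts;
run;

/* Rebuild the formats from that data set, here into a different catalog */
proc format library=work.rebuilt cntlin=work.eng_fmts;
run;
```

The CNTLOUT data set is an ordinary SAS data set, so you can edit the labels in it before feeding it back through CNTLIN.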
You will find that in many places the properties of the LAST used variable or data set will be the result.
If your Screener_Es is the Spanish version then I think you may want:
data scr_all;
  set scr_es /* actually screener_es likely as well */
      screener;
run;
If you still have Spanish then likely the English version did not have value labels set but the Spanish did.
You may want to investigate use of Proc Datasets and the MODIFY statement to remove formats. Doing such in a data step requires that all the observations be read as if more complex manipulation was needed. Datasets modifies the headers of the files so can change variable formats, informats and labels in place.
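Something along these lines, assuming SCREENER_ES sits in the WORK library as in the original question:

```sas
/* Remove all format assignments without rewriting the observations */
proc datasets library=work nolist;
  modify screener_es;
    format _all_;   /* no format name given, so existing formats are dropped */
  run;
quit;
```

Note this changes SCREENER_ES in place instead of making a copy like the data step that creates SCR_ES does.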
FWIW, I hated getting undocumented SPSS files when I worked in a shop that used SPSS because of things like 'filters' and 'value labels' not being very obvious and sets with many variables could be a pain to decipher.