Hello!
I have 2 datasets I would like to merge by variable OCCSOC. In one of them, it is connected to a formats file. (Unfortunately, I can't figure out how to duplicate the dataset by get rid of the formats.)
I am getting the following statement:
52 Data onetlib.onetipums;
53 merge IPUMS.ACS2016 onetlib.onetmergedup;
54 length occsoc $10;
WARNING: Length of character variable OCCSOC has already been set.
Use the LENGTH statement as the very first statement in the DATA STEP to declare the
length of a character variable.
55 by occsoc;
56 format occsoc $occf.;
57 run;
WARNING: Multiple lengths were specified for the BY variable OCCSOC by input data sets and
LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This might cause unexpected results.
Data=IPUMS.ACS2016
# Variable Type Len Format Label
10 OCCSOC Char 6 Occupation, SOC classification
Data=ONETLIB.ONETMERGEDUP
# Variable Type Len Format Informat Label
1 OCCSOC Char 10 $OCCF. $10. OCCSOC
This is my merge program:
libname IPUMS "C:\Users\dps24\Documents\D Drive\IPUMS";
libname onetlib 'C:\Users\dps24\Documents\D Drive\ONET\db_21_3_excel\Use for Dissertation';
options fmtsearch=(onetlib.occformat);
proc contents data=IPUMS.ACS2016; run;
proc contents data=onetlib.onetmergedup; run;
*both data are sorted by occsoc;
Data onetlib.onetipums;
merge IPUMS.ACS2016 onetlib.onetmergedup;
length occsoc $10;
by occsoc;
format occsoc $occf.;
run;
proc print data=onetlib.onetipums (obs=10); run;
I get a very funky print of data:
Obs MET2013 PERNUM PERWT SEX OCCSOC broad major minor IND cognitiveindex manuphysindex workvaluesindex
1 . . . . 11-101 . . . . 70.3800 18.0500 22.9967
The variables belonging to ACS2016 have missing data, while the onetmerdup variables look fine. The OCCSOC values are wrong, like the formats didn't do their job. When I print these two datasets separately, they look great and OCCSOC values match (e.g. 111010 instead of 11-1010 - I did the formats because I needed the OCCSOC values to match to eventually merge them like I'm trying now).
Is this a length issue or a format issue? Is it possible to duplicate the onetmergedup data without any attached formats, and would that correct this problem?
Thanks so much!!
Diana
SAS will define the variable type and storage length for variable used in a data step at the first place it sees the variable. Once the length for character variable is defined it cannot be changed.
Because the first place OCCSOC appears it is defined as length $6 the result is that you are currently trying to stuff data from a 10 character long variable into a 6 character long variable.
Moving the LENGTH statement before the MERGE statement will define the variable as length $10. This will prevent truncating the values that are read from ONETLIB.ONETMERGEDUP.
The format attached does not matter to how the values are matched. Just how they are printed.
Thanks Tom!
The OCCSOC variable looks good, seems to have worked. However, I still have the case of the missing values for all the ACS2016 variables.
Any thoughts on why that is?
Obs occsoc MET2013 PERNUM cognitiveindex manuphysindex
1 1110XX . . 70.3800 18.0500
2 111021 . . 64.8700 26.2500
Sounds like the OCCSOC variable is actually different in one of the datasets. If one dataset contains OCCSOC values that contain a hyphen, something like the following would correct it:
data ACS2016;
length OCCSOC $6.;
input OCCSOC x;
cards;
111010 4
111020 3
;
data ONETMERGEDUP;
length OCCSOC $10.;
input OCCSOC y;
cards;
11-1010 1
11-1020 2
;
data onetmergedup_fixed (drop=_:);
length OCCSOC $6;
set ONETMERGEDUP (rename=(OCCSOC=_OCCSOC));
OCCSOC=strip(compress(_OCCSOC,'-'));
run;
data want;
merge ACS2016 onetmergedup_fixed;
by OCCSOC;
run;
Art, CEO, AnalystFinder.com
@Diana_AdventuresinSAS wrote:
Thanks Tom!
The OCCSOC variable looks good, seems to have worked. However, I still have the case of the missing values for all the ACS2016 variables.
Any thoughts on why that is?
Obs occsoc MET2013 PERNUM cognitiveindex manuphysindex 1 1110XX . . 70.3800 18.0500 2 111021 . . 64.8700 26.2500
I would assume it is because those values of the BY variable are not present in that dataset.
Formats change how the value is printed not how it is stored.
Check the values for leading spaces as normally SAS will not display leading spaces.
proc sql ;
select case when occsoc = left(occsoc) then 'LEFT ALIGNED'
else 'OTHER' end as check,count(*)
from ACS2016
group by 1
;
quit;
A merge is done with raw, unformatted values. Create a new variable with correct, clean values (in the dateset(s) where that conversion is needed) and merge by that.
Ideally, you should do such cleaning when data is imported into your data warehouse.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.