merging 2 datasets by variable - multiple lengths

Reply
Occasional Contributor
Posts: 16

merging 2 datasets by variable - multiple lengths

[ Edited ]

Hello!

 

I have 2 datasets I would like to merge by variable OCCSOC. In one of them, it is connected to a formats file. (Unfortunately, I can't figure out how to duplicate the dataset by get rid of the formats.)

 

I am getting the following statement:

52   Data onetlib.onetipums;
53   merge IPUMS.ACS2016 onetlib.onetmergedup;
54   length occsoc $10;
WARNING: Length of character variable OCCSOC has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP to declare the
         length of a character variable.
55   by occsoc;
56   format occsoc $occf.;
57   run;

WARNING: Multiple lengths were specified for the BY variable OCCSOC by input data sets and
         LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This might cause unexpected results.

Data=IPUMS.ACS2016

#     Variable       Type    Len       Format       Label

10   OCCSOC    Char    6                              Occupation, SOC classification

 

Data=ONETLIB.ONETMERGEDUP

#     Variable       Type    Len       Format        Informat           Label

1     OCCSOC    Char    10         $OCCF.       $10.                 OCCSOC

 

This is my merge program:

libname IPUMS "C:\Users\dps24\Documents\D Drive\IPUMS";
libname onetlib 'C:\Users\dps24\Documents\D Drive\ONET\db_21_3_excel\Use for Dissertation';
options fmtsearch=(onetlib.occformat);

proc contents data=IPUMS.ACS2016; run;
proc contents data=onetlib.onetmergedup; run;

*both data are sorted by occsoc;

Data onetlib.onetipums;
merge IPUMS.ACS2016 onetlib.onetmergedup;
length occsoc $10;
by occsoc;
format occsoc $occf.;
run;

proc print data=onetlib.onetipums (obs=10); run;

I get a very funky print of data:

 

Obs MET2013 PERNUM PERWT SEX   OCCSOC   broad major minor IND cognitiveindex manuphysindex workvaluesindex

1       .                .                .             .         11-101     .           .           .          .       70.3800            18.0500               22.9967

 

The variables belonging to ACS2016 have missing data, while the onetmerdup variables look fine. The OCCSOC values are wrong, like the formats didn't do their job. When I print these two datasets separately, they look great and OCCSOC values match (e.g. 111010 instead of 11-1010 - I did the formats because I needed the OCCSOC values to match to eventually merge them like I'm trying now). 

 

Is this a length issue or a format issue? Is it possible to duplicate the onetmergedup data without any attached formats, and would that correct this problem?

 

Thanks so much!!

Diana

 

Super User
Super User
Posts: 8,114

Re: merging 2 datasets by variable - multiple lengths

Posted in reply to Diana_AdventuresinSAS

SAS will define the variable type and storage length for variable used in a data step at the first place it sees the variable. Once the length for character variable is defined it cannot be changed.

 

Because the first place OCCSOC appears it is defined as length $6 the result is that you are currently trying to stuff data from a 10 character long variable into a 6 character long variable. 

 

Moving the LENGTH statement before the MERGE statement will define the variable as length $10. This will prevent truncating the values that are read from ONETLIB.ONETMERGEDUP.

 

The format attached does not matter to how the values are matched. Just how they are printed.

 

Occasional Contributor
Posts: 16

Re: merging 2 datasets by variable - multiple lengths

Thanks Tom!

 

The OCCSOC variable looks good, seems to have worked. However, I still have the case of the missing values for all the ACS2016 variables. 

 

Any thoughts on why that is?

 

Obs    occsoc    MET2013    PERNUM    cognitiveindex    manuphysindex 
1      1110XX    .          .          70.3800             18.0500 
2      111021    .          .          64.8700             26.2500 
PROC Star
Posts: 8,164

Re: merging 2 datasets by variable - multiple lengths

Posted in reply to Diana_AdventuresinSAS

Sounds like the OCCSOC variable is actually different in one of the datasets. If one dataset contains OCCSOC values that contain a hyphen, something like the following would correct it:

data ACS2016;
  length OCCSOC $6.;
  input OCCSOC x;
  cards;
111010 4
111020 3
;

data ONETMERGEDUP;
  length OCCSOC $10.;
  input OCCSOC y;
  cards;
11-1010 1
11-1020 2
;

data onetmergedup_fixed (drop=_:);
  length OCCSOC $6;
  set ONETMERGEDUP (rename=(OCCSOC=_OCCSOC));
  OCCSOC=strip(compress(_OCCSOC,'-'));
run;

data want;
  merge ACS2016 onetmergedup_fixed;
  by OCCSOC;
run;

Art, CEO, AnalystFinder.com

 

Super User
Super User
Posts: 8,114

Re: merging 2 datasets by variable - multiple lengths

Posted in reply to Diana_AdventuresinSAS

@Diana_AdventuresinSAS wrote:

Thanks Tom!

 

The OCCSOC variable looks good, seems to have worked. However, I still have the case of the missing values for all the ACS2016 variables. 

 

Any thoughts on why that is?

 

Obs    occsoc    MET2013    PERNUM    cognitiveindex    manuphysindex 
1      1110XX    .          .          70.3800             18.0500 
2      111021    .          .          64.8700             26.2500 

I would assume it is because those values of the BY variable are not present in that dataset.

Formats change how the value is printed not how it is stored.

Check the values for leading spaces as normally SAS will not display leading spaces.

proc sql ;
  select case when occsoc = left(occsoc) then 'LEFT ALIGNED' 
   else 'OTHER' end as check,count(*)
  from ACS2016 
  group by 1
 ;
quit;

 

Super User
Posts: 10,259

Re: merging 2 datasets by variable - multiple lengths

Posted in reply to Diana_AdventuresinSAS

A merge is done with raw, unformatted values. Create a new variable with correct, clean values (in the dateset(s) where that conversion is needed) and merge by that.

Ideally, you should do such cleaning when data is imported into your data warehouse.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Ask a Question
Discussion stats
  • 5 replies
  • 141 views
  • 1 like
  • 4 in conversation