BookmarkSubscribeRSS Feed
Diana_AdventuresinSAS
Obsidian | Level 7

Hello!

 

I have 2 datasets I would like to merge by variable OCCSOC. In one of them, it is connected to a formats file. (Unfortunately, I can't figure out how to duplicate the dataset by get rid of the formats.)

 

I am getting the following statement:

52   Data onetlib.onetipums;
53   merge IPUMS.ACS2016 onetlib.onetmergedup;
54   length occsoc $10;
WARNING: Length of character variable OCCSOC has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP to declare the
         length of a character variable.
55   by occsoc;
56   format occsoc $occf.;
57   run;

WARNING: Multiple lengths were specified for the BY variable OCCSOC by input data sets and
         LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This might cause unexpected results.

Data=IPUMS.ACS2016

#     Variable       Type    Len       Format       Label

10   OCCSOC    Char    6                              Occupation, SOC classification

 

Data=ONETLIB.ONETMERGEDUP

#     Variable       Type    Len       Format        Informat           Label

1     OCCSOC    Char    10         $OCCF.       $10.                 OCCSOC

 

This is my merge program:

libname IPUMS "C:\Users\dps24\Documents\D Drive\IPUMS";
libname onetlib 'C:\Users\dps24\Documents\D Drive\ONET\db_21_3_excel\Use for Dissertation';
options fmtsearch=(onetlib.occformat);

proc contents data=IPUMS.ACS2016; run;
proc contents data=onetlib.onetmergedup; run;

*both data are sorted by occsoc;

Data onetlib.onetipums;
merge IPUMS.ACS2016 onetlib.onetmergedup;
length occsoc $10;
by occsoc;
format occsoc $occf.;
run;

proc print data=onetlib.onetipums (obs=10); run;

I get a very funky print of data:

 

Obs MET2013 PERNUM PERWT SEX   OCCSOC   broad major minor IND cognitiveindex manuphysindex workvaluesindex

1       .                .                .             .         11-101     .           .           .          .       70.3800            18.0500               22.9967

 

The variables belonging to ACS2016 have missing data, while the onetmerdup variables look fine. The OCCSOC values are wrong, like the formats didn't do their job. When I print these two datasets separately, they look great and OCCSOC values match (e.g. 111010 instead of 11-1010 - I did the formats because I needed the OCCSOC values to match to eventually merge them like I'm trying now). 

 

Is this a length issue or a format issue? Is it possible to duplicate the onetmergedup data without any attached formats, and would that correct this problem?

 

Thanks so much!!

Diana

 

5 REPLIES 5
Tom
Super User Tom
Super User

SAS will define the variable type and storage length for variable used in a data step at the first place it sees the variable. Once the length for character variable is defined it cannot be changed.

 

Because the first place OCCSOC appears it is defined as length $6 the result is that you are currently trying to stuff data from a 10 character long variable into a 6 character long variable. 

 

Moving the LENGTH statement before the MERGE statement will define the variable as length $10. This will prevent truncating the values that are read from ONETLIB.ONETMERGEDUP.

 

The format attached does not matter to how the values are matched. Just how they are printed.

 

Diana_AdventuresinSAS
Obsidian | Level 7

Thanks Tom!

 

The OCCSOC variable looks good, seems to have worked. However, I still have the case of the missing values for all the ACS2016 variables. 

 

Any thoughts on why that is?

 

Obs    occsoc    MET2013    PERNUM    cognitiveindex    manuphysindex 
1      1110XX    .          .          70.3800             18.0500 
2      111021    .          .          64.8700             26.2500 
art297
Opal | Level 21

Sounds like the OCCSOC variable is actually different in one of the datasets. If one dataset contains OCCSOC values that contain a hyphen, something like the following would correct it:

data ACS2016;
  length OCCSOC $6.;
  input OCCSOC x;
  cards;
111010 4
111020 3
;

data ONETMERGEDUP;
  length OCCSOC $10.;
  input OCCSOC y;
  cards;
11-1010 1
11-1020 2
;

data onetmergedup_fixed (drop=_:);
  length OCCSOC $6;
  set ONETMERGEDUP (rename=(OCCSOC=_OCCSOC));
  OCCSOC=strip(compress(_OCCSOC,'-'));
run;

data want;
  merge ACS2016 onetmergedup_fixed;
  by OCCSOC;
run;

Art, CEO, AnalystFinder.com

 

Tom
Super User Tom
Super User

@Diana_AdventuresinSAS wrote:

Thanks Tom!

 

The OCCSOC variable looks good, seems to have worked. However, I still have the case of the missing values for all the ACS2016 variables. 

 

Any thoughts on why that is?

 

Obs    occsoc    MET2013    PERNUM    cognitiveindex    manuphysindex 
1      1110XX    .          .          70.3800             18.0500 
2      111021    .          .          64.8700             26.2500 

I would assume it is because those values of the BY variable are not present in that dataset.

Formats change how the value is printed not how it is stored.

Check the values for leading spaces as normally SAS will not display leading spaces.

proc sql ;
  select case when occsoc = left(occsoc) then 'LEFT ALIGNED' 
   else 'OTHER' end as check,count(*)
  from ACS2016 
  group by 1
 ;
quit;

 

Kurt_Bremser
Super User

A merge is done with raw, unformatted values. Create a new variable with correct, clean values (in the dateset(s) where that conversion is needed) and merge by that.

Ideally, you should do such cleaning when data is imported into your data warehouse.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2945 views
  • 1 like
  • 4 in conversation