DATA Step, Macro, Functions and more

Avoiding data set merging problems when by-variable has different lengths.

Reply
New Contributor
Posts: 4

Avoiding data set merging problems when by-variable has different lengths.

When merging 2 tables on a common by-variable you run the risk of having a different length on this (or these) by variable(s). If this is the case, SAS will casually inform you with the following warning:

WARNING: Multiple lengths were specified for the BY variable mergevar by input data sets. This may cause unexpected results.

On first sight you might be inclined to ignore this pretty warning but SAS will do exactly as it states: it may cause unexpected results. In some cases the merge might be successful, other times the merge is partial or just not executed at all, leaving you with an empty table.

Take the following example:

DATA table1;

          length mergevar $10;

          mergevar = "blabla";

          table1varIwant = "interesting stuff";

RUN;

DATA table2;

          length mergevar $11;

          mergevar = "blabla";

          table2varIwant = "more interesting stuff";

RUN;

PROC SORT DATA=table1;

          BY mergevar;

RUN;

PROC SORT DATA=table2;

          BY mergevar;

RUN;

DATA combinedtable;

          MERGE table1 table2;

          BY mergevar;

RUN;

The warning will be shown in the log.

WARNING: Multiple lengths were specified for the BY variable mergevar by input data sets. This may cause unexpected results.

Hopefully you will get the correct result but if you have a lot of observations to merge on, changes are… you won’t.

One way to solve this is by first determining the longest length for the ‘mergevar’ and using that information in a new length statement just before the actual merge.

The PROC SQL puts the longest length in a macro variable calledmaxlength’.

PROCSQL;

          SELECT max(length) INTO :maxlength

                   FROM sashelp.vcolumn

                             WHERE libname='WORK'

                                      and memname in

                                      ("table1", "table2")

                                      and upcase(name)="mergevar";

QUIT;

This macrovariable is than used to set the LENGTH of the mergevar.

DATA combinedtable;

          LENGTH mergevar $ &maxlength;

          MERGE table1 table2;

          BY mergevar;

RUN;

This merge runs without warning and you should have the expected result.

Trusted Advisor
Posts: 1,300

Avoiding data set merging problems when by-variable has different lengths.

You could also just use the option varlenchk=nowarn

Contributor
Posts: 35

Avoiding data set merging problems when by-variable has different lengths.

That only suppresses the warning message. It doesn't solve the problem that resulted in the warning message.

Trusted Advisor
Posts: 1,300

Avoiding data set merging problems when by-variable has different lengths.

Certainly it does not 'solve' anything it just suppresses the warning in situations when you know there is no problem needing to be solved.  You can also list the dataset with the larger length first in the merge statement to have the resulting dataset with the longer of the given attributes for the common variables.

Frequent Contributor
Posts: 75

Re: Avoiding data set merging problems when by-variable has different lengths.

Gday,

I think there are two other ways it could be done:

DATA combinedtable;

length mergevar $11.;

MERGE table1 table2;

BY mergevar;

RUN;

^^

Specify the biggest length prior to the merge statement

DATA combinedtable;

  MERGE table2 table1 ; 

  BY mergevar;
RUN;

^^
If you know that table2 has the bigger variable put it first in the merge statement.

Super User
Posts: 9,662

Re: Avoiding data set merging problems when by-variable has different lengths.

Why not direct to use the max length of character variable.

DATA combinedtable;
LENGTH mergevar $ 32767;
MERGE table1 table2;
 BY mergevar;
RUN;

Ksharp

Valued Guide
Posts: 765

Re: Avoiding data set merging problems when by-variable has different lengths.

Hi ... if anyone would like to do some reading (from 9 years ago with most of the same points brought up in the discussion) ...

"Danger:  MERGE Ahead! Warning:  BY Variable with Multiple Lengths!"

by Bob Virgile

http://www.nesug.org/proceedings/nesug03/at/at005.pdf

Respected Advisor
Posts: 3,124

Re: Avoiding data set merging problems when by-variable has different lengths.

Thanks, Mike. I have read one of his books on improving SAS code efficiency, and I have benefited from it a lot!

Regards,

Haikuo

Occasional Contributor
Posts: 19

Avoiding data set merging problems when by-variable has different lengths.

A late comment:
In general, when you fetch a lot of data from a big DWH, you want an ABEND if ANYTHING is incorrect. Then you can sit down and consider the structure of your tables, and what problem you really want to solve.

  

There are of course several technical ways of avoiding the the error message, but they do not really help you to get accurate and realiable SAS-programs.

Please note that the use of LENGTH mergevar $ 32767; will increase the amount of computer resorces used.

Regards  Anders

Anders Sköllermo
Ph.D., Swedish Actuary

"Both Age and IQ are 69+"
"Retired, But Not Tired"

Sandgränd 13, S-178 40 Ekerö
email: anders.skollermo@one.se
tele: 00468968419 mobil: 0046735077373
Ask a Question
Discussion stats
  • 8 replies
  • 4238 views
  • 1 like
  • 8 in conversation