DATA Step, Macro, Functions and more

MERGE statement has more than one data set with repeats of BY values.

Reply
Contributor
Posts: 69

MERGE statement has more than one data set with repeats of BY values.

LIBNAME TAGETLIB DB2 Datasrc=ABDlive SCHEMA=SASDB  AUTHDOMAIN="SASDB_AUTH" ;

 

data source_rep_data;
set tagetlib.src_cons_cibil_table_postval;
/*if datepart(report_date) = '30APR2015'd;*/
run;

PROC SORT data=source_rep_data out=src_temp;
BY account_id int_account_id ownership_indicator;
RUN;
PROC SORT data=sascibil.cibil_consumer_name_add out=modif_temp;
BY account_id int_account_id ownership_indicator;
RUN;

 

 

DATA whole_dat_after_format1;
MERGE src_temp (in=a) modif_temp(in=b);
BY account_id int_account_id ownership_indicator;
IF a=1 ;
RUN;

 

 

I am running above mentioned code and in log it is written below message :-

WARNING: Multiple lengths were specified for the BY variable ACCOUNT_ID by input data sets. This may cause unexpected results.
WARNING: Multiple lengths were specified for the BY variable INT_ACCOUNT_ID by input data sets. This may cause unexpected results.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 1308561 observations read from the data set WORK.SRC_TEMP.
NOTE: There were 2907358 observations read from the data set WORK.MODIF_TEMP.
NOTE: The data set WORK.WHOLE_DAT_AFTER_FORMAT1 has 2907358 observations and 166 variables.

even though i have written  if and in statement why it is showing "NOTE: MERGE statement has more than one data set with repeats of BY values." ?

 

and SRC_TEMP has 13 lakh record with above code should return the same but in actual it is returning 29 lakhs record why it is so ?

it is due to warning ??

Please help in above code to understand and tell me y it is returning 29 lakhs record not the 13 lakhs records. 

 

 

 

Contributor
Posts: 22

Re: MERGE statement has more than one data set with repeats of BY values.

You have a Cartesian join, de dupe your source tables, with nodupkey or a proc summary.

Trusted Advisor
Posts: 1,826

Re: MERGE statement has more than one data set with repeats of BY values.

[ Edited ]

Check for which combination(s) of BY keys you have more than one observation in each of the both datasets.

 

You can do it by next code:

data dup1;
 set src_temp;
       by BY account_id int_account_id ownership_indicator;
            if not (first.ownership_indicator and last.ownership_indicator);
run;

data dup2;
 set modif_temp;
       by BY account_id int_account_id ownership_indicator;
            if not (first.ownership_indicator and last.ownership_indicator);
run;

proc sql;
        select distinct account_id, int_account_id, ownership_indicator
         from dup1 as a
         inner join dup2 as b
         on 
            a.account_id = b.account_id
            a.int_account_id =b.int_account_id
            a.ownership_indicator = b.ownership_indicator 
; quit;
Super User
Posts: 9,611

Re: MERGE statement has more than one data set with repeats of BY values.

The fact that the output has the same 2.9 million records that modif_temp has lets me suspect that you have at least one match in src_temp for every observation in modif_temp.

Very quick example:

data one;
x = 1;
run;

data two;
x = 1;
output;
output;
run;

data result;
merge
  one (in=a)
  two (in=b)
;
by x;
if a;
run;

proc print data=result noobs;
run;

Log of the merge:

34         data result;
35         merge
36           one (in=a)
37           two (in=b)
38         ;
39         by x;
40         if a;
41         run;

NOTE: There were 1 observations read from the data set WORK.ONE.
NOTE: There were 2 observations read from the data set WORK.TWO.
NOTE: The data set WORK.RESULT has 2 observations and 1 variables.
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 9,611

Re: MERGE statement has more than one data set with repeats of BY values.

And you need to clean up your data, so that these messages vanish:

WARNING: Multiple lengths were specified for the BY variable ACCOUNT_ID by input data sets. This may cause unexpected results.
WARNING: Multiple lengths were specified for the BY variable INT_ACCOUNT_ID by input data sets. This may cause unexpected results.
NOTE: MERGE statement has more than one data set with repeats of BY values.

All these indicate conditions that can (and usually will) make the merge return incorrect results.

 

Maxim 3: Know your data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,543

Re: MERGE statement has more than one data set with repeats of BY values.

The WARNING can cause the NOTE.

 

The WARNING indicates that one (or more) of your BY variables is being truncated when executing the DATA step.  That can cause the data to appear like there are different match patterns occurring.  

 

The solution is easy.  Find the larger length for each variable, and insert a LENGTH statement before the MERGE statement, assigning that maximum length.

 

If there are other problems with the data, they will need to be fixed.  But it is also possible that assigning the proper LENGTH will clear up everything.

Ask a Question
Discussion stats
  • 5 replies
  • 399 views
  • 0 likes
  • 5 in conversation