An Idea Exchange for SAS software and services

Comments
by Super User
on ‎02-11-2016 11:03 AM

This isn't a clear idea, can you please clarify your suggestion and possibly include sample data that will generate the warning/error.  

by Super User
on ‎02-11-2016 11:09 AM

You will have to find where the message appears in your log.  Then right above that you will see the DATA step code that generated the message, including the names of the data sets, and a BY statement that tells you the names of the BY variable(s).

by Contributor EH
on ‎02-12-2016 01:30 AM

Sometimes when I merge a lot of datasets into one (using a by) this message appears. It appears when two or more datasets are having repeats of by values. The log tells me, but it does not tell me which of the merged datasets are 'guilty'. I can find out by deduplicating them one by one of course, but maybe the log can tell me...

 

I am still trying to present a example using small datasets, so far without success....

 

Cheers,

Eric

by Contributor EH
on ‎02-12-2016 02:29 AM

Ok, below a working example.

 

data a; do id=1 to 5; avar = id; output a; end; run;
data b; id=1; bvar=id; output; id = 4; bvar=id; output; id = 4; bvar=id; output; run;
data c; id=1; cvar=id; output; id = 3; cvar=id; output; id = 4; cvar=id; output; id = 4; evar=id; output; id = 5; cvar=id; output; run;
data d; do id=1 to 5; dvar = id; output d; end; run;

data total;
merge a b c d;
by id;
run;

 

 

The log shows:

 

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 5 observations read from the data set WORK.A.
NOTE: There were 3 observations read from the data set WORK.B.
NOTE: There were 5 observations read from the data set WORK.C.
NOTE: There were 5 observations read from the data set WORK.D.
NOTE: The data set WORK.TOTAL has 6 observations and 6 variables.
NOTE: DATA statement used (Total process time):

 

But how do I know which of the datasets are having (the same) repeats of by values. (yes, they are b and c). I can find out by using proc sort nodupkey but wouldn't it be nice if the log tells me?

 

Cheers,

Eric

by PROC Star
on ‎02-12-2016 02:20 PM

Yeah, it would be nice.  But then, the next thing one might want is the value(s) of the BY variables that were duplicates.  Personally, I think it's fair that SAS only gives a note (which I treat as an error), and then it's left up to you to investigate.  Generally, if I have any doubt about the uniqueness of a dataset I'm about to merge, I will run %DuplicateCheck(work.a,by=id), which does something like:

data _null_;
  set work.a(keep=id);
  by id;
  if not (first.id and last.id) then put "ERROR: found a dup " id= ;
run;

 

That said, you can code up something what you want, using in= variables and forcing them to reset to 0 at the top of the data step loop, something like:

 

40   data total;
41     a=0;b=0;c=0;d=0;
42     merge a (in=a)
43           b (in=b)
44           c (in=c)
45           d (in=d);
46     by id;
47     if first.id=0 and sum(a,b,c,d)>1 then do;
48       put "WARNING: Duplicates in multiple datasets " (id a b c d)(=);
49     end;
50   run;

WARNING: Duplicates in multiple datasets id=4 a=0 b=1 c=1 d=0
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 5 observations read from the data set WORK.A.
NOTE: There were 3 observations read from the data set WORK.B.
NOTE: There were 5 observations read from the data set WORK.C.
NOTE: There were 5 observations read from the data set WORK.D.
NOTE: The data set WORK.TOTAL has 6 observations and 6 variables
by Super User
on ‎02-12-2016 03:14 PM

I can see your point about merging multiple data sets, not just 2.  I guess this would be relatively easy to implement.  After all, SAS is already doing most of the required work in order to print the current warning message.

 

Also note, the warning message appears when there is no many-to-many matching.  For example, it would appear here:

 

ID in A     ID in B

1                1

1                2

1                2

2                2

by PROC Star
on ‎02-12-2016 09:41 PM

@Astounding, I've only seen the message when there is many-to-many matching.  I don't get the message from your test data:

 

85   data a;
86     id=1;
87     output;output;output;
88     id=2;
89     output;
90   run;

NOTE: The data set WORK.A has 4 observations and 1 variables.

91
92   data b;
93     id=1;
94     output;
95     id=2;
96     output;output;output;
97   run;

NOTE: The data set WORK.B has 4 observations and 1 variables.

98
99   data c;
100    merge a b;
101    by id;
102  run;

NOTE: There were 4 observations read from the data set WORK.A.
NOTE: There were 4 observations read from the data set WORK.B.
NOTE: The data set WORK.C has 6 observations and 1 variables.

 

I think the NOTE is generated when the MERGE statement executes and encounters a case where the BY value does not change and records are read from multiple data sets.

by Super User
on ‎02-12-2016 10:23 PM

OK, that's a change then.  I did test it, but that was years ago when the message first started appearing.

by Contributor EH
on ‎02-13-2016 03:06 AM

Hello Quentin and Astounding,

 

Thanks for your comments. The NOTE only appears when there are datasets with the same repeats of by values.

My first suggestion would be to call it a WARNING.

And still I would like SAS to name the suspected datasets. Sometimes I merge over ten large datasets.

 

Cheers,

Eric

by PROC Star
on ‎01-17-2017 12:30 AM
Idea Statuses
Top Liked Authors