BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi Folks:

 

I'm merging 4 datasets and would like to select the observations matched to main_attr1(in=a) data. I would expect the resulting merged data w_four to have N=250 observations matched to the size of data (in=a). However, merged data w_four resulted in 262 rows.

proc freq below shows that w_four has all levels of data in the other datasets which obviously is not true. 

How to do this merge in data step correct? Any suggestions appreciated.

Thank you very much for your time. 

 

freq.png

data w_four;  
merge 
outcome          (in=o) /*228*/
main_attr1(in=a) /*250*/
Korean_foreign1  (in=f) /*261*/
Korean_migration1(in=m); /*252*/
by id1name idname;
outcome=o;
map_attr=a;
foreign=f;
migration=m;
if a then output w_four; /*262*/ 
run; 
proc freq data=w_four;
tables map_attr*outcome*foreign*migration/list;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

Hi,

 

1) did you checked the log for "NOTE: MERGE statement has more than one data set with repeats of BY values"

2) try to run:

proc sql;
  select id1name, idname, count(1) as i
  from w_four
  group by id1name, idname
  having count(1) > 1
  ;
quit;

to find out which observations are possibly duplicated.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

11 REPLIES 11
yabwon
Onyx | Level 15

Hi,

 

1) did you checked the log for "NOTE: MERGE statement has more than one data set with repeats of BY values"

2) try to run:

proc sql;
  select id1name, idname, count(1) as i
  from w_four
  group by id1name, idname
  having count(1) > 1
  ;
quit;

to find out which observations are possibly duplicated.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ChrisNZ
Tourmaline | Level 20

Always add this kind of check in your code when you expect unique keys (unless the data had previously been vetted):

 

if not(first.IDNAME and last.IDNAME) then putlog 'WARNING: duplicate keys in merge';

 

Also, always check the log. Always.

Cruise
Ammonite | Level 13

Thanks all. My log was:

NOTE: There were 228 observations read from the data set WORK.OUTCOME.
NOTE: There were 250 observations read from the data set WORK.MAIN_ATTR1.
NOTE: There were 261 observations read from the data set WORK.KOREAN_FOREIGN1.
NOTE: There were 252 observations read from the data set WORK.KOREAN_MIGRATION1.
NOTE: The data set WORK.W_FOUR has 262 observations and 32 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 secondS

But after I added Chris's warning note the log turned into. 

WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
NOTE: There were 228 observations read from the data set WORK.OUTCOME.
NOTE: There were 250 observations read from the data set WORK.MAIN_ATTR1.
NOTE: There were 261 observations read from the data set WORK.KOREAN_FOREIGN1.
NOTE: There were 252 observations read from the data set WORK.KOREAN_MIGRATION1.
NOTE: The data set WORK.W_FOUR has 262 observations and 32 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Yabwon's sql returned following. 

 

Cruise
Ammonite | Level 13

With your pointers, the problem is solved. Hurray. 

First I investigated each data set using proc sql checker suggested by Yabwon and found out 'foreign' and 'migration' datasets had duplicate keys. Mostly migration data set. Deduplicated them in data step. Proc sort nodup did not deduplicate. But I had no time to understand why even though proc sort nodup always did a job in the past for me.

But this article helped to be assured. 

https://www.lexjansen.com/wuss/1998/WUSS98097.pdf

And deduplicated datasets in data step and tried merge which output data with N=250 the dataset that I wanted. Awesome.  

Also, without addition of putlog 'WARNING: duplicate keys in merge' that Chris suggested the log wouldn't print out the warning on the duplicate keys. 

Thanks all. I greatly greatly appreciate your time and insights. Helped a lot. 

 

data w_four; /*250*/ 
merge 
outcome          (in=o) /*228*/
main_attr1(in=a) /*250*/
Korean_foreign2  (in=f) /*260*/
Korean_migration2(in=m); /*238*/
by id1name idname;
outcome=o;
map_attr=a;
foreign=f;
migration=m;
if a; /*250*/ 
if not(first.IDNAME and last.IDNAME) then putlog 'WARNING: duplicate keys in merge';
run; 

 

Kurt_Bremser
Super User

As long as only one of the incoming datasets has duplicate keys, SAS will not issue a NOTE, as a 1-to-many join is valid (and often done) in a data step merge. Only when two or more datasets have duplicates, SAS will issue the NOTE, as the outcome may not be what the programmer desired.

Cruise
Ammonite | Level 13
But in this case, two of the datasets had duplicate keys: foreign had one and migration had several. Mostly migration.
Kurt_Bremser
Super User

As long as you do not have duplicates on the same key, SAS will not detect that, as for the individual keys it is still one-to-many. Run these two codes in comparison:

data one;
input key;
datalines;
1
2
2
3
;

data two;
input key;
datalines;
1
1
2
3
;

data mgd;
merge
  one
  two
;
by key;
run;

vs.

data one;
input key;
datalines;
1
2
2
3
;

data two;
input key;
datalines;
1
2
2
3
;

data mgd;
merge
  one
  two
;
by key;
run;

The second code will issue the NOTE, as there's a duplicate for 2 in both datasets.

Cruise
Ammonite | Level 13
Ah, got it. I can clearly see now. Thank you!
Tom
Super User Tom
Super User

Note that the NODUP option on proc sort will only eliminate duplicate observations, not observations that have duplicate keys.  Note that it also will only eliminate duplicate observations if they happen to be right next to each other (there isn't a different observation between them).

You can use NODUPKEYS option to make sure there is only one observation per set of key (by vars) values.

Cruise
Ammonite | Level 13
This was the case in the example here 'Note that the NODUP option on proc sort will only eliminate duplicate observations, not observations that have duplicate keys'. NODUP and NODUPKEYS were different options? I thought the earlier was the abbreviation of NODUPKEYS. My ignorance then.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1244 views
  • 5 likes
  • 5 in conversation