BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SURIM
Obsidian | Level 7

Hi!

 

This question probably has an easy answer, but I can't get my head around it.

 

I want to merge table A and table B, by a key id.

There will be a lot of duplicate rows in both tables.

 

If a key id is in table A, I want to delete all key id:s in table B in the merge. If the key id is only in table A or only in table B, they should stay.

 

Example:

Table A Table B 
key_iddatekey_iddate
123452018-07-15123452017-08-14
123452018-12-06123452018-01-09
848812019-01-01445662018-03-05

 

 

Expected outcome after merge:

Table_all 
key_iddate
123452018-07-15
123452018-12-06
848812019-01-01
445662018-03-05

 

 

Thoughts?

 

Kind Regards,

Susannah

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Let's do it in steps as you defined it:

 

1) "If a key id is in table A, I want to delete all key id:s in table B in the merge" = "If the key id is only in table B, they should stay.":

proc sql;
   create table want1 as select *
   from tableB where key_id not in (select distict key_id from tableA)
order by key_d;

2) Key_id in table A and not in table B

/* continued to previous code */

create table want2 as select * 
from tableA where key_id not in (select distinct key_id from tableB)
order by key_d;

3) final step:

quit;   /* quit sql */

data want;
 merge want1 want2;
  by key_id;
run;

View solution in original post

15 REPLIES 15
Patrick
Opal | Level 21

So with the sample data you've posted how would the desired result look like? A table with 2 columns and 0 rows?

 

Please try to post sample data which covers all use cases and then show us the desired result based on the sample data.

 

Also: Please post sample data as working data step code so we have something to work with without having to prepare everything first before actually dealing with your question. That is of course only if you'd like to get working and tested code from us and not only a description of how to do things.

SURIM
Obsidian | Level 7
Hello! I've updated with expected outcome. Thanks!
Kurt_Bremser
Super User

@SURIM wrote:
Hello! I've updated with expected outcome. Thanks!

The expected outcome does not match your rule AT ALL. Please review your logic. My code does what your logic says.

See Maxim 42.

SURIM
Obsidian | Level 7
Thanks for your post, I have updated the logic. Sorry about that I almost always use merge as a left join. No need to shame us SAS newbies. I am just trying to learn.
Kurt_Bremser
Super User

According to your revised logic:

"If a key id is in table A, I want to delete all key id:s in table B in the merge. If the key id is only in table A or only in table B, they should stay."

key_id 12345 should be removed, as it is present in both tables.

SURIM
Obsidian | Level 7
I tried this, but it deleted all of the id:s that were in both tables. That is why I think this is tricky...
Patrick
Opal | Level 21

The logic implemented by @Kurt_Bremser does what you describe (at least how I understand your description). If that's not what you want then please reformulate the logic required and show us the desired result based on the sample data so we have a better chance to understand what you need.

Jagadishkatam
Amethyst | Level 16

Please try the below untested code

 

proc sort data=tableA;
by key_id;
run;

proc sort data=tableB;
by key_id;
run;

data all;
merge tableA(in=a) tableB(in=b);
by key_id;
if b and not a;
run;


Thanks,
Jag
SURIM
Obsidian | Level 7
Sorry, I have updated with new logic. I think this might work if only changed a little bit.
Patrick
Opal | Level 21

@SURIM 

For the sample data you've posted a simple merge will return the desired result.

data have1;
  input key_id date :yymmdd10.;
  format date yymmdd10.;
datalines;	
12345 2018-07-15 
12345 2018-12-06 
84881 2019-01-01 
;

data have2;
  input key_id date :yymmdd10.;
  format date yymmdd10.;
datalines;	
12345 2017-08-14
12345 2018-01-09
44566 2018-03-05
;

data want;
  merge have2 (in=in2) have1 (in=in1);
  by key_id;
run;

proc print;
run;

Capture.JPG

 

However: Things won't work with above code if for the same key there are more rows in Have2 than in Have1. Is this possible?

...and that's why I'm asking for representative sample data as this allows us to provide code to you even if your description of the required logic isn't perfect.

 

So, is below possible in your data? And if possible do I understand right that you still only would want 2 rows for key 12345 as a result?

data have1;
  input key_id date :yymmdd10.;
  format date yymmdd10.;
datalines;	
12345 2018-07-15 
12345 2018-12-06 
84881 2019-01-01 
;

data have2;
  input key_id date :yymmdd10.;
  format date yymmdd10.;
datalines;	
12345 2017-08-14
12345 2018-01-09
12345 2019-01-01
44566 2018-03-05
;

 

...and here the code if above is a possible use case.

proc sql; 
  create table want as
    select *
    from have1
    union corr all
    select * 
    from have2 h2
    where not exists
      ( select *
        from have1 h1
        where h1.key_id = h2.key_id
      )
  ;
quit;

 

s_lassen
Meteorite | Level 14

One way to get only the original rows from a, even if there are more rows in table B than in table A could be this:

data have1;
  input key_id date :yymmdd10.;
  format date yymmdd10.;
datalines;	
12345 2018-07-15 
12345 2018-12-06 
84881 2019-01-01 
;run;

data have2;
  input key_id date :yymmdd10.;
  format date yymmdd10.;
datalines;	
12345 2017-08-14
12345 2018-01-09
12345 2018-01-10
44566 2018-03-05
;run;

data want;
  a=0; 
  merge have1(in=a) have2(in=b);
  by key_id;
  if first.key_id then do;
    if a then want='a';
	else want='b';
	end;
  retain want;
  if want='a' then
    if a;
  drop want;
run;
Shmuel
Garnet | Level 18

Let's do it in steps as you defined it:

 

1) "If a key id is in table A, I want to delete all key id:s in table B in the merge" = "If the key id is only in table B, they should stay.":

proc sql;
   create table want1 as select *
   from tableB where key_id not in (select distict key_id from tableA)
order by key_d;

2) Key_id in table A and not in table B

/* continued to previous code */

create table want2 as select * 
from tableA where key_id not in (select distinct key_id from tableB)
order by key_d;

3) final step:

quit;   /* quit sql */

data want;
 merge want1 want2;
  by key_id;
run;
Shmuel
Garnet | Level 18
May be you prefer
order by key_id and date ?!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2038 views
  • 0 likes
  • 6 in conversation