Hi!
This question probably has an easy answer, but I can't get my head around it.
I want to merge table A and table B, by a key id.
There will be a lot of duplicate rows in both tables.
If a key id is in table A, I want to delete all key id:s in table B in the merge. If the key id is only in table A or only in table B, they should stay.
Example:
Table A | Table B | ||
key_id | date | key_id | date |
12345 | 2018-07-15 | 12345 | 2017-08-14 |
12345 | 2018-12-06 | 12345 | 2018-01-09 |
84881 | 2019-01-01 | 44566 | 2018-03-05 |
Expected outcome after merge:
Table_all | |
key_id | date |
12345 | 2018-07-15 |
12345 | 2018-12-06 |
84881 | 2019-01-01 |
44566 | 2018-03-05 |
Thoughts?
Kind Regards,
Susannah
Let's do it in steps as you defined it:
1) "If a key id is in table A, I want to delete all key id:s in table B in the merge" = "If the key id is only in table B, they should stay.":
proc sql;
create table want1 as select *
from tableB where key_id not in (select distict key_id from tableA)
order by key_d;
2) Key_id in table A and not in table B
/* continued to previous code */
create table want2 as select *
from tableA where key_id not in (select distinct key_id from tableB)
order by key_d;
3) final step:
quit; /* quit sql */
data want;
merge want1 want2;
by key_id;
run;
So with the sample data you've posted how would the desired result look like? A table with 2 columns and 0 rows?
Please try to post sample data which covers all use cases and then show us the desired result based on the sample data.
Also: Please post sample data as working data step code so we have something to work with without having to prepare everything first before actually dealing with your question. That is of course only if you'd like to get working and tested code from us and not only a description of how to do things.
@SURIM wrote:
Hello! I've updated with expected outcome. Thanks!
The expected outcome does not match your rule AT ALL. Please review your logic. My code does what your logic says.
See Maxim 42.
According to your revised logic:
"If a key id is in table A, I want to delete all key id:s in table B in the merge. If the key id is only in table A or only in table B, they should stay."
key_id 12345 should be removed, as it is present in both tables.
proc sql;
create table want as
select * from table_b
where table_b.key_id not in (select distinct key_id from table_a)
;
quit;
The logic implemented by @Kurt_Bremser does what you describe (at least how I understand your description). If that's not what you want then please reformulate the logic required and show us the desired result based on the sample data so we have a better chance to understand what you need.
Please try the below untested code
proc sort data=tableA;
by key_id;
run;
proc sort data=tableB;
by key_id;
run;
data all;
merge tableA(in=a) tableB(in=b);
by key_id;
if b and not a;
run;
For the sample data you've posted a simple merge will return the desired result.
data have1;
input key_id date :yymmdd10.;
format date yymmdd10.;
datalines;
12345 2018-07-15
12345 2018-12-06
84881 2019-01-01
;
data have2;
input key_id date :yymmdd10.;
format date yymmdd10.;
datalines;
12345 2017-08-14
12345 2018-01-09
44566 2018-03-05
;
data want;
merge have2 (in=in2) have1 (in=in1);
by key_id;
run;
proc print;
run;
However: Things won't work with above code if for the same key there are more rows in Have2 than in Have1. Is this possible?
...and that's why I'm asking for representative sample data as this allows us to provide code to you even if your description of the required logic isn't perfect.
So, is below possible in your data? And if possible do I understand right that you still only would want 2 rows for key 12345 as a result?
data have1;
input key_id date :yymmdd10.;
format date yymmdd10.;
datalines;
12345 2018-07-15
12345 2018-12-06
84881 2019-01-01
;
data have2;
input key_id date :yymmdd10.;
format date yymmdd10.;
datalines;
12345 2017-08-14
12345 2018-01-09
12345 2019-01-01
44566 2018-03-05
;
...and here the code if above is a possible use case.
proc sql;
create table want as
select *
from have1
union corr all
select *
from have2 h2
where not exists
( select *
from have1 h1
where h1.key_id = h2.key_id
)
;
quit;
One way to get only the original rows from a, even if there are more rows in table B than in table A could be this:
data have1;
input key_id date :yymmdd10.;
format date yymmdd10.;
datalines;
12345 2018-07-15
12345 2018-12-06
84881 2019-01-01
;run;
data have2;
input key_id date :yymmdd10.;
format date yymmdd10.;
datalines;
12345 2017-08-14
12345 2018-01-09
12345 2018-01-10
44566 2018-03-05
;run;
data want;
a=0;
merge have1(in=a) have2(in=b);
by key_id;
if first.key_id then do;
if a then want='a';
else want='b';
end;
retain want;
if want='a' then
if a;
drop want;
run;
Let's do it in steps as you defined it:
1) "If a key id is in table A, I want to delete all key id:s in table B in the merge" = "If the key id is only in table B, they should stay.":
proc sql;
create table want1 as select *
from tableB where key_id not in (select distict key_id from tableA)
order by key_d;
2) Key_id in table A and not in table B
/* continued to previous code */
create table want2 as select *
from tableA where key_id not in (select distinct key_id from tableB)
order by key_d;
3) final step:
quit; /* quit sql */
data want;
merge want1 want2;
by key_id;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.