How can we flag the source of each observation after merging two datasets using PROC SQL? The flag variable should identify whether the observation comes from both datasets or from only one of them.
Use CASE logic:
proc sql;
select case
when not missing(A.column) and not missing(B.column) then 'BOTH'
when not missing(A.column) then 'FROMA'
when not missing(B.column) then 'FROMB'
else ' '
end as Source_Flag length = 5
from table1 as A
full join table2 as B
....;
quit;
The easiest way is to use a DATA step MERGE instead, since the IN= dataset option records which dataset contributed to each observation. Code like this will create a FLAG variable with values of 1 (only A), 2 (only B) or 3 (both).
data want;
merge a(in=in1) b(in=in2);
by key; /* KEY stands for your matching variable(s); both datasets must be sorted by it */
flag = 2*in2 + in1 ;
run;
You could do something similar in SQL code with a little work.
Say you had this SQL code:
proc sql;
create table want as
select ...variable list...
from A
full join B
on (...criteria...)
;
You could add those IN1 and IN2 flags and then create the new FLAG variable in a similar way, like this:
create table want as
select ...variable list...
,case when (in1 and in2) then 3 when (in1) then 1 else 2 end as FLAG
from (select *,1 as in1 from A ) A
full join (select *,1 as in2 from B) B
on (...criteria...)
;
You need the CASE instead of the simple arithmetic because the IN1 and IN2 variables will be coded as 1 and missing instead of the 1 and 0 that they would have had when created by the IN= dataset option.
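To see why the simple arithmetic fails, here is a minimal sketch that mimics one row of the full join where only A contributed, so IN2 is missing. The variable names NAIVE and SAFE are made up for illustration. Wrapping each flag in COALESCE(...,0) is one possible alternative to the CASE expression, not something the original code used.

```sas
/* Mimic a full-join row that exists only in A: in1=1, in2 is missing */
data _null_;
  in1 = 1;
  in2 = .;
  /* Arithmetic on a missing value propagates the missing value */
  naive = 2*in2 + in1;
  /* Replacing missing with 0 first restores the 1/2/3 coding */
  safe = 2*coalesce(in2, 0) + coalesce(in1, 0);
  put naive= safe=;
run;
```

The same COALESCE expression should also work inside the PROC SQL SELECT in place of the CASE, if you prefer to keep the formula from the DATA step version.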
Thanks for the code.
When I ran the last set of code with PROC SQL, the flag variable was created correctly, but the matching variable is missing when flag=2.
You need to use the COALESCE function for your key variable(s) in the SELECT.
select
coalesce(a.key,b.key) as key,
.....
The combined dataset now contains only the key variable.
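If only the key shows up, the SELECT probably lists nothing but the COALESCE expression; the other columns and the FLAG have to be listed as well. Below is a small self-contained sketch of how the pieces might fit together. The datasets A and B and the variables KEY, X and Y are made-up sample data, so substitute your own names.

```sas
/* Hypothetical sample data: KEY is the matching variable */
data a;
  input key x;
  datalines;
1 10
2 20
;
run;

data b;
  input key y;
  datalines;
2 200
3 300
;
run;

proc sql;
  create table want as
  select coalesce(a.key, b.key) as key   /* key filled from whichever side has it */
       , a.x                             /* list the other columns explicitly */
       , b.y
       , case when (a.in1 and b.in2) then 3
              when (a.in1) then 1
              else 2
         end as flag
  from (select *, 1 as in1 from a) as a
  full join (select *, 1 as in2 from b) as b
    on a.key = b.key
  ;
quit;
```

With this sample data, key 1 gets flag=1 (only in A), key 2 gets flag=3 (both) and key 3 gets flag=2 (only in B), and KEY is populated on every row.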