I would like to have a variable that tells me which observations have merged and which have not. I would like to create this in the merge step. I can do it when I use merge within the data step. I can't however figure out how I would do it with a proc sql merge. Could anyone help me please? Here is the code that creates this variable in the data step.
data data_merged; merge data_1 (in=a) data_2 (in=b); by merge_id; data1=a; data2=b; if data1=1 & data2=1 then merge=3; else if data1=0 & data2=1 then merge=2; else if data1=1 & data2=0 then merge=1; run;
Try
proc sql;
create table joined as
select
/* do NOT use asterisks, it causes a WARNING */
/* use an exhaustive list of variables with suitable aliases instead */
, case
when not missing(t1.i) and not missing(t2.id) then 3
when missing(t1.id) and not missing(t2.id) then 2
when not missing(t1.id) and missing(t2.id) then 1
else 0
end as indicator
from data_1 t1 full join data_2 t2
on t1.id = t2.id
;
quit;
I specifically make no reference to MERGE, as this is a SQL join, which is different from a DATA step MERGE.
You should share your SQL JOIN (not merge, not the same thing) code that you want to apply this to.
you can add variable to any data set at join time by using something like
From (select *, 1 as in_a from somedataset) as a
@ballardwthis is the code I am using for the proc sql merge. How would you add the merge variable?
proc sql; create table merge_data as select t1.*, t2.* from data_1 t1 left join data_2 t2 on t1.id=t2.id;
proc sql; create table merge_data as select t1.*, t2.* from (select *, 1 as data1 from data_1 ) t1 left join (select *, 1 as data2 from data_2) t2 on t1.id=t2.id;
However with a LEFT join there the variable Data1 will always be 1.
Note that the outer Select, where you have select t1.*, t2.* doesn't need to actually keep data1 and data2 but the variables can be used in a Case clause to set the "merge" status you want.
@trevand wrote:
@ballardwthis is the code I am using for the proc sql merge. How would you add the merge variable?
proc sql; create table merge_data as select t1.*, t2.* from data_1 t1 left join data_2 t2 on t1.id=t2.id;
@ballardwhow would you add the final merge variable to the proc sql code?
if data1=1 & data2=1 then merge=3; else if data1=0 & data2=1 then merge=2; else if data1=1 & data2=0 then merge=1;
With a Left or Right join one of those conditions NEVER appears. With a left join you can never have an observations from the "right" data set without a matching "left". Same with a right joint only the order of which can't appear reverses.
LEFT join (on some conditios) explicitly says : Return all the observations from the "left" set and the matches that occur in the right set when present.
Try
proc sql;
create table joined as
select
/* do NOT use asterisks, it causes a WARNING */
/* use an exhaustive list of variables with suitable aliases instead */
, case
when not missing(t1.i) and not missing(t2.id) then 3
when missing(t1.id) and not missing(t2.id) then 2
when not missing(t1.id) and missing(t2.id) then 1
else 0
end as indicator
from data_1 t1 full join data_2 t2
on t1.id = t2.id
;
quit;
I specifically make no reference to MERGE, as this is a SQL join, which is different from a DATA step MERGE.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.