Have been putting out fires that resulted from using a Proc SQL join instead of a SAS data step merge. I'm wondering if I have found a bug in SAS PROC SQL, as this merge did not behave as I would have expected, and did not behave as it would have if I was writing SQL code in SSMS. Example offending code: proc sql;
create table fup_timing as
select a.*,
t.Procedure_Date as t_Proc_Date label = "Followup Timing: Date Procedure from Proc Form",
t.Schedule as t_ScheduleCat label = "Followup Timing: Standard vs. Specialized",
from followup as a
left join followup_timing as t
on a.FormID_FUP= t.FormID_FUP;
quit; There are some rows with missing FORMID_FUP in the followup data by design. Because there are also missing FORMID_FUPs in followup_timing, we are getting an unexpected merge where N rows from followup_timing are being pulled over and merged on missing id with followup. This is causing duplication of the rows in followup that are missing the ID, as well as erroneous data being merged in from followup_timing for those rows. By doing a simple SAS merge, I get the behavior I expect where I have a left side merge that only pulls over valid values for FORMID_FUP that are in both followup and followup_timing data followup_test;
set followup;
run;
data fuptim;
set followup_timing;
run;
proc sort data=followup_test;
by FORMID_FUP;
run;
proc sort data=fuptim;
by FORMID_FUP;
run;
data rslt;
merge followup_test (in=a)
fuptim (in=b);
by FORMID_FUP;
if a;
run; I work with some talented web developers who write more SQL than I do, and even their suggestion of turning the PROC SQL join into an inner join unexpectedly merges on the missing values and results in duplication of rows. Is this a bug? Or is this some fundamental misunderstanding on our part of how SAS handles missing KEY_ID values?
... View more