I have a data set in which I want to count the number of distinct var_id values where ind=1. I use PROC SQL with COUNT(DISTINCT ...) for this (see the code below). However, when I first remove duplicates on var_id with PROC SORT NODUPKEY and then run the same query on the deduplicated data, I get a different answer. Shouldn't both queries give the same result, since DISTINCT already takes care of the duplicates?
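/* Remove duplicate var_id values: one row per var_id is kept */
proc sort data=temp1 out=temp2 nodupkey;
  by var_id;
run;

/* Count distinct var_id where ind=1 in the original data */
proc sql;
  select count(distinct case when (ind=1) then var_id end)
  from temp1;
quit;

/* Same count on the deduplicated data */
proc sql;
  select count(distinct case when (ind=1) then var_id end)
  from temp2;
quit;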
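
To make the difference reproducible, here is a minimal sketch with made-up data. It assumes temp1 can contain the same var_id with both ind=0 and ind=1; I have not confirmed that this is what the real data looks like. The three rows are hypothetical. NODUPKEY keeps only one row per var_id, so the ind=1 row for var_id 1 is dropped before the second count runs.

```sas
/* Hypothetical sample data: var_id 1 appears with ind=0 and with ind=1 */
data temp1;
  input var_id ind;
  datalines;
1 0
1 1
2 1
;
run;

/* One row per var_id is kept; with the default EQUALS behavior this is */
/* the first row in input order, so (1,0) and (2,1) remain in temp2     */
proc sort data=temp1 out=temp2 nodupkey;
  by var_id;
run;

proc sql;
  /* temp1: var_id 1 and var_id 2 both have a row with ind=1 */
  select count(distinct case when (ind=1) then var_id end) as n_temp1
  from temp1;

  /* temp2: the only remaining row for var_id 1 has ind=0, so the CASE */
  /* returns missing for it, and COUNT ignores missing values          */
  select count(distinct case when (ind=1) then var_id end) as n_temp2
  from temp2;
quit;
```

With this sample, the first query should count two var_id values and the second only one, which is the kind of discrepancy described above.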