Solved: Proc SQL Distinct Gives different Results

trevand · Posted 11-11-2025 11:46 AM

I have a data set where I want to count the number of distinct var_id values where ind=1. I use PROC SQL with COUNT(DISTINCT ...) for that (see the code below). However, when I first remove the duplicates on var_id with PROC SORT NODUPKEY and then run the same query, I get a different answer. Shouldn't the two give the same answer, since DISTINCT already takes care of the duplicates?

proc sort data=temp1 out=temp2 nodupkey;
  by var_id;
run;

proc sql;
  select count(distinct case when (ind=1) then var_id end)
  from temp1;
quit;

proc sql;
  select count(distinct case when (ind=1) then var_id end)
  from temp2;
quit;

Tom · Posted 11-11-2025 12:21 PM

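No, because the PROC SORT could have eliminated some observations with IND=1. NODUPKEY keeps only one observation per VAR_ID (by default the first one in the input order), so if that observation has a different IND value, the VAR_ID no longer has an IND=1 row to count. In the example below, VAR_ID=2 loses its IND=1 observation, so the count is 2 for TEMP1 but 1 for TEMP2.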

data temp1;
  input var_id ind ;
cards;
1 1
1 2
1 3
2 2
2 1
3 2
;

proc sort data=temp1 out=temp2 nodupkey;
  by var_id;
run;

proc sql;
  select 'TEMP1' as source, count(distinct case when (ind=1) then var_id end) as count from temp1
  union
  select 'TEMP2' as source, count(distinct case when (ind=1) then var_id end) as count from temp2
  ;
quit;
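
If the goal is to de-duplicate first and still get the same count, one option is to keep IND in the sort key so that no IND=1 observation is dropped. This is only a sketch built on the sample data above; the output name temp3 is made up for illustration, and filtering with WHERE before counting is an alternative rather than something the original code did.

```sas
/* De-duplicate on the (var_id, ind) pair instead of var_id alone, */
/* so each var_id keeps its ind=1 observation if it has one.       */
/* temp3 is a hypothetical name used only for this illustration.   */
proc sort data=temp1 out=temp3 nodupkey;
  by var_id ind;
run;

proc sql;
  /* Same query as before, now against the safer de-duplicated data */
  select count(distinct case when (ind=1) then var_id end) as count
  from temp3;

  /* Alternative: filter first, then count distinct IDs directly */
  select count(distinct var_id) as count
  from temp1
  where ind=1;
quit;
```

With the sample data, both queries should agree with the TEMP1 result, because the IND=1 observation for VAR_ID=2 is no longer discarded.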

