BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
trevand
Obsidian | Level 7

I have a data set where I want to count number of distinct var_id when ind=1. I use proc sql with distinct option for that. See code below. However when I first remove the duplicates on var_id and then use proc sql with distinct option I get a different answer. Shouldn't they give the same answer since the distinct option takes care of this?

 

proc sort data=temp1 out=temp2 nodupkey;

by var_id;

run;

 

proc sql;

select count(distinct case when (ind=1) then var_id end)

from temp1;

quit;

 

proc sql;

select count(distinct case when (ind=1) then var_id end)

from temp2;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

No.  Because the PROC SORT could have eliminated some observations with IND=1.

data temp1;
  input var_id ind ;
cards;
1 1
1 2
1 3
2 2
2 1
3 2
;

proc sort data=temp1 out=temp2 nodupkey;
  by var_id;
run;

proc sql;
  select 'TEMP1' as source, count(distinct case when (ind=1) then var_id end) as count from temp1
  union
  select 'TEMP2' as source, count(distinct case when (ind=1) then var_id end) as count from temp2
  ;
quit;

Tom_0-1762881658200.png

 

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

No.  Because the PROC SORT could have eliminated some observations with IND=1.

data temp1;
  input var_id ind ;
cards;
1 1
1 2
1 3
2 2
2 1
3 2
;

proc sort data=temp1 out=temp2 nodupkey;
  by var_id;
run;

proc sql;
  select 'TEMP1' as source, count(distinct case when (ind=1) then var_id end) as count from temp1
  union
  select 'TEMP2' as source, count(distinct case when (ind=1) then var_id end) as count from temp2
  ;
quit;

Tom_0-1762881658200.png