I have a long code, but made a mock one for our purposes here, which creates the same error. I have 5 out of 10 patients that were evaluated, yet I get 60% instead of 50% evaluated when using "else 0". But when I use "else .", I get 50%, which is correct. I'm wondering why? Problem solved after 2 days, but perhaps someone can help me understand why using else 0 vs. else . would make a difference?
Mock data:
data test;
input patient_id evaluated $;
datalines;
1 Yes
2 Yes
3 Yes
4 Yes
5 Yes
6 No
7 No
8 No
9 No
10 No
;
run;
First code that gives me 60% with "else 0" in the second subquery:
proc sql;
create table test1 as
select "Evaluation Complete" as Variable,
(select (Count(distinct patient_id))
from test
where evaluated = "Yes") as Count,
(select (Count(distinct case when evaluated="Yes" then patient_id else 0 end)/(Count(distinct patient_id)*1.0))
from test) as Percent format percent8.1
from test(obs=1);
quit;
Second code where we replace "else 0" with "else ." and gives us 50% for patients evaluated (correct):
proc sql;
create table test2 as
select "Evaluation Complete" as Variable,
(select (Count(distinct patient_id))
from test
where evaluated = "Yes") as Count,
(select (Count(distinct case when evaluated="Yes" then patient_id else . end)/(Count(distinct patient_id)*1.0))
from test) as Percent format percent8.1
from test(obs=1);
quit;
You're assigning the value of 0 to all ones you don't want to count, which is the extra value so you get 6 distinct values (5 actual IDs plus the 0).
Missing is not counted so you get 5 values.
proc sql;
create table test1 as
select "Evaluation Complete" as Variable,
*, case when evaluated="Yes" then patient_id else 0 end as variable0,
case when evaluated="Yes" then patient_id else . end as variable_missing
from test;
quit;
@vegan_renegade wrote:
I have a long code, but made a mock one for our purposes here, which creates the same error. I have 5 out of 10 patients that were evaluated, yet I get 60% instead of 50% evaluated when using "else 0". But when I use "else .", I get 50%, which is correct. I'm wondering why? Problem solved after 2 days, but perhaps someone can help me understand why using else 0 vs. else . would make a difference?
Mock data:
data test; input patient_id evaluated $; datalines; 1 Yes 2 Yes 3 Yes 4 Yes 5 Yes 6 No 7 No 8 No 9 No 10 No ; run;
First code that gives me 60% with "else 0" in the second subquery:
proc sql; create table test1 as select "Evaluation Complete" as Variable, (select (Count(distinct patient_id)) from test where evaluated = "Yes") as Count, (select (Count(distinct case when evaluated="Yes" then patient_id else 0 end)/(Count(distinct patient_id)*1.0)) from test) as Percent format percent8.1 from test(obs=1); quit;
Second code where we replace "else 0" with "else ." and gives us 50% for patients evaluated (correct):
proc sql; create table test2 as select "Evaluation Complete" as Variable, (select (Count(distinct patient_id)) from test where evaluated = "Yes") as Count, (select (Count(distinct case when evaluated="Yes" then patient_id else . end)/(Count(distinct patient_id)*1.0)) from test) as Percent format percent8.1 from test(obs=1); quit;
You're assigning the value of 0 to all ones you don't want to count, which is the extra value so you get 6 distinct values (5 actual IDs plus the 0).
Missing is not counted so you get 5 values.
proc sql;
create table test1 as
select "Evaluation Complete" as Variable,
*, case when evaluated="Yes" then patient_id else 0 end as variable0,
case when evaluated="Yes" then patient_id else . end as variable_missing
from test;
quit;
@vegan_renegade wrote:
I have a long code, but made a mock one for our purposes here, which creates the same error. I have 5 out of 10 patients that were evaluated, yet I get 60% instead of 50% evaluated when using "else 0". But when I use "else .", I get 50%, which is correct. I'm wondering why? Problem solved after 2 days, but perhaps someone can help me understand why using else 0 vs. else . would make a difference?
Mock data:
data test; input patient_id evaluated $; datalines; 1 Yes 2 Yes 3 Yes 4 Yes 5 Yes 6 No 7 No 8 No 9 No 10 No ; run;
First code that gives me 60% with "else 0" in the second subquery:
proc sql; create table test1 as select "Evaluation Complete" as Variable, (select (Count(distinct patient_id)) from test where evaluated = "Yes") as Count, (select (Count(distinct case when evaluated="Yes" then patient_id else 0 end)/(Count(distinct patient_id)*1.0)) from test) as Percent format percent8.1 from test(obs=1); quit;
Second code where we replace "else 0" with "else ." and gives us 50% for patients evaluated (correct):
proc sql; create table test2 as select "Evaluation Complete" as Variable, (select (Count(distinct patient_id)) from test where evaluated = "Yes") as Count, (select (Count(distinct case when evaluated="Yes" then patient_id else . end)/(Count(distinct patient_id)*1.0)) from test) as Percent format percent8.1 from test(obs=1); quit;
I think the issue is that COUNT will count any valid value.
Try:
proc sql; create table test1 as select "Evaluation Complete" as Variable, (select (Count(distinct patient_id)) from test where evaluated = "Yes") as Count, (select (Count(distinct case when evaluated="Yes" then patient_id else . end)/(Count(distinct patient_id)*1.0)) from test) as Percent format percent8.1 from test(obs=1); quit;
Your example doesn't include any repeated patient_id. If your data doesn't have any such I would question why do this instead of Proc Freq.
I would advise you to use PROC FREQ whenever you want to count things, instead of creating your own somewhat complicated SQL method, which clearly didn't work. SAS has put a huge amount of effort into validating the PROC FREQ results and tested it in a bazillion real world situations so you don't have to worry about getting the wrong output.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.