I want to know what percent of clients had more than one stay for each client. I am trying to calculate percent of the group total where the group total is the distinct count of clients in the group (clinic). The code below is calculating percent of the grand total for each row. How can I fix this?
Thank you!
data have;
input client_id$ Span_Begin :DATE9. Span_End :DATE9. clinic$ stay$ cost R$ year$ clinic_type$;
format Span_Begin MMDDYY10. Span_End MMDDYY10. ;
datalines;
A 17Nov2022 04Jan2023 8 1 0 1 2023 1
B 01Jul2022 29Jul2022 12 1 2746 1 2023 3
B 21Nov2022 04Apr2023 12 2 1373 1 2023 3
C 01Jul2022 07Jul2022 8 1 2403 1 2023 3
C 12Jul2022 25Jul2022 6 2 343 1 2023 3
C 26Aug2022 15May2023 10 3 687 1 2023 3
D 01Jul2022 12Jul2022 10 1 5479 0 2023 2
E 01Jul2022 23Mar2023 9 1 159000 1 2023 3
F 22Nov2022 19Mar2023 8 1 6522 1 2023 3
G 26Aug2022 15May2023 12 1 687 1 2023 3
H 01Oct2022 10Apr2023 6 1 5479 1 2023 3
I 30Dec2022 19Jan2023 5 1 8675 1 2023 3
J 01Jul2022 01Nov2022 3 1 4566 1 2023 3
K 11Nov2022 10Jan2023 3 2 10044 1 2023 3
L 01Jul2022 15Jul2022 3 1 9588 1 2023 3
L 26Jul2022 17Aug2022 12 2 2060 0 2023 3
L 23Sep2022 19Feb2023 12 3 5149 1 2023 3
O 20Sep2022 20Feb2023 5 1 5022 1 2023 3
P 01Jul2022 02Aug2022 5 1 913 1 2023 1
Q 06Jul2022 01Sep2022 8 1 343 1 2023 3
R 01Jul2022 01Sep2022 1 1 457 1 2023 3
S 01Jul2022 24Oct2022 2 1 10957 1 2023 3
T 01Jul2022 04Dec2022 2 1 1826 1 2023 3
U 28Aug2022 18Sep2022 8 1 6179 1 2023 3
V 19Dec2022 31May2023 7 2 10501 1 2023 3
W 08Jun2023 18Jun2023 7 3 5022 1 2023 3
X 01Jul2022 10Jul2022 8 1 3433 0 2023 3
Y 04Aug2022 17Nov2022 10 2 7305 1 2023 3
Z 13Oct2022 07Nov2022 8 1 2403 1 2023 3
Z 09Feb2023 30Jun2023 12 2 5149 1 2023 3
AA 08Sep2022 04Oct2022 8 1 2746 1 2023 3
BB 01Jul2022 26Apr2023 2 1 10501 1 2023 3
CC 09Aug2022 19Jun2023 12 1 1373 1 2023 3
DD 01Jul2022 21Aug2022 10 1 9588 1 2023 3
DD 17Oct2022 13Jun2023 11 2 5935 1 2023 3
EE 01Jul2022 15Jul2022 5 1 3196 1 2023 3
EE 26Jul2022 17Aug2022 5 2 7305 1 2023 3
EE 23Sep2022 19Feb2023 4 3 13697 1 2023 3
;;;
run;
*** distinct client count per clinic****;
proc sql;
select clinic, count(distinct client_id) as pct_of from have
where clinic_type eq '3' and year eq '2023' and R eq '1' and cost>0
group by clinic;
run;
Title1 “Percent of clients with >1 stay”;
proc sql ;
select distinct clinic, count( distinct client_id) as Clients, calculated clients / (select count(distinct client_id) from have
where clinic_type eq '3' and year eq '2023' and R eq '1' and cost>0) as pct format=percent8.1
from have
where clinic_type eq '3' and year eq '2023' and R eq '1' and cost>0 and Stay ne '1'
group clinic
union all
select "Total", count( distinct client_id) as Total from have
where clinic_type eq '3' and year eq '2023' and R eq '1' and cost>0 and Stay ne '1';
quit;
I'm getting this:
I want this:
You seem to want to
1) Filter by some selection criteria
2) Count how many stays (across any clinic) per client
3) Count number of clients per clinic and how many of those clients had multiple stays.
So you want something like:
data subset;
set have;
where clinic_type=3 and year=2023 and r=1 and cost>0 ;
run;
proc sql;
create table stays_per_client as
select *,client_id,count(*) as n_stays
from subset
group by client_id
;
create table want as
select clinic
, count(*) as client_count
, sum(n_stays>1) as clients
, mean(n_stays>1) as pct format=percent.
from stays_per_client
group by clinic
;
quit;
Which results in this report:
So you could collapse that into a single SQL query by nesting it.
proc sql;
create table want as
select clinic
, count(*) as client_count
, sum(n_stays>1) as clients
, mean(n_stays>1) as pct format=percent.
from (
select clinic,client_id,count(*) as n_stays
from have
where clinic_type=3 and year=2023
and r=1 and cost>0
group by client_id
)
group by clinic
;
quit;
Note that this query is taking advantage of the fact that SAS will remerge summary statistics back onto the detail observations when you do something like:
select id,var2,mean(x) as mean_x
from have
group by id
If you want to run this SQL in some other system's implementation of SQL you will have to explicitly to the merge (join in SQL speak) yourself. Make sure to apply the same WHERE clause to both sub queries. And perhaps replace the boolean expressions with CASE clauses.
proc sql;
create table want as
select a.clinic
, count(*) as client_count
, sum(case when n_stays>1 then 1 else 0 end) as clients
, mean(case when n_stays>1 then 1 else 0 end) as pct format=percent.
from (select clinic,client_id
from have
where clinic_type=3 and year=2023
and r=1 and cost>0
) a
inner join
(select client_id,count(*) as n_stays
from have
where clinic_type=3 and year=2023
and r=1 and cost>0
group by client_id
) b
on a.client_id=b.client_id
group by a.clinic
;
quit;
In my opinion, SQL is a poor choice for this. PROC FREQ (or PROC SUMMARY) would do the job easily.
I don't understand the math that results in Clinic 10 having 2 clients which is 66.7%, I can't seem to figure out how you get that from the raw data.
I used proc sql because it is the only way I know how to get distinct count. The first proc SQL gives you the distinct count of total clients per clinic. When looking at distinct clients, 2 of the 3 distinct clients in clinic 10 had more than 1 stay in a clinic (66.7% of the distinct clients in clinic 10).
I see 4 distinct clients (C D Y and DD) in clinic 10 in your raw data.
Yes, but only 3 are clinic type 3, which is the only clinic type I want to count.
where clinic_type eq '3' and year eq '2023' and R eq '1' and cost>0)
@Whitlea wrote:
Yes, but only 3 are clinic type 3, which is the only clinic type I want to count.
where clinic_type eq '3' and year eq '2023' and R eq '1' and cost>0)
Thanks. As far as I can tell, this was not stated in the text of the original problem statement. If the text of the problem statement doesn't mention it but your sample code does mention it, this is a contradiction between the problem statement and sample code (something you should avoid) and normally I stop and ask questions. Better to do that than to go ahead and solve a problem I don't understand.
I have switched the variable STAY in your data set from character to numeric. It is always better to make counts (and other numeric things) a numeric variable. If I don't do this, then I can't use the WEIGHT statement in PROC FREQ.
proc sort data=have(where=(clinic_type='3')) out=have1;
by clinic;
run;
proc freq data=have1; /* Do the counting */
by clinic;
tables client_id/noprint out=_counts_;
weight stay;
run;
data want; /* Find percent more than one stay */
set _counts_(drop=percent);
by clinic;
if first.clinic then do;
more_than_once=0; n=0;
end;
n+1;
if count>1 then more_than_once+1; /* Count times client stayed more than once */
if last.clinic then do;
more_than_once_percent=more_than_once/n;
output;
end;
run;
Count the number of stays per client first. Then since you will already have distinct clients counting will not require the DISTINCT keyword.
You seem to want to
1) Filter by some selection criteria
2) Count how many stays (across any clinic) per client
3) Count number of clients per clinic and how many of those clients had multiple stays.
So you want something like:
data subset;
set have;
where clinic_type=3 and year=2023 and r=1 and cost>0 ;
run;
proc sql;
create table stays_per_client as
select *,client_id,count(*) as n_stays
from subset
group by client_id
;
create table want as
select clinic
, count(*) as client_count
, sum(n_stays>1) as clients
, mean(n_stays>1) as pct format=percent.
from stays_per_client
group by clinic
;
quit;
Which results in this report:
So you could collapse that into a single SQL query by nesting it.
proc sql;
create table want as
select clinic
, count(*) as client_count
, sum(n_stays>1) as clients
, mean(n_stays>1) as pct format=percent.
from (
select clinic,client_id,count(*) as n_stays
from have
where clinic_type=3 and year=2023
and r=1 and cost>0
group by client_id
)
group by clinic
;
quit;
Note that this query is taking advantage of the fact that SAS will remerge summary statistics back onto the detail observations when you do something like:
select id,var2,mean(x) as mean_x
from have
group by id
If you want to run this SQL in some other system's implementation of SQL you will have to explicitly to the merge (join in SQL speak) yourself. Make sure to apply the same WHERE clause to both sub queries. And perhaps replace the boolean expressions with CASE clauses.
proc sql;
create table want as
select a.clinic
, count(*) as client_count
, sum(case when n_stays>1 then 1 else 0 end) as clients
, mean(case when n_stays>1 then 1 else 0 end) as pct format=percent.
from (select clinic,client_id
from have
where clinic_type=3 and year=2023
and r=1 and cost>0
) a
inner join
(select client_id,count(*) as n_stays
from have
where clinic_type=3 and year=2023
and r=1 and cost>0
group by client_id
) b
on a.client_id=b.client_id
group by a.clinic
;
quit;
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.