Solved: Count with multiple criteria

Whitlea · Posted 08-01-2024 08:18 PM

I want to know how many clients in each clinic stayed at a different clinic and the cost of the stay at the different clinic. I am not considering order of the stays. I do not want to count a client if they have a different stay in the same clinic (see client L).

Thank you for your help!

data have;
input
client$	Span_Begin :DATE9.	Span_End :DATE9. clinic$ stay cost;
 

format Span_Begin MMDDYY10. Span_End MMDDYY10. ;

datalines;
A	17Nov2022	04Jan2023	8	1	1373
B	01Jul2022	29Jul2022	12	1	2746
B	21Nov2022	04Apr2023	12	2	1373
C	01Jul2022	07Jul2022	8	1	2403
D	01Jul2022	12Jul2022	10	1	5479
E	01Jul2022	23Mar2023	9	1	159000
F	22Nov2022	19Mar2023	8	1	6522
G	26Aug2022	15May2023	12	1	687
H	01Oct2022	10Apr2023	6	1	5479
I	30Dec2022	19Jan2023	5	1	8675
J	01Jul2022	01Nov2022	3	1	4566
K	11Nov2022	10Jan2023	3	2	10044
L	01Jul2022	15Jul2022	3	3	9588
L	26Jul2022	17Aug2022	12	1	2060
L	23Sep2022	19Feb2023	12	1	5149
O	20Sep2022	20Feb2023	5	1	5022
P	01Jul2022	02Aug2022	5	1	913
Q	06Jul2022	01Sep2022	8	1	343
R	01Jul2022	01Sep2022	1	1	457
S	01Jul2022	24Oct2022	2	1	10957
T	01Jul2022	04Dec2022	2	1	1826
U	28Aug2022	18Sep2022	8	1	6179
V	19Dec2022	31May2023	7	2	10501
W	08Jun2023	18Jun2023	7	3	5022
X	01Jul2022	10Jul2022	8	1	3433
Y	04Aug2022	17Nov2022	10	2	7305
Z	13Oct2022	07Nov2022	8	1	2403
Z	09Feb2023	30Jun2023	12	2	5149
AA	08Sep2022	04Oct2022	8	1	2746
BB	01Jul2022	26Apr2023	2	1	10501
CC	09Aug2022	19Jun2023	12	1	1373
DD	01Jul2022	21Aug2022	10	1	9588
DD	17Oct2022	13Jun2023	11	2	5935
EE	01Jul2022	15Jul2022	5	1	3196
EE	26Jul2022	17Aug2022	5	2	7305
EE	23Sep2022	19Feb2023	4	3	13697

;;;
run;

Here are the results from the above example:

Clinic	Count	Cost
3	1	$9,588
4	1	$13,697
5	1	$10,501
8	1	$2,403
10	1	$9,588
11	1	$5,935
12	2	$12,358
Grand Total	8	$64,070

Ksharp · Posted 08-03-2024 04:28 AM

data have;
input client$	Span_Begin :DATE9.	Span_End :DATE9. clinic stay cost;
format Span_Begin MMDDYY10. Span_End MMDDYY10. ;
datalines;
A	17Nov2022	04Jan2023	8	1	1373
B	01Jul2022	29Jul2022	12	1	2746
B	21Nov2022	04Apr2023	12	2	1373
C	01Jul2022	07Jul2022	8	1	2403
D	01Jul2022	12Jul2022	10	1	5479
E	01Jul2022	23Mar2023	9	1	159000
F	22Nov2022	19Mar2023	8	1	6522
G	26Aug2022	15May2023	12	1	687
H	01Oct2022	10Apr2023	6	1	5479
I	30Dec2022	19Jan2023	5	1	8675
J	01Jul2022	01Nov2022	3	1	4566
K	11Nov2022	10Jan2023	3	2	10044
L	01Jul2022	15Jul2022	3	3	9588
L	26Jul2022	17Aug2022	12	1	2060
L	23Sep2022	19Feb2023	12	1	5149
O	20Sep2022	20Feb2023	5	1	5022
P	01Jul2022	02Aug2022	5	1	913
Q	06Jul2022	01Sep2022	8	1	343
R	01Jul2022	01Sep2022	1	1	457
S	01Jul2022	24Oct2022	2	1	10957
T	01Jul2022	04Dec2022	2	1	1826
U	28Aug2022	18Sep2022	8	1	6179
V	19Dec2022	31May2023	7	2	10501
W	08Jun2023	18Jun2023	7	3	5022
X	01Jul2022	10Jul2022	8	1	3433
Y	04Aug2022	17Nov2022	10	2	7305
Z	13Oct2022	07Nov2022	8	1	2403
Z	09Feb2023	30Jun2023	12	2	5149
AA	08Sep2022	04Oct2022	8	1	2746
BB	01Jul2022	26Apr2023	2	1	10501
CC	09Aug2022	19Jun2023	12	1	1373
DD	01Jul2022	21Aug2022	10	1	9588
DD	17Oct2022	13Jun2023	11	2	5935
EE	01Jul2022	15Jul2022	5	1	3196
EE	26Jul2022	17Aug2022	5	2	7305
EE	23Sep2022	19Feb2023	4	3	13697
;;;
run;

proc sql;
create table temp as
select distinct clinic,client from have;

create table temp2 as
select a.clinic,a.client,b.clinic as _clinic,b.cost as cost
 from temp as a,have as b
  where a.client=b.client and a.clinic ne b.clinic;

create table want1 as
select clinic,count(distinct client) as count,sum(cost) as sum_of_cost
 from temp2
  group by clinic;

create table temp3 as
select distinct clinic,_clinic from temp2;
quit;

data want2;
do until(last.clinic);
 set temp3;
 by clinic;
 length other_clinic $ 200;
 other_clinic=catx(',',other_clinic,_clinic);
end;
keep clinic other_clinic;
run;


data final_want;
 merge want1 want2;
 by clinic;
run;

View solution in original post

mkeintz · Posted 08-01-2024 09:10 PM

Thank you for providing working code to create dataset HAVE.

You "want to know how many clients in each clinic stayed at a different clinic and the cost of the stay at the different clinic."

Your desired result for clinic 3 shows a COUNT of 1 and a COST of 9,588.

Now the only client for clinic 3 that has stays at other clinics client L, which explains why you have a COUNT of 1. But, with the following data for L

Client	Span_Begin	Span_End	Clinic	Stay	Cost
L	07/01/2022	07/15/2022	1	3	9,588
L	07/26/2022	08/17/2022	12	1	2,060
L	09/23/2022	02/19/2023	12	1	5,149

why does your sample desired result have COST of 9,588? Shouldn't it be 7,209 (2,060 + 5,149 for two visits to clinic 12)?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Whitlea · Posted 08-02-2024 11:11 AM

Yes, you are correct! I apologize. These should be the results:

Clinic	Count	Other Clinic	Sum of Cost
3	1	12	7209
4	1	5	10501
5	1	4	13697
8	1	12	5149
10	1	11	5935
11	1	10	9588
12	2	3,8	11991

Ksharp · Posted 08-03-2024 04:28 AM

data have;
input client$	Span_Begin :DATE9.	Span_End :DATE9. clinic stay cost;
format Span_Begin MMDDYY10. Span_End MMDDYY10. ;
datalines;
A	17Nov2022	04Jan2023	8	1	1373
B	01Jul2022	29Jul2022	12	1	2746
B	21Nov2022	04Apr2023	12	2	1373
C	01Jul2022	07Jul2022	8	1	2403
D	01Jul2022	12Jul2022	10	1	5479
E	01Jul2022	23Mar2023	9	1	159000
F	22Nov2022	19Mar2023	8	1	6522
G	26Aug2022	15May2023	12	1	687
H	01Oct2022	10Apr2023	6	1	5479
I	30Dec2022	19Jan2023	5	1	8675
J	01Jul2022	01Nov2022	3	1	4566
K	11Nov2022	10Jan2023	3	2	10044
L	01Jul2022	15Jul2022	3	3	9588
L	26Jul2022	17Aug2022	12	1	2060
L	23Sep2022	19Feb2023	12	1	5149
O	20Sep2022	20Feb2023	5	1	5022
P	01Jul2022	02Aug2022	5	1	913
Q	06Jul2022	01Sep2022	8	1	343
R	01Jul2022	01Sep2022	1	1	457
S	01Jul2022	24Oct2022	2	1	10957
T	01Jul2022	04Dec2022	2	1	1826
U	28Aug2022	18Sep2022	8	1	6179
V	19Dec2022	31May2023	7	2	10501
W	08Jun2023	18Jun2023	7	3	5022
X	01Jul2022	10Jul2022	8	1	3433
Y	04Aug2022	17Nov2022	10	2	7305
Z	13Oct2022	07Nov2022	8	1	2403
Z	09Feb2023	30Jun2023	12	2	5149
AA	08Sep2022	04Oct2022	8	1	2746
BB	01Jul2022	26Apr2023	2	1	10501
CC	09Aug2022	19Jun2023	12	1	1373
DD	01Jul2022	21Aug2022	10	1	9588
DD	17Oct2022	13Jun2023	11	2	5935
EE	01Jul2022	15Jul2022	5	1	3196
EE	26Jul2022	17Aug2022	5	2	7305
EE	23Sep2022	19Feb2023	4	3	13697
;;;
run;

proc sql;
create table temp as
select distinct clinic,client from have;

create table temp2 as
select a.clinic,a.client,b.clinic as _clinic,b.cost as cost
 from temp as a,have as b
  where a.client=b.client and a.clinic ne b.clinic;

create table want1 as
select clinic,count(distinct client) as count,sum(cost) as sum_of_cost
 from temp2
  group by clinic;

create table temp3 as
select distinct clinic,_clinic from temp2;
quit;

data want2;
do until(last.clinic);
 set temp3;
 by clinic;
 length other_clinic $ 200;
 other_clinic=catx(',',other_clinic,_clinic);
end;
keep clinic other_clinic;
run;


data final_want;
 merge want1 want2;
 by clinic;
run;

Whitlea · Posted 09-27-2024 01:45 PM

Thank you Ksharp, this gave me exactly what I needed! Now I am trying to make a table that counts stays in a different clinic that considers the chronological order of the stays. I also want to add a readmissions column where readmissions= the count of movements/placements into a different clinic. I was able to modify your code to give me the correct client count for this but my readmissions and cost are being double counted for clients who have 2 back-to-back stays in the same clinic, followed by a stay in a different clinic. For example, client EE's readmission into clinic 4 is being counted twice where I only want it to be counted once as they only moved into a different clinic one time (from clinic 5 to clinic 4). Is there a way to fix this so it doesn't duplicate these readmissions and cost?

data have;
input client$	Span_Begin :DATE9.	Span_End :DATE9. clinic stay cost;
format Span_Begin MMDDYY10. Span_End MMDDYY10. ;
datalines;
A	17Nov2022	04Jan2023	8	1	1373
B	01Jul2022	29Jul2022	12	1	2746
B	21Nov2022	04Apr2023	12	2	1373
C	01Jul2022	07Jul2022	8	1	2403
C   12Jul2022	25Jul2022   6   2   343
C   26Aug2022	15May2023   10  3   687
D	01Jul2022	12Jul2022	10	1	5479
E	01Jul2022	23Mar2023	9	1	159000
F	22Nov2022	19Mar2023	8	1	6522
G	26Aug2022	15May2023	12	1	687
H	01Oct2022	10Apr2023	6	1	5479
I	30Dec2022	19Jan2023	5	1	8675
J	01Jul2022	01Nov2022	3	1	4566
K	11Nov2022	10Jan2023	3	2	10044
L	01Jul2022	15Jul2022	3	3	9588
L	26Jul2022	17Aug2022	12	1	2060
L	23Sep2022	19Feb2023	12	1	5149
O	20Sep2022	20Feb2023	5	1	5022
P	01Jul2022	02Aug2022	5	1	913
Q	06Jul2022	01Sep2022	8	1	343
R	01Jul2022	01Sep2022	1	1	457
S	01Jul2022	24Oct2022	2	1	10957
T	01Jul2022	04Dec2022	2	1	1826
U	28Aug2022	18Sep2022	8	1	6179
V	19Dec2022	31May2023	7	2	10501
W	08Jun2023	18Jun2023	7	3	5022
X	01Jul2022	10Jul2022	8	1	3433
Y	04Aug2022	17Nov2022	10	2	7305
Z	13Oct2022	07Nov2022	8	1	2403
Z	09Feb2023	30Jun2023	12	2	5149
AA	08Sep2022	04Oct2022	8	1	2746
BB	01Jul2022	26Apr2023	2	1	10501
CC	09Aug2022	19Jun2023	12	1	1373
DD	01Jul2022	21Aug2022	10	1	9588
DD	17Oct2022	13Jun2023	11	2	5935
EE	01Jul2022	15Jul2022	5	1	3196
EE	26Jul2022	17Aug2022	5	2	7305
EE	23Sep2022	19Feb2023	4	3	13697
;;;
run;
proc sql;
create table temp as
select distinct Clinic,Client, span_begin from have;

create table temp2 as
select a.Clinic,a.Client,b.Clinic as _Clinic,b.Cost as Cost
 from temp as a,have as b
  where a.Client=b.Client and a.Clinic ne _Clinic and a.Span_Begin lt b.Span_Begin  ;

create table want1 as
select Clinic,count(distinct Client) as count, count(client)as readmissions, sum(Cost) as sum_of_Cost
 from temp2
  group by Clinic;

create table temp3 as
select distinct Clinic,_Clinic from temp2;
quit;

data want2;
do until(last.Clinic);
 set temp3;
 by Clinic;
 length other_Clinic $ 200;
 other_Clinic=catx(',',other_Clinic,_Clinic);
end;
keep Clinic other_Clinic;
run;

data final_want;
 merge want1 want2;
 by Clinic;
run;

proc print data=final_want noobs;
run;

Ksharp · Posted 09-28-2024 02:47 AM

Just add DISTINCT keyword before SQL,and use this dataset to calculate these readmissions and cost.

create table temp2 as
select distinct a.Clinic,a.Client,b.Clinic as _Clinic,b.Cost as Cost
 from temp as a,have as b
  where a.Client=b.Client and a.Clinic ne _Clinic and a.Span_Begin lt b.Span_Begin  ;

Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Re: Count with multiple criteria

Registration is open

SAS Training: Just a Click Away