I have a dataset of hospital admissions of members. I want to figure out (1) how many clients had 3 or more readmissions to the same hospital within the first year of their first admission and (2) how many clients had 3 or more readmissions to any of the hospitals within the same year. For both questions above I want to show by hospital the stats. My data set has over 50,000 records. I cant seem to figure out how I can use the first. /last. (I think that would be the way to go) to run both. Below I have a code with how the date would look. (NOTE I added a note to indicate which the first admission would be. You will note that ID 123 for Hospital B the admission date 1/2/18 would NOT count as the next admission is not until 5/6/19 more than a year apart. In this case when I want to count 3 or more admissions within a 12 month time period for member 123 for hosp B admissions on 5/6/19,6/7/19 and 7/7/19 would make member 123 eligible for the 3 or more criteria but the 1/2/18 would not.
In this example Hosp A, B and C each would get a 1 for question 1, since member 123 for Hosp A and B and member 678 for Hosp C met the "3 or more admissions with in the same hospital". For question 2 only Hosp A would get a 1 since member 123 who initially was from Hospital A was then also seen at Hospital B.
I believe I would need to first order the data set by Member and date and then hospital to make sure that I get the most recent visit of the member on top, right?
data have; length Hosp $1 ID $3. ; input Hosp $ ID $ Date mmddyy10.; format Date mmddyy10.; datalines; A 123 01/01/19 <-- This would be the first visit for the member A 123 01/05/19 A 123 02/03/19 A 123 02/10/19 A 123 05/12/19 A 123 12/05/19 B 456 04/03/18 <-- This would be the first visit for the member B 456 05/05/18 B 123 01/02/18 <-- This would Not be the first visit since the next visit is more than a year away B 123 05/06/19 <-- This would be the first visit for the member B 123 06/07/19 B 123 07/07/19 C 678 01/01/18 <-- This would be the first visit for the member C 678 04/06/18 C 678 06/08/18 C 678 07/09/18 D 123 01/01/18 <-- This would be the first visit for the member D 123 05/03/18 run;
You can create a 'synthetic' group based on visit date differences, and perform your counting based on the groups that achieve 4 or more members. (1 for first admit, plus at least presumed 3 readmits)
Because you are processing date ordered data in the two different groupings
you will want to process each one separately.
Data
data have;
length Hosp $1 ID $3 ;
input Hosp ID Date mmddyy8.;
format Date mmddyy10.;
datalines;
A 999 01/01/19
A 123 01/01/19
A 123 01/05/19
A 123 02/03/19
A 123 02/10/19
A 123 05/12/19
A 123 12/05/19
B 999 01/15/19
B 456 04/03/18
B 456 05/05/18
B 123 01/02/18
B 123 05/06/19
B 123 06/07/19
B 123 07/07/19
C 999 01/21/19
C 678 01/01/18
C 678 04/06/18
C 678 06/08/18
C 678 07/09/18
D 123 01/01/18
D 123 05/03/18
E 999 02/15/19
E 123 01/03/18
E 123 05/07/19
run;
Synthetic group computation. From a first date check if the date is within 1 year continuous. If so, the date is in the group and gets a new within group sequence number. The group is 'reset' at the start of a by group or when the gap between visits is at least a year.
BY HOSP and ID
proc sort data=have;
by hosp id date;
data want_by_hospital_readmits;
set have;
by hosp id;
retain date1 0;
if first.id OR intck('year', date1, date, 'C') > 0 then do;
group+1;
seq=1;
date1 = date;
end;
else
seq+1;
format date1 mmddyy10.;
run;
proc sql;
create table howmany1 as
select count(distinct id) as howmany from want_by_hospital_readmits
where seq=4;
create table whichmany1 as
select distinct(id) from want_by_hospital_readmits
where seq=4;
BY only ID
proc sort data=have;
by id date;
run;
data want_by_readmits;
set have;
by id;
retain date1 0;
if first.id OR intck('year', date1, date, 'C') > 0 then do;
group+1;
seq=1;
date1 = date;
end;
else
seq+1;
format date1 mmddyy10.;
run;
proc sql;
create table howmany2 as
select count(distinct id) as howmany from want_by_readmits
where seq=4;
create table whichmany2 as
select distinct(id) from want_by_readmits
where seq=4;
There can be some confounding sequences of dates where a legit first date might be embedded in a group that 'failed' to reach 3.
Something like this should get your initial admission flagged and then you'll need to write logic for counting admissions after the initial one.
data have;
input Hosp :$1. ID :$3. Date mmddyy10.;
format Date mmddyy10.;
datalines;
A 123 01/01/19
A 123 01/05/19
A 123 02/03/19
A 123 02/10/19
A 123 05/12/19
A 123 12/05/19
B 456 04/03/18
B 456 05/05/18
B 123 01/02/18
B 123 05/06/19
B 123 06/07/19
B 123 07/07/19
C 678 01/01/18
C 678 04/06/18
C 678 06/08/18
C 678 07/09/18
D 123 01/01/18
D 123 05/03/18
;
run;
*Sort descending date so lag function accesses 'next' admission;
proc sort data=have;
by id hosp descending date;
run;
*Find date difference in years between this admission and the next one;
data have2;
set have;
by id hosp;
_ldate = lag(date);
if not first.hosp then date_diff = intck('year',date,_ldate);
/* drop _:; */
run;
*Sort ascending;
proc sort data=have2;
by id hosp date;
run;
*Initial admission is either the first id&hosp combination when date_diff is not 1;
*Or it's the next admission after date_diff=1;
data want;
set have2;
by id hosp;
_ldatediff = lag(date_diff);
initial_admission = ((first.hosp and not date_diff) or (_ldatediff));
/* drop _: date_diff; */
run;
You can create a 'synthetic' group based on visit date differences, and perform your counting based on the groups that achieve 4 or more members. (1 for first admit, plus at least presumed 3 readmits)
Because you are processing date ordered data in the two different groupings
you will want to process each one separately.
Data
data have;
length Hosp $1 ID $3 ;
input Hosp ID Date mmddyy8.;
format Date mmddyy10.;
datalines;
A 999 01/01/19
A 123 01/01/19
A 123 01/05/19
A 123 02/03/19
A 123 02/10/19
A 123 05/12/19
A 123 12/05/19
B 999 01/15/19
B 456 04/03/18
B 456 05/05/18
B 123 01/02/18
B 123 05/06/19
B 123 06/07/19
B 123 07/07/19
C 999 01/21/19
C 678 01/01/18
C 678 04/06/18
C 678 06/08/18
C 678 07/09/18
D 123 01/01/18
D 123 05/03/18
E 999 02/15/19
E 123 01/03/18
E 123 05/07/19
run;
Synthetic group computation. From a first date check if the date is within 1 year continuous. If so, the date is in the group and gets a new within group sequence number. The group is 'reset' at the start of a by group or when the gap between visits is at least a year.
BY HOSP and ID
proc sort data=have;
by hosp id date;
data want_by_hospital_readmits;
set have;
by hosp id;
retain date1 0;
if first.id OR intck('year', date1, date, 'C') > 0 then do;
group+1;
seq=1;
date1 = date;
end;
else
seq+1;
format date1 mmddyy10.;
run;
proc sql;
create table howmany1 as
select count(distinct id) as howmany from want_by_hospital_readmits
where seq=4;
create table whichmany1 as
select distinct(id) from want_by_hospital_readmits
where seq=4;
BY only ID
proc sort data=have;
by id date;
run;
data want_by_readmits;
set have;
by id;
retain date1 0;
if first.id OR intck('year', date1, date, 'C') > 0 then do;
group+1;
seq=1;
date1 = date;
end;
else
seq+1;
format date1 mmddyy10.;
run;
proc sql;
create table howmany2 as
select count(distinct id) as howmany from want_by_readmits
where seq=4;
create table whichmany2 as
select distinct(id) from want_by_readmits
where seq=4;
There can be some confounding sequences of dates where a legit first date might be embedded in a group that 'failed' to reach 3.
This is fantastic!
Two questions:
1. I don't understand what the "intck('year', date, date1, 'C') > 0 then do;" is doing can you please explain. I think I understand that intck 'year' is looking at the 'Year' but don't understand what the "C")>0 is doing.
AND
2. What I need to do is that if the following admits are within 12 months of the initial date then to count the admits. But if an admit is more than 12 months of the initial date then to not count that admit with the first group but to start a new group. E.g. is a persons first admission was on Jan 1, 2019 then any visits within 12 months of Jan 1, 2019 so until Jan 1, 2020 the visits would be counted, but if a visit occurred on Feb 2, 2020, or even Jan 2, 2020 then the seq/group (still trying to understand which one it is, I think Seq) would start again at 1. How/where would we incorporate that condition?
Thanks!
The documentation is your friend and will explain in far greater detail.
#1.
This expression
intck('year', date, date1, 'C') > 0
checks if the current date is more than one year away from the first date (date1) of a 'presumed' group. I say presumed because the potential confounding that can occur when examining date ranges and interval gaps. Recall that date1 is a retained variable and only assigned from the block that 'discovers' the start of the group.
The C argument means continuous, so the continuous year interval from say 13AUG15 to 12AUG16 will be 0 because it is < 1 year forward for 13AUG15.
47 data _null_; 48 ck_discrete = intck('year', '13aug15'd, '12aug16'd); 49 ck_continuous = intck('year', '13aug15'd, '12aug16'd, 'C'); 50 51 put ck_discrete=; 52 put ck_continuous=; 53 run; ck_discrete=1 ck_continuous=0
#2.
What happens in the start condition block ?
if first.id OR intck('year', date, date1, 'C') > 0 then do;
group+1;
seq=1;
date1 = date;
end;
What happens if NOT in the start condition block ?
else seq+1;
The sequence number for 'row in group' is incremented because the date is < 1 year away from the start date of the group.
So the `group` value is a discrete value that segregates rows in to different groups, and that knowledge can be later leveraged in characterizing the groups; such as, if a group contains a seq=4 you will know it contains, at least, what would be considered 3 readmits.
The variable name `group` might be too generic and thus confusing, so you could use a variable name more descriptive (but probably more annoying to coder readers) such as 'computed_readmit_group'.
As you code more, you might find your coding habits going from 'overly descriptive' variable names to 'effectively descriptive' variable names.
Thank you @RichardDeVen for a detailed explation, this helps.
One more question. I thought I got it, but I dont think I do.
So the condition of wanting to count from the initial visit I dont think is working.
For example, client 123 for Hosp B (row 9) we see that the initial visit was on 1/2/2018 and the visit on row 9 is on 5/6/2019, more than a year apart, so shouldnt the group and seq reset? Also shouldnt date1 then reset on row 9 to reflect 5/6/2019 and restart the seq from 1? That is what I am looking for.
Same for Row 22 Hosp E same client, 123.
Maybe the OR statement in "if first.id OR intck('year', date, date1, 'C') > 0 then do;" is not doing what we want it to do? should it be an and? or should there be two if statements?
Thank you soo much for helping and guiding me on this, I am learning alot! I appreciate your help, you all are awesome!
Look carefully at the initial answer
Because you are processing date ordered data in the two different groupings
- BY hosp id
- BY id
you will want to process each one separately.
Perhaps not obvious, it means you have to SORT
The answer then shows two examples of codes:
Each processing requires a different sort order because when you look at the data ordered by ID DATE there is no guarantee the hospitals are 'contiguous' aka a group
The first processing finds only 2 IDs with at least 3 readmits to the same hospital.
The second processing finds 3 IDs with at least 3 readmits, regardless of hosptial.
The test
if first.id OR intck('year', date, date1, 'C') > 0 then do;
works in both cases because of the way the BY statements are specified and how BY processing works.
For a statement such as
BY var1 var2 var3 var4;
first.var4 will be 1 when
'changes' means the implicit SAS determined case of comparing a by variable value in the current row to the value in the prior row.
Because in this problem ID is last variable in the BY grouping, a simple first.ID evaluation can tells us
- if the ID changed within HOSP (first sample code operates on data sorted BY HOSP ID), or
- if the ID changed (second sample code operates on data sorted by ID)
Thank you @RichardDeVen , I understand that we are processing two different ways so we would be sorting by two different ways.
If we were to just look at the first method of processing by hosp id date, with the condition of counting admits in a 12 month period by hosp for each id we see that there is step missing (and this is where I am puzzled as to what we need to do).
In the picture below we see that in row 9 the group should change to the next number (4) and the seq should reset to 1 because the visit for ID 123 in hosp B is more than 12 months. (same for row 23)
How can we reset the group and seq back to 1? Again, if we only address the first question at this point?
This is very helpful though and I truly appreciate your assistance.
Oops! There was an argument reversal in INTCK.
The correct code should have argument date1 before date.
if first.id OR intck('year', date1, date, 'C') > 0 then do;
With the arguments in the correct order the result will always be >= 0 because date1 is set when first.id is true.
I will correct the code in the earlier replies.
@RichardDeVen Ah that was it!
Thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.