Each person in the dataset has an index date that is independent of chronic conditions. Each person is enrolled from 1-year prior to their index date up to 10-years after their index date. I am tracking evidence of chronic conditions and placing the disease indicator variables (0/1 for no/yes) in quarters based on the date of evidence of that chronic condition (using claims data). So, each person has 4 baseline quarters and up to 40 quarters, but this study will allow for drop out, so the post-index eligible quarters will range from 1-40. Everyone has at least 1 year baseline, so everyone has the 4 quarters prior to the index date.
Once the chronic condition is flagged, I would like to carry forward the "1", from whatever quarter that chronic condition was first identified, through the end of each person's follow-up quarters. E.g., condition was identified in quarter 2, then flag that condition as being present through the end of their follow-up, for example, quarter 17.
To do this, I think I first need to expand the quarterly indicator variable consecutively, which is NOT a date variable; it is an integer/numeric variable (e.g., -4 is the 4th quarter prior to the index date, 1 is the first quarter after the index date, up to 40 quarters). Currently, I only have quarters where there was a hospital encounter and thus a claim (1) or no claim (0) for each chronic condition. I think, second, I need to carry forward the "1" (indicating yes to evidence of that condition) from the initial quarter to all remaining quarters, contingent on each person's enrollment. Below is an example dataset that I have, followed but what I would like. For simplicity, I am including 1 condition. I figured I will have to do this separately for each condition then merge the files.
HAVE
ID | Last_enrollment_quarter | Quarter_evidence_of_condition | Chronic_condition |
1 | 3 | -3 | 0 |
1 | 3 | -2 | 1 |
1 | 3 | 1 | 0 |
2 | 40 | -4 | 0 |
2 | 40 | -2 | 0 |
2 | 40 | -1 | 1 |
2 | 40 | 1 | 1 |
2 | 40 | 4 | 1 |
2 | 40 | 17 | 0 |
2 | 40 | 25 | 0 |
3 | 24 | 12 | 0 |
3 | 24 | 13 | 0 |
3 | 24 | 15 | 0 |
3 | 24 | 16 | 0 |
3 | 24 | 22 | 0 |
4 | 2 | -2 | 1 |
5 | 12 | -1 | 1 |
5 | 12 | 1 | 0 |
5 | 12 | 4 | 0 |
WANT
ID | Last_enrollment_quarter | Quarter_evidence_of_condition | Chronic_condition |
1 | 3 | -4 | 0 |
1 | 3 | -3 | 0 |
1 | 3 | -2 | 1 |
1 | 3 | -1 | 1 |
1 | 3 | 1 | 1 |
1 | 3 | 2 | 1 |
1 | 3 | 3 | 1 |
2 | 40 | -4 | 0 |
2 | 40 | -3 | 0 |
2 | 40 | -2 | 0 |
2 | 40 | -1 | 1 |
2 | 40 | 1 | 1 |
2 | 40 | 2 | 1 |
2 | 40 | 3 | 1 |
2 | 40 | 4 | 1 |
2 | 40 | 5 | 1 |
2 | 40 | 6 | 1 |
2 | 40 | 7 | 1 |
2 | 40 | 8 | 1 |
2 | … and so on to… | … and so on to… | 1 |
2 | 40 | 40 | 1 |
3 | 24 | -4 | 0 |
3 | 24 | -3 | 0 |
3 | 24 | -2 | 0 |
3 | 24 | -1 | 0 |
3 | 24 | 1 | 0 |
3 | 24 | 2 | 0 |
3 | 24 | 3 | 0 |
3 | 24 | 4 | 0 |
3 | 24 | 5 | 0 |
3 | … and so on to… | … and so on to… | 0 |
3 | 24 | 24 | 0 |
4 | 2 | -4 | 0 |
4 | 2 | -3 | 0 |
4 | 2 | -2 | 1 |
4 | 2 | -1 | 1 |
4 | 2 | 1 | 1 |
4 | 2 | 2 | 1 |
… | … | … |
You can read through your data in by groups, finding the earliest quarter of the the chronic condition for each ID.
Then when you've read the last record of the BY-group, you have all the information you need to make the data you want.
Using Tom's data:
data have;
input ID Lastqtr Qtr Condition;
cards;
1 3 -3 0
1 3 -2 1
1 3 1 0
2 40 -4 0
2 40 -2 0
2 40 -1 1
2 40 1 1
2 40 4 1
2 40 17 0
2 40 25 0
3 24 12 0
3 24 13 0
3 24 15 0
3 24 16 0
3 24 22 0
4 2 -2 1
5 12 -1 1
5 12 1 0
5 12 4 0
;
data want ;
set have ;
by id ;
retain _firstCondition ;
if first.id then call missing(_firstCondition) ;
if Condition=1 and missing(_firstCondition) then _FirstCondition=Qtr ;
if last.id ;
Condition=0 ;
do Qtr= -4 to -1, 1 to Lastqtr ; *Your WANT data shows no Qtr 0 ;
if Qtr>=_FirstCondition>.Z then Condition=1 ;
output ;
end ;
drop _: ;
run ;
If you have multiple conditions stored as separate variables, this could be extended with arrays.
Some questions:
How do we know what the end "quarter" should be?
For instance in your description you include: "through the end of each person's follow-up quarters. E.g., condition was identified in quarter 2, then flag that condition as being present through the end of their follow-up, for example, quarter 17." So how would be know the end would be "quarter 17"?
Your example "have" shows the chronic condition changing from 1 to 0 for Id=1 between quarter_evidence -2 and 1. Are we supposed to assume any such 0 are errors and overwrite as your example or is something else going on with that indicator.
Hi Ballardw. Thank you for the reply!
"How do we know what the end "quarter" should be?"
This is the number in the column, "Last_enrollment_quarter". Apologies for the confusion- my made up example in the text (e.g., end quarter of 17) was not related to the made up data in the HAVE table. The end quarter would be 3 for ID=1, 40 for ID=2, 24 for ID=3, etc. Does that help?
"Your example "have" shows the chronic condition changing from 1 to 0 for Id=1 between quarter_evidence -2 and 1. Are we supposed to assume any such 0 are errors and overwrite as your example or is something else going on with that indicator."
The simplified conceptual background to this programming is that once the chronic condition has been flagged, it never goes away (e.g., cured). The 0's are not necessarily errors, but it does NOT reflect a reversal of the chronic condition. So, the answer to your question here is "yes", I do want to overwrite all 0's and missing data with 1 after the first quarter with "1" in it, until their "Last_enrollment_quarter".
Dan
Should be simple enough, basically just a DO loop with an OUTPUT statement.
First let's convert your LISTING into an actual dataset so we have something program with. (I will also rename the variables so they are less difficult to type.)
data have;
input ID Lastqtr Qtr Condition;
cards;
1 3 -3 0
1 3 -2 1
1 3 1 0
2 40 -4 0
2 40 -2 0
2 40 -1 1
2 40 1 1
2 40 4 1
2 40 17 0
2 40 25 0
3 24 12 0
3 24 13 0
3 24 15 0
3 24 16 0
3 24 22 0
4 2 -2 1
5 12 -1 1
5 12 1 0
5 12 4 0
;
Notice how sharing the data this way makes it not only clearer what the dataset name is and the variable types, but it also is easier to read than the spread out listing.
So we just want to find the first observation where CONDITION is TRUE and then expand from there.
data filler;
set have;
where condition;
by id;
if first.id;
do qtr = qtr to lastqtr;
output;
end;
run;
To get the earlier observations you can just merge the two.
data want;
merge have filler;
by id qtr;
run;
Hi Tom, thank you for the reply!
Using the data and format you presented in your post (thank you for doing that!), and then using your code, this does not fully work yet. In the WANT dataset, for ID=2, the order of Qtr is -4, -2, and so on, missing -3. Could that be something to do with the negative value?
Also, I would like to add a need, as I didn't realize this until now. Is it possible to get the Qtr starting from -4 for everyone? Everyone has a full year of enrollment prior to the index date, so everyone should start from -4. E.g., ID=1 first value for Qtr is -2, but I would like to get -4 and -3 along with 0s for the Condition variable.
Lastly, I noticed the code works for IDs that had 1 for Condition at some point, for not for the ID that had 0s (ID=3). Is there a way to code 0s for all quarters eligible, from -4 to lastqtr?
So the main goal is to expand the number of observations.
That might be done simplest by starting from the first observation.
data all_qtr;
set have;
by id;
if first.id;
do qtr = qtr to lastqtr;
output;
end;
keep id lastqtr qtr ;
run;
To fix the CONDITION value you will want to make a NEW variable so its value can be retained. It does no good to retain a variable coming from a dataset for two reasons. First they are already being retained. But more importantly the retained value is replaced when the next observation is read. You could write the retained value back to the original variable and drop the new one.
data want;
merge have all_qtr;
by id qtr;
retain ever;
if first.id then ever=condition;
ever = ever or condition;
condition = ever;
drop ever;
run;
Or drop the original one and rename the new one.
You can read through your data in by groups, finding the earliest quarter of the the chronic condition for each ID.
Then when you've read the last record of the BY-group, you have all the information you need to make the data you want.
Using Tom's data:
data have;
input ID Lastqtr Qtr Condition;
cards;
1 3 -3 0
1 3 -2 1
1 3 1 0
2 40 -4 0
2 40 -2 0
2 40 -1 1
2 40 1 1
2 40 4 1
2 40 17 0
2 40 25 0
3 24 12 0
3 24 13 0
3 24 15 0
3 24 16 0
3 24 22 0
4 2 -2 1
5 12 -1 1
5 12 1 0
5 12 4 0
;
data want ;
set have ;
by id ;
retain _firstCondition ;
if first.id then call missing(_firstCondition) ;
if Condition=1 and missing(_firstCondition) then _FirstCondition=Qtr ;
if last.id ;
Condition=0 ;
do Qtr= -4 to -1, 1 to Lastqtr ; *Your WANT data shows no Qtr 0 ;
if Qtr>=_FirstCondition>.Z then Condition=1 ;
output ;
end ;
drop _: ;
run ;
If you have multiple conditions stored as separate variables, this could be extended with arrays.
Brilliant- thank you!
data want;
set have;
by id;
retain _cond;
if first.id then _cond = chronic_condition;
if chronic_condition then _cond = 1;
chronic_condition = _cond;
output;
if last.id then do quarter_evidence_of_condition = quarter_evidence_of_condition + 1 to last_enrollment_quarter;
output;
end;
drop _cond;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.