Hi,
The goal is to create a new variable that captures the mean number of unique days of opioid use (all_opioid_sum) by hospital for the 365 days preceding each patient's hospital admission.
The variable all_opioid_sum already tabulates the number of unique days that a patient was on opioids. For instance, if a patient was admitted 1/1/2009 I want to create a new variable for the average number of days that patients at that particular hospital received from the prior year (1/1/2008 to 12/31/2008).
Not sure what the best approach for performing this operation sql, proc expand, etc. The complete dataset contains 180K observations, across 42 hospitals.
data Have:
informat admit_date MMDDYY10.;
input Hospital_Number$ Admit_Date all_opioid_Sum MRN$;
format admit_date MMDDYY10.;
datalines;
Hospital_Number Admit_Date all_opioid_Sum MRN
1001 7/15/2008 80 8962978
1001 11/20/2008 6 28686764
1001 11/28/2008 8 1309558
1001 12/23/2008 22 31605284
1001 12/29/2008 3 353898
1001 12/30/2008 4 13648844
1001 2/1/2009 2 9719056
1001 2/17/2009 1 15761449
1001 2/20/2009 4 17761002
1001 2/21/2009 3 31147054
1001 3/13/2009 5 25914846
1001 3/20/2009 13 3557915
1001 3/23/2009 1 13953572
1001 3/24/2009 3 1310975
1001 3/31/2009 5 5924690
1001 4/1/2009 15 4135498
1001 4/30/2009 4 3179381
1001 5/3/2009 3 13644661
1001 5/8/2009 1 24288280
1001 5/19/2009 4 5022238
1001 6/14/2009 4 7286813
1001 6/27/2009 3 15265696
1001 7/9/2009 24 12747530
1001 7/16/2009 4 10413193
1001 7/17/2009 19 17500057
1001 7/20/2009 27 1383178
1001 7/21/2009 5 5014019
1001 8/3/2009 10 4971055
1001 8/6/2009 3 29479789
1001 8/18/2009 3 4512650
1001 8/29/2009 15 8777446
1001 9/8/2009 5 22316882
1001 9/10/2009 4 4073376
1001 9/13/2009 29 8523048
1001 9/23/2009 4 4412511
1001 9/25/2009 6 21713280
1001 10/9/2009 3 26379026
1001 10/19/2009 52 7694071
1001 10/21/2009 8 18468515
1001 11/16/2009 6 26551281
;
run;
Want:
Hospital_Number Admit_Date all_opioid_Sum MRN Move_avg365
1001 7/15/2008 80 8962978 80.0
1001 11/20/2008 6 28686764 43.0
1001 11/28/2008 8 1309558 31.3
1001 12/23/2008 22 31605284 29.0
1001 12/29/2008 3 353898 23.8
1001 12/30/2008 4 13648844 20.5
1001 2/1/2009 2 9719056 17.9
1001 2/17/2009 1 15761449 15.8
1001 2/20/2009 4 17761002 14.4
1001 2/21/2009 3 31147054 13.3
1001 3/13/2009 5 25914846 12.5
1001 3/20/2009 13 3557915 12.6
1001 3/23/2009 1 13953572 11.7
1001 3/24/2009 3 1310975 11.1
1001 3/31/2009 5 5924690 10.7
1001 4/1/2009 15 4135498 10.9
1001 4/30/2009 4 3179381 10.5
1001 5/3/2009 3 13644661 10.1
1001 5/8/2009 1 24288280 9.6
1001 5/19/2009 4 5022238 9.4
1001 6/14/2009 4 7286813 9.1
1001 6/27/2009 3 15265696 8.8
1001 7/9/2009 24 12747530 9.5
1001 7/16/2009 4 10413193 9.3
1001 7/17/2009 19 17500057 9.6
1001 7/20/2009 27 1383178 10.3
1001 7/21/2009 5 5014019 10.1
1001 8/3/2009 10 4971055 10.1
1001 8/6/2009 3 29479789 9.9
1001 8/18/2009 3 4512650 9.6
1001 8/29/2009 15 8777446 9.8
1001 9/8/2009 5 22316882 9.7
1001 9/10/2009 4 4073376 9.5
1001 9/13/2009 29 8523048 10.1
1001 9/23/2009 4 4412511 9.9
1001 9/25/2009 6 21713280 9.8
1001 10/9/2009 3 26379026 9.6
1001 10/19/2009 52 7694071 10.7
1001 10/21/2009 8 18468515 10.6
1001 11/16/2009 6 26551281 8.7
Attached is is a sample (N=30,000) of my data.
Much appreciated.
Assuming
then you can build a history array for each hospital, indexed by admit_date. This will allow you to add new admissions (i.e. add all_opioid_sum to a numerator and add 1 to a denominator), and correspondingly subtract stale admissions as you step through a given hospital:
%let data_beg=01jan2007;
%let data_end=31dec2009;
data want (drop=_:);
array history {%sysfunc(intnx(year,%sysevalf("&data_beg"d),-1,sameday)):%sysevalf("&data_end"d)} _temporary_;
set have;
by hospital_number admit_date;
if first.hospital_number then call missing(of history{*],_numerator,_denominator);
history{admit_date} = all_opioid_sum; /* Restored to uploaded code*/
_numerator + all_opioid_sum;
_denominator + 1;
_lag_admit_date=lag(admit_date);
if first.hospital_number=0 then _start_stale_period = intnx('year',_lag_admit_date,-1,'sameday');
_end_stale_period = intnx('year',admit_date,-1,'sameday');
if first.hospital_number=0 then do _date=_start_stale_period to _end_stale_period;
if history{_date}^=. then do;
_numerator = _numerator - history{_date};
_denominator = _denominator - 1;
history{_date}=.;
end;
end;
move_avg_1year=_numerator/_denominator;
run;
Set the macro variables DATA_BEG and DATA_END to the date range of your study. This will allow the HISTORY array to be indexed by date, with a lower bound exactly 1 year prior to DATA_BEG.
For each incoming observation, the cleaning up of stale date covers all dates from 1 year prior to the preceding admission through 1 year prior to the current admission. Note this is NOT always 365 days, since the INTNX functions honors leap years.
BTW, it would not be too difficult to accept multiple admissions on a single date. But check the above first to see if it meets your needs.
You might want to adjust the calculation of _start_stale_period and _end_stale_period depending on whether you want to include exactly 1 year prior to the current (and exactly 1 year prior to the preceding) admission. You can check the stale_period range by eliminating the "drop=" parameter, and formatting them as date formats.
A line of code (history{admit_date}=all_opioid_sum;), deleted from the initial upload, has been re-inserted.
Editted note: Below, I have revised the code to accommodate multiple admissions on a given admit_date. This program is untested against sample date with duplicated admission dates:
data want_with_dupes (drop=_:);
array history {%sysfunc(intnx(year,%sysevalf("&data_beg"d),-1,sameday)):%sysevalf("&data_end"d)} _temporary_;
array n_mrn {%sysfunc(intnx(year,%sysevalf("&data_beg"d),-1,sameday)):%sysevalf("&data_end"d)} _temporary_;
set have;
by hospital_number admit_date;
if first.hospital_number then call missing(of history{*],of n_mrn{*},_numerator,_denominator,_n_unique_dates);
history{admit_date} + all_opioid_sum;
n_mrn{admit_date} + 1;
_numerator + all_opioid_sum;
_denominator + 1;
if last.admit_date ;
_n_unique_dates+1;
_lag_admit_date=lag(admit_date);
if _n_unique_dates>1 then _start_stale_period = intnx('year',_lag_admit_date,-1,'sameday');
_end_stale_period = intnx('year',admit_date,-1,'sameday');
if _n_unique_dates>1 then do _date=_start_stale_period to _end_stale_period;
if history{_date}^=. then do;
_numerator = _numerator - history{_date};
_denominator = _denominator - n_mrn{_date};
history{_date}=.;
n_mrn{_date}=.;
end;
end;
move_avg_1year=_numerator/_denominator;
/* Now re-read all obs for this date and output with the current moving average*/
do until (last.admit_date);
set have;
by hospital_number admit_date;
output;
end;
run;
The revised program has an additional array (N_MRN), which provides a "history" of the number of obs on each admission date. Also, note the program now tracks the number of unique admission dates, and (after calculating the moving average) re-reads and outputs all obs for the current date.
This appears to be a continuation of your question here: https://communities.sas.com/t5/SAS-Programming/Moving-average-by-site-and-date/m-p/751533#M236595
The error message is quite clear, you can't have more than 255 levels when you use a multi-level format.
So, you need a different solution. Perhaps something like this, where average replaces regression: https://pages.stern.nyu.edu/~adesouza/sasfinphd/index/node25.html
Thanks I appreciate your guidance, I'm so unfamiliar with multilabel formats that I was hoping there was a simple work around for the error message. I'll explore your link, perhaps that's the solution I need.
Lets parse out your description and some places that may need clarification.
" I want to capture in a new variable for each patient the mean number of opioid's prescribed "
By "new variable for each patient" you do mean one variable. Or is each patient getting their own named variable?
You aren't very clear on what you are taking a mean of here. If a patient was prescribed two opioids but received 500 doses what would the mean be? Mean needs to know what you are counting: doses, prescriptions, different named drugs. Then what is divisor to get the mean? Mean doses per patient? Mean prescribed number of opioids (not prescriptions, names of opioids) per hospital? Mean per year per hospital? Patients? Here use of VARIABLE names and description of what that variable contains may be important. Do not use verbiage that does not very clearly tie to variables.
This is a case of you know your data (hopefully) and where you are going but not describing it clearly enough that those of us without all of the experience understand the short cuts your though processes are taking.
"For instance, if a patient was admitted 1/1/2009 I want to create a new variable for the average number of pills that any patients at that particular hospital received from the prior year (1/1/2008 to 12/31/2008)." Is this saying that if Patient XYZ was adimitted but not given any "pills" (not consistent with use of Opioids in the first bit) that we look at ALL the patients in that hospital for "pills"?
Provide some dummy data with fake hospital, patient data as long as it has a enough detail to calculate what you want.
Then using that data walk us through the actual calculation of counting what it is you want.
Note: part of the reason this doesn't make much sense is your multilabel format would be applying a value from every admission to many other admissions. Which makes no sense to me.
Use a small subset of your Analytic data set, maybe 50 observations (data subset; set analytic (obs=50); run;) and use that set to make the MLF format and see if the results make any sense.
Apologies, I've edited original post and reframed my question. Hopefully it's more detailed enough to help facilitate pointing me in the right direction.
Assuming
then you can build a history array for each hospital, indexed by admit_date. This will allow you to add new admissions (i.e. add all_opioid_sum to a numerator and add 1 to a denominator), and correspondingly subtract stale admissions as you step through a given hospital:
%let data_beg=01jan2007;
%let data_end=31dec2009;
data want (drop=_:);
array history {%sysfunc(intnx(year,%sysevalf("&data_beg"d),-1,sameday)):%sysevalf("&data_end"d)} _temporary_;
set have;
by hospital_number admit_date;
if first.hospital_number then call missing(of history{*],_numerator,_denominator);
history{admit_date} = all_opioid_sum; /* Restored to uploaded code*/
_numerator + all_opioid_sum;
_denominator + 1;
_lag_admit_date=lag(admit_date);
if first.hospital_number=0 then _start_stale_period = intnx('year',_lag_admit_date,-1,'sameday');
_end_stale_period = intnx('year',admit_date,-1,'sameday');
if first.hospital_number=0 then do _date=_start_stale_period to _end_stale_period;
if history{_date}^=. then do;
_numerator = _numerator - history{_date};
_denominator = _denominator - 1;
history{_date}=.;
end;
end;
move_avg_1year=_numerator/_denominator;
run;
Set the macro variables DATA_BEG and DATA_END to the date range of your study. This will allow the HISTORY array to be indexed by date, with a lower bound exactly 1 year prior to DATA_BEG.
For each incoming observation, the cleaning up of stale date covers all dates from 1 year prior to the preceding admission through 1 year prior to the current admission. Note this is NOT always 365 days, since the INTNX functions honors leap years.
BTW, it would not be too difficult to accept multiple admissions on a single date. But check the above first to see if it meets your needs.
You might want to adjust the calculation of _start_stale_period and _end_stale_period depending on whether you want to include exactly 1 year prior to the current (and exactly 1 year prior to the preceding) admission. You can check the stale_period range by eliminating the "drop=" parameter, and formatting them as date formats.
A line of code (history{admit_date}=all_opioid_sum;), deleted from the initial upload, has been re-inserted.
Editted note: Below, I have revised the code to accommodate multiple admissions on a given admit_date. This program is untested against sample date with duplicated admission dates:
data want_with_dupes (drop=_:);
array history {%sysfunc(intnx(year,%sysevalf("&data_beg"d),-1,sameday)):%sysevalf("&data_end"d)} _temporary_;
array n_mrn {%sysfunc(intnx(year,%sysevalf("&data_beg"d),-1,sameday)):%sysevalf("&data_end"d)} _temporary_;
set have;
by hospital_number admit_date;
if first.hospital_number then call missing(of history{*],of n_mrn{*},_numerator,_denominator,_n_unique_dates);
history{admit_date} + all_opioid_sum;
n_mrn{admit_date} + 1;
_numerator + all_opioid_sum;
_denominator + 1;
if last.admit_date ;
_n_unique_dates+1;
_lag_admit_date=lag(admit_date);
if _n_unique_dates>1 then _start_stale_period = intnx('year',_lag_admit_date,-1,'sameday');
_end_stale_period = intnx('year',admit_date,-1,'sameday');
if _n_unique_dates>1 then do _date=_start_stale_period to _end_stale_period;
if history{_date}^=. then do;
_numerator = _numerator - history{_date};
_denominator = _denominator - n_mrn{_date};
history{_date}=.;
n_mrn{_date}=.;
end;
end;
move_avg_1year=_numerator/_denominator;
/* Now re-read all obs for this date and output with the current moving average*/
do until (last.admit_date);
set have;
by hospital_number admit_date;
output;
end;
run;
The revised program has an additional array (N_MRN), which provides a "history" of the number of obs on each admission date. Also, note the program now tracks the number of unique admission dates, and (after calculating the moving average) re-reads and outputs all obs for the current date.
Amazing, thank you! There is the potential for multiple admissions on the same date (different MRNs), it's not obvious to me at a glance how this would this impact the calculation?
@Shad wrote:
Amazing, thank you! There is the potential for multiple admissions on the same date (different MRNs), it's not obvious to me at a glance how this would this impact the calculation?
Presumably you want to have the same values of moving average for ALL obs on a given date, yes?
But the program I submitted generates an updated numerator, denominator, and moving average with every incoming observation. This means that two obs on a given date will have different numerators, denominators and moving averages. It also means that each date in the history array would have only the last all_opioid_sum for the date, not the total for that date. So removing stale dates from the numerator (and denominator) would be underweighted.
It sounds like you want to calculate the average of the variable ALL_OPIOD_SUM for the 365 day window ending at that current value of ADMIT_DATE.
So what this SQL code would produce.
proc sql;
create table want as
select distinct a.*
, mean(b.all_opioid_Sum) as Move_avg365
from have a
join have b
on a.Hospital_Number = b.Hospital_Number
and b.admit_date between a.admit_date-365 and a.admit_date
group by a.Hospital_Number,a.admit_date
;
quit;
Is the problem that the data is too large to do that self join?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.