Hi,
I'm trying to create a rolling average by a particular date and for a particular hospital. I want to capture in a new variable for each observation that has the mean number of opioid's prescribed for the year prior for that hospital. For instance, if a patient was admitted 1/1/2009 to a hospital and during their patient stay received 11 opioids I want to create a new variable for the mean number of pills that patients received at that hospital from the prior year (1/1/2008 to 12/31/2008).
I feel like proc SQL would be the best approach but unfortunately, I'm not very familiar with it and this seems outside of what proc expand is designed to do (correct me if I'm wrong).
Below is a sample of my data.
Data test;
informat admit_date MMDDYY10.;
input MRN hospital_number admit_date opioid_sum;
format admit_date MMDDYY10.;
datalines;
4445022004 2004 3/20/2005 3
4446352004 2004 3/22/2005 41
4813282033 2033 8/18/2008 11
5217492033 2033 3/1/2010 53
5786652002 2002 11/13/2005 1
5933342033 2033 10/29/2012 16
6014662003 2003 12/27/2004 43
6156632004 2004 8/7/2009 2
6902512003 2003 5/11/2007 9
7122882003 2003 1/4/2008 31
7325012006 2006 8/9/2013 44
7812692001 2001 11/25/2005 14
7918522001 2001 2/15/2006 3
8063392001 2001 7/17/2006 6
8486892028 2028 3/17/2006 6
9104802021 2021 3/11/2008 64
9275072021 2021 8/14/2008 9
9290112001 2001 9/4/2009 14
9324532021 2021 9/17/2008 7
9937491005 1005 3/21/2005 4
10431201005 1005 1/20/2006 38
10528162020 2020 7/17/2005 1
11137382001 2001 10/8/2013 1
11182631010 1010 11/9/2005 5
11254521011 1011 6/10/2009 3
11482081010 1010 8/1/2006 2
11493322020 2020 1/2/2007 25
11591912020 2020 2/24/2007 5
11603272020 2020 3/1/2007 5
11676152021 2021 8/18/2011 51
11849891011 1011 8/12/2009 16
11887782020 2020 8/20/2007 13
11988412020 2020 10/17/2007 21
12069562020 2020 11/28/2007 1
12130301012 1012 9/12/2005 2
12241461012 1012 1/16/2006 12
12438572011 2011 4/25/2005 1
12499572016 2016 5/24/2005 16
12579142011 2011 9/23/2005 1
12654792011 2011 12/16/2005 8
12713432020 2020 12/18/2008 35
12740371010 1010 7/1/2009 33
12800172011 2011 5/13/2006 15
12806022020 2020 2/10/2009 9
12901762020 2020 4/4/2009 1
13086771010 1010 3/20/2010 7
13168161012 1012 7/7/2008 10
13387511005 1005 10/17/2010 1
13887512016 2016 10/29/2007 8
13921642011 2011 5/21/2009 30
14124191005 1005 11/2/2011 27
14473312011 2011 11/12/2010 29
14643652008 2008 10/8/2005 13
14980751005 1005 1/24/2013 28
15157472027 2027 9/24/2007 2
15494881001 1001 2/18/2010 2
15729231003 1003 2/23/2013 16
16041112008 2008 12/10/2007 12
16092722011 2011 9/10/2012 51
16990662018 2018 1/22/2008 2
17054071003 1003 7/15/2014 8
18947922018 2018 4/22/2012 1
19635862018 2018 8/16/2013 11
20054472004 2004 10/13/2010 2
21148552013 2013 1/24/2007 7
21557722020 2020 1/23/2011 31
21830262009 2009 7/23/2006 93
22514892023 2023 7/8/2007 15
22791052009 2009 12/30/2007 9
22818062008 2008 3/2/2013 12
22832352023 2023 3/10/2008 6
22882201012 1012 2/18/2015 4
23210662020 2020 11/8/2012 1
23220212009 2009 8/25/2008 37
23225792020 2020 11/13/2012 2
23300982008 2008 8/27/2013 29
23335942013 2013 8/19/2011 1
23396402009 2009 11/24/2008 81
23622612020 2020 3/27/2013 2
23725172009 2009 5/11/2009 39
24423602008 2008 11/26/2014 8
24579022022 2022 3/14/2005 15
24862462020 2020 7/7/2014 6
25021732023 2023 12/4/2012 2
25090032022 2022 9/29/2005 57
25587792023 2023 2/28/2014 16
31154832027 2027 2/15/2010 18
32126542020 2020 1/28/2012 3
32191452005 2005 7/25/2006 2
33195552026 2026 3/30/2005 13
37518332027 2027 4/16/2015 7
43750682030 2030 11/9/2009 4
44007522030 2030 3/25/2010 21
47892722030 2030 8/23/2014 16
51656082029 2029 8/15/2005 21
52664542029 2029 1/2/2007 21
55401492026 2026 9/29/2009 36
55824643010 3010 5/1/2015 92
56357072029 2029 5/17/2012 2
60493232020 2020 8/28/2005 28
60998792020 2020 7/27/2006 30
61280752020 2020 1/30/2007 1
68820382012 2012 2/3/2011 30
89391292015 2015 7/21/2005 70
89406682015 2015 7/29/2005 116
205907461008 1008 2/9/2006 13
207946161008 1008 6/2/2009 1
208221081008 1008 11/1/2009 10
209104801008 1008 3/11/2011 7
253883972025 2025 5/7/2006 2
272163402025 2025 8/24/2007 1
279785992025 2025 3/6/2008 5
341858412025 2025 3/25/2012 1
353984432025 2025 1/5/2013 16
378312872025 2025 8/30/2014 5
400255322036 2036 10/18/2005 21
417658442036 2036 10/9/2011 1
419506921007 1007 9/27/2005 27
441098661007 1007 7/7/2007 20
446233861007 1007 1/2/2008 3
455716281007 1007 12/26/2008 20
461343691007 1007 8/19/2009 4
484508721007 1007 2/7/2012 14
500051301008 1008 6/18/2006 11
550530602022 2022 1/11/2008 1
550739502022 2022 6/5/2008 18
551705692022 2022 3/7/2009 19
552849522022 2022 4/24/2010 1
554176542022 2022 7/20/2011 19
555115672022 2022 5/26/2012 215
556464162022 2022 8/10/2013 15
5495125313001 3001 12/22/2006 22
5501440913001 3001 10/19/2007 55
5512842443001 3001 6/15/2009 66
5526333383001 3001 4/27/2011 77
5706842303001 3001 8/7/2014 68
9705840221010 1010 3/14/2012 16
9811279991010 1010 10/20/2010 7
30003487852015 2015 7/17/2008 43
30006776162015 2015 1/4/2010 9
30010661572015 2015 1/31/2012 235
30011080462015 2015 3/26/2012 6
30014669912015 2015 9/29/2013 3
30015746212015 2015 7/31/2014 116
99367660042017 2017 10/13/2008 2
99368273652017 2017 12/23/2009 2
99368677442017 2017 9/13/2010 31
99369541422017 2017 5/16/2012 34
99426677262017 2017 4/16/2009 9
99428108082017 2017 12/14/2014 47
;
run;
In the TEST data, some hospitals don't have any records in the previous year, I suspect that it might be that way in the full data set as well. Anyway, here is my solution using the TEST data, I leave it up to you to check to see if I have the right means.
/* Find all distinct dates in the data set */
proc sql;
create table dates as select distinct admit_date from test;
quit;
/* Create multi-label format for dates, so we can find average of previous year*/
data dates1;
set dates;
start=intnx('year',admit_date,-1,'s');
end=admit_date-1;
fmtname='datef';
hlo='M';
label=put(start,mmddyys10.);
run;
proc format cntlin=dates1;
run;
/* Compute means over previous year using Multilabel format */
proc summary data=test nway;
class hospital_number;
class admit_date/mlf;
var opioid_sum;
format admit_date datef.;
output out=means mean=opioid_mean;
run;
/* Combine means with original data */
proc sql;
create table final as select
t.*,m.opioid_mean
from test as t left join means as m
on t.hospital_number=m.hospital_number
and t.admit_date=intnx('year',input(m.admit_date,mmddyy10.),1,'s')
order by t.hospital_number,t.admit_date;
quit;
Your description sounds like just a plain old average by year and hospital; it doesn't sound to me like a moving average
proc summary data=test nway;
class hospital_number admit_date;
var opioid_sum;
format admit_date year.;
output out=means mean=;
run;
Now, if I have misunderstood, and you really want a moving average, then please explain further.
Okay, got it. Thanks for clearing up my confusion.
While I think this could be done in PROC SQL, if the data set is really large, this could take a very long time. How many records in this data set? I'm thinking of using PROC SUMMARY, I don't have actual code right now, but when I get something that works, I will post it.
In the TEST data, some hospitals don't have any records in the previous year, I suspect that it might be that way in the full data set as well. Anyway, here is my solution using the TEST data, I leave it up to you to check to see if I have the right means.
/* Find all distinct dates in the data set */
proc sql;
create table dates as select distinct admit_date from test;
quit;
/* Create multi-label format for dates, so we can find average of previous year*/
data dates1;
set dates;
start=intnx('year',admit_date,-1,'s');
end=admit_date-1;
fmtname='datef';
hlo='M';
label=put(start,mmddyys10.);
run;
proc format cntlin=dates1;
run;
/* Compute means over previous year using Multilabel format */
proc summary data=test nway;
class hospital_number;
class admit_date/mlf;
var opioid_sum;
format admit_date datef.;
output out=means mean=opioid_mean;
run;
/* Combine means with original data */
proc sql;
create table final as select
t.*,m.opioid_mean
from test as t left join means as m
on t.hospital_number=m.hospital_number
and t.admit_date=intnx('year',input(m.admit_date,mmddyy10.),1,'s')
order by t.hospital_number,t.admit_date;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.