I have a problem that is ostensibly simple, but I'm really struggling with it.
In my data I have two columns that are important: END_DATE and DURATION. END_DATE is a date variable formatted as monyy7 (but does contain the day of the month as well).
The data looks like this:
AUG2015 26
AUG2015 27
AUG2015 16
AUG2015 20
AUG2015 7
AUG2015 22
AUG2015 28
AUG2015 28
AUG2015 35
AUG2015 15
AUG2015 28
AUG2015 8
AUG2015 14
With these I want to calculate a 3 month moving median. It's easy to calculate the median for the months, but I'm not sure how to do a rolling median.
From googling I found PROC EXPAND, but this requires non-duplicate variables. This is a problem as for each month we have lots of durations (as you can see from the example). I can't calculate the median for each month and then use PROC EXPAND as a median of a median won't be an accurate measure.
I think I need create a new variable where each month is assigned a group but I'm unsure how to do this.
ie. AUG2015 =0, SEP2015 =1, OCT2015=2 etc
Once I have that I assume that I could calculate the rolling median using some sort of lag function?
I'm new to SAS, so apologies if this question is very trivial!
Sorry, the data for each month is quite extensive so I just posted the first few lines.
An input (very condensed) would look something like this:
AUG2015 26
AUG2015 27
SEP2015 16
SEP2015 20
SEP2015 7
OCT2015 22
NOV2015 28
NOV2015 28
NOV2015 35
NOV2015 15
DEC2015 28
DEC2015 8
DEC2015 14
The idea is to obtain something like this:
AUG2015 ..
SEP2015 ..
OCT2015 21
NOV2015 21
DEC2015 25
As I mentioned in my original comment I think what I need to do is assign a group to each month, something along the lines of:
data outfile_test3;
set outfile_test2;
by END_DATE;
retain count 0;
if first.END_DATE then count=count+1;
This begins to count, but it counts the day as a new date (so it adds a count for the 2nd Aug 2015 etc). I can make a new variable using either MONTH or YEAR functions and it seems to be working correctly, but I'm not sure exactly how to combine the two.
I'm also not sure exactly how to calculate the 'window' of medians. I think the code I'm after is using a lag function but I don't really know how to write it.
Sorry for how basic this work through is, but I am very new SAS!
PROC SUMMARY to get the median by each month in a data set.
Then PROC EXPAND to get the moving median of the last N months. An example of a moving average of 3 months is shown here: https://documentation.sas.com/?docsetId=etsug&docsetVersion=15.1&docsetTarget=etsug_expand_examples0..., you want a moving median which is MOVMED instaed of MOVAVE.
Convert your months to periods as you indicated, follow the instructions here for that:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/.
Then you can use this macro here with a window of 3. There are probably more optimal ways than this but this will work for your example.It assumes your variable name is X. Replace that as needed.
proc sql noprint;
select min(period) into :min_period TRIMMED
from have;
select max(period) into :max_period TRIMMED
from have;
quit;
%put &min_period;
%put &max_period;
options nomprint nosymbolgen;
%macro moving_summary(datain=, window=, dataout=);
*remove previous summary table;
proc sql;
drop table &dataout;
quit;
%do i=&min_period. %to %eval(&max_period.-&window.);
proc means data=&datain noprint;
var x;
where period between &i and %eval(&i. + &window. - 1);
output out=_temp n=n_x mean=mean_x std=std_x stderr=stderr_x median = median_x;
run;
data _temp;
set _temp;
start = &i;
end = &i + &window - 1;
run;
proc append base=&dataout data=_temp;
run;
proc sql;
drop table _temp;
quit;
%end;
%mend;
%moving_summary(datain=have, window=6, dataout=summary_stats);
proc means data=have;
where period between 1 and 6;
var x;
run;
@Reeza 's suggestion made me recall a technique I have sometimes used for rolling windows - namely to output a copy of each data record once for each window in which it belongs (and assigning a consistent ending date for each window).
Then run a proc summary or proc means using the window end date as a class var:
data have;
input date :monyy7. x;
format date date9.;
datalines;
AUG2015 26
AUG2015 27
SEP2015 16
SEP2015 20
SEP2015 7
OCT2015 22
NOV2015 28
NOV2015 28
NOV2015 35
NOV2015 15
DEC2015 28
DEC2015 8
DEC2015 14
;
data vneed / view=vneed;
set have nobs=nrecs;
if _n_=1 then do;
/*Generate the window end date needed for the 1st 3-month window*/
set have (keep=date rename=(date=first_date));
first_date=intnx('month',first_date,2,'end');
/*Generate the window end date needed for the last 3-month window */
set have (keep=date rename=(date=last_date)) point=nrecs;
last_date=intnx('month',last_date,0,'end');
end;
/* Write a record for each window it belongs to */
do m=0 to 2;
window_enddate=intnx('month',date,m,'end');
if first_date<=window_enddate<=last_date then output;
end;
format window_enddate date9.;
run;
proc means data=vneed median ;
var x;
class window_enddate;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.