BookmarkSubscribeRSS Feed
J_Brown
Calcite | Level 5

I have a problem that is ostensibly simple, but I'm really struggling with it.

 

In my data I have two columns that are important: END_DATE and DURATION. END_DATE is a date variable formatted as monyy7 (but does contain the day of the month as well). 

The data looks like this:

AUG2015       26
AUG2015       27
AUG2015       16
AUG2015       20
AUG2015        7
AUG2015       22
AUG2015       28
AUG2015       28
AUG2015       35
AUG2015       15
AUG2015       28
AUG2015         8
AUG2015       14

With these I want to calculate a 3 month moving median. It's easy to calculate the median for the months, but I'm not sure how to do a rolling median.

From googling I found PROC EXPAND, but this requires non-duplicate variables. This is a problem as for each month we have lots of durations (as you can see from the example). I can't calculate the median for each month and then use PROC EXPAND as a median of a median won't be an accurate measure.

I think I need create a new variable where each month is assigned a group but I'm unsure how to do this.

ie. AUG2015 =0, SEP2015 =1, OCT2015=2 etc

Once I have that I assume that I could calculate the rolling median using some sort of lag function?

 

I'm new to SAS, so apologies if this question is very trivial!

6 REPLIES 6
Reeza
Super User
Can you provide a fully worked example please?
Show your input data and expected output. The data you've posted for a single month so it isn't really reflective of your problem that I can see.
J_Brown
Calcite | Level 5

Sorry, the data for each month is quite extensive so I just posted the first few lines.

 

An input (very condensed) would look something like this:

AUG2015       26
AUG2015       27
SEP2015       16
SEP2015       20
SEP2015        7
OCT2015       22
NOV2015       28
NOV2015       28
NOV2015       35
NOV2015       15
DEC2015       28
DEC2015         8
DEC2015       14

 

The idea is to obtain something like this:

AUG2015     ..

SEP2015     ..

OCT2015    21

NOV2015    21

DEC2015    25

 

As I mentioned in my original comment I think what I need to do is assign a group to each month, something along the lines of:

data outfile_test3;
set outfile_test2;
by END_DATE;
retain count 0;
if first.END_DATE then count=count+1;

 

This begins to count, but it counts the day as a new date (so it adds a count for the 2nd Aug 2015 etc). I can make a new variable using either MONTH or YEAR functions and it seems to be working correctly, but I'm not sure exactly how to combine the two.

 

I'm also not sure exactly how to calculate the 'window' of medians. I think the code I'm after is using a lag function but I don't really know how to write it.

 

Sorry for how basic this work through is, but I am very new SAS!

PaigeMiller
Diamond | Level 26

PROC SUMMARY to get the median by each month in a data set.

 

Then PROC EXPAND to get the moving median of the last N months. An example of a moving average of 3 months is shown here: https://documentation.sas.com/?docsetId=etsug&docsetVersion=15.1&docsetTarget=etsug_expand_examples0..., you want a moving median which is MOVMED instaed of MOVAVE.

 

 

--
Paige Miller
Reeza
Super User
So do you want each month to be calculated individually first and then combined to a three month median or should your three month median include the repeat measures as well? The numbers shown in your example are EXACTLY what you expect to have returned?
Reeza
Super User

Convert your months to periods as you indicated, follow the instructions here for that:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

Then you can use this macro here with a window of 3. There are probably more optimal ways than this but this will work for your example.It assumes your variable name is X. Replace that as needed.

 

proc sql noprint;
select min(period) into :min_period TRIMMED
from have;
select max(period) into :max_period TRIMMED
from have;
quit;

%put &min_period;
%put &max_period;

options nomprint nosymbolgen;

%macro moving_summary(datain=, window=, dataout=);

*remove previous summary table;
proc sql;
drop table &dataout;
quit;

%do i=&min_period. %to %eval(&max_period.-&window.);
proc means data=&datain noprint;
var x;
where period between &i and %eval(&i. + &window. - 1);
output out=_temp n=n_x mean=mean_x std=std_x stderr=stderr_x median = median_x;
run;

data _temp;
set _temp;
start = &i;
end = &i + &window - 1;
run;

proc append base=&dataout data=_temp;
run;

proc sql;
drop table _temp;
quit;

%end;
%mend;

%moving_summary(datain=have, window=6, dataout=summary_stats);

proc means data=have;
where period between 1 and 6;
var x;
run;
mkeintz
PROC Star

@Reeza 's suggestion made me recall a technique I have sometimes used for rolling windows - namely to output a copy of each data record once for each window in which it belongs (and assigning a consistent ending date for each window). 

 

Then run a proc summary or proc means using the window end date as a class var:

 

data have;
  input  date :monyy7. x;
  format date date9.;
datalines;
AUG2015       26
AUG2015       27
SEP2015       16
SEP2015       20
SEP2015        7
OCT2015       22
NOV2015       28
NOV2015       28
NOV2015       35
NOV2015       15
DEC2015       28
DEC2015        8
DEC2015       14
;
data vneed / view=vneed;
  set have nobs=nrecs;
  if _n_=1 then do;
     /*Generate the window end date needed for the 1st 3-month window*/
     set have (keep=date rename=(date=first_date));
     first_date=intnx('month',first_date,2,'end');   

     /*Generate the window end date needed for the last 3-month window */
     set have (keep=date rename=(date=last_date)) point=nrecs;
     last_date=intnx('month',last_date,0,'end');
  end;

  /* Write a record for each window it belongs to */
  do m=0 to 2;  
    window_enddate=intnx('month',date,m,'end');
    if first_date<=window_enddate<=last_date then output;
  end;
  format window_enddate date9.;
run;
proc means data=vneed median ;
  var x;
  class window_enddate;
run;

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1163 views
  • 2 likes
  • 4 in conversation