mmhxc5
Quartz | Level 8

Hi,

I have a dataset of step functions generated by a survival analysis procedure. The dataset contains five groups. I would like to add a column that averages consecutive steps within each group in order to smooth the steps. In the attached dataset, the groups are defined by SUBSTRUCTURE_COND_060 and the step-function values are in the SURVIVAL column. The first row of the smoothed column should equal the first value of the SURVIVAL column, the second row should be the average of the first and second rows of SURVIVAL, the third row should be the average of the second and third rows of SURVIVAL, and so on until the end. I would like to do this for each group. I would be glad if someone could write this code for me.

 

Thanks,

1 ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

Thank you for clarifying. Here is the calculation you want. The smoother will still be systematically above the data, but it looks much better:

 

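data Want;
set km;                                  /* assumes km is sorted by the BY variable */
by substructure_cond_060;
LagS = lag(Survival);                    /* call LAG on every row so its queue stays in sync */
if first.substructure_cond_060 then do;
   avg = Survival;  cnt = 1;             /* first row of a group: nothing earlier to average */
   end;
else do;
   avg = mean(LagS, Survival);  cnt + 1; /* average of the previous and current values */
   end;
run;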

proc sgplot data=Want;
scatter x=cnt y=survival / group=substructure_cond_060;
series x=cnt y=avg / group=substructure_cond_060;
run;
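
To see why the two-point average still sits on or above the data, the calculation can be written out in symbols. This is only a sketch: the notation S_i for the i-th Survival value within a group is introduced here for illustration and does not appear in the thread.

```latex
\[
\mathrm{avg}_1 = S_1, \qquad
\mathrm{avg}_i = \frac{S_{i-1} + S_i}{2} \quad (i \ge 2),
\]
\[
\mathrm{avg}_i - S_i = \frac{S_{i-1} - S_i}{2} \;\ge\; 0
\quad \text{whenever } S_{i-1} \ge S_i .
\]
```

For a nonincreasing survival curve the difference is zero on flat stretches and equals half the drop at each step, so the averaged series touches the data between steps and lies above it at the steps.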


3 REPLIES 3
Rick_SAS
SAS Super FREQ

This is called a running mean, which is one kind of moving average. You can compute it in the SAS DATA step by using the FIRST.var technique in a BY-group analysis of the data.

 

The computation you request will look something like this:

 

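data Want;
set km;                                  /* assumes km is sorted by the BY variable */
by substructure_cond_060;
if first.substructure_cond_060 then do;
   cumSum = Survival; groupCount = 1;    /* restart the totals at the start of each group */
   end;
else do;
   cumSum + Survival; groupCount + 1;    /* sum statements retain values across rows */
   end;
cumMean = cumSum / groupCount;           /* running mean within the group */
run;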

However, because the survival curves are monotonically decreasing functions, this will almost surely not "smooth the steps" as you intend. Every earlier value in a group is at least as large as the current one, so the running average will never be below the survival curve. You might want to consider using a centered moving average (maybe using five points) instead of a backward moving average. My article on moving averages contains references to PROC EXPAND and to SAS papers that show how to compute a centered moving average.
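
As a rough illustration of the centered moving average mentioned above, a five-point version might be sketched with PROC EXPAND as follows. This is an assumption-laden sketch, not code from the thread: it requires a SAS/ETS license, it assumes km is already sorted by substructure_cond_060 with equally spaced rows in each group, and the names WantCMA and cma5 are made up for the example.

```sas
/* Sketch only: PROC EXPAND is part of SAS/ETS.                      */
/* WantCMA and cma5 are hypothetical names chosen for this example.  */
proc expand data=km out=WantCMA method=none;
   by substructure_cond_060;          /* smooth each group separately */
   /* METHOD=NONE applies the transformation to the observed values  */
   /* without fitting an interpolating curve first.                  */
   convert Survival = cma5 / transformout=(cmovave 5);
run;
```

A centered window uses values on both sides of each point, so the smoothed curve is not forced to lie above a decreasing series the way a backward average is. How the window is handled near the first and last rows of each group should be checked against the PROC EXPAND documentation.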

mmhxc5
Quartz | Level 8

@Rick_SAS, Thank you for your wonderful reply. Here is what I want in a SAS code form.

 

data Want;
set km;
by substructure_cond_060;
if first.substructure_cond_060 then do;
   cumSum = Survival;
   end;
for first step;
   (Survival 1 + Survival 2)/2; output;
for second step;
   (Survival 2 + Survival 3)/2; output;
for third step;
   (Survival 3 + Survival 4)/2; output;
until the end.
for last step;
   (Survival (n-1) + Survival (n))/2; output;

I see that you added Survival to cumSum, which I don't want to do. I just want to take the average of each pair of consecutive steps until the end, for each group, and output it to another column.

Your input would be appreciated.

Thanks,
