Hi,
I have a dataset of step functions generated by a survival analysis procedure. There are five groups in the dataset. I would like to add a column that contains the average of each pair of consecutive steps within each group, in order to smooth the steps. In the attached dataset, the groups are defined by SUBSTRUCTURE_COND_060 and the step-function values are in the SURVIVAL column. The first row of the smoothed column should equal the first value of SURVIVAL, the second row should be the average of the first and second rows of SURVIVAL, the third row should be the average of the second and third rows of SURVIVAL, and so on until the end. I would like to do this for each group. I would be glad if someone could write this code for me.
Thanks,
This is called a running mean, which is one kind of moving average. You can compute it in the SAS DATA step by using the FIRST.var technique in a BY-group analysis of the data.
The computation you request will look something like this:
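data Want;
   set km;
   by substructure_cond_060;      /* km must be sorted by the group variable */
   if first.substructure_cond_060 then do;
      /* restart the running sum and count at the top of each group */
      cumSum = Survival; groupCount = 1;
   end;
   else do;
      /* sum statements: values are retained across observations */
      cumSum + Survival; groupCount + 1;
   end;
   cumMean = cumSum / groupCount; /* mean of all steps so far in the group */
run;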
However, since the survival curves are monotonically decreasing functions, this will almost surely not "smooth the steps" as you intend. The running average can never fall below the survival curve, and it lies strictly above the curve once the curve has dropped. You might want to consider using a centered moving average (maybe using five points) instead of a backward moving average. My article on moving averages contains references to PROC EXPAND and to SAS papers that show how to compute a centered moving average.
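For illustration, here is a minimal sketch of a five-point centered moving average with PROC EXPAND. It assumes that SAS/ETS is licensed (PROC EXPAND is part of that product) and that km is already sorted by substructure_cond_060. The output data set name Want_cma and the variable name cma5 are made up for this example.

```sas
/* Sketch only: five-point centered moving average within each group.  */
/* Want_cma and cma5 are hypothetical names chosen for this example.   */
proc expand data=km out=Want_cma method=none;   /* method=none: no interpolation */
   by substructure_cond_060;                    /* one series per group          */
   convert Survival = cma5 / transformout=(cmovave 5);
run;
```

Because there is no ID statement, the observations in each group are treated as equally spaced, which matches how the steps are indexed in this thread.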
@Rick_SAS, thank you for your wonderful reply. Here is what I want, written roughly in SAS code form.
data Want;
set km;
by substructure_cond_060;
if first.substructure_cond_060 then do;
cumSum = Survival;
end;
for first step:
(Survival 1 + Survival 2)/2; output;
for second step:
(Survival 2 + Survival 3)/2; output;
for third step:
(Survival 3 + Survival 4)/2; output;
until the end.
for last step:
(Survival (n-1) + Survival (n))/2; output;
I see that you added Survival to cumSum, which I don't want to do. I just want to take the average of each pair of consecutive steps until the end for each group and output it to another column.
Your input would be appreciated.
Thanks,
Thank you for clarifying. Here is the calculation you want. The smoother will still be systematically above the data, but it looks much better:
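data Want;
   set km;
   by substructure_cond_060;
   LagS = lag(Survival);          /* call LAG on every row so its queue stays in sync */
   if first.substructure_cond_060 then do;
      /* first row of a group: nothing to average with */
      avg = Survival; cnt = 1;
   end;
   else do;
      /* average of the previous and current step; cnt indexes the step */
      avg = mean(LagS, Survival); cnt + 1;
   end;
run;
/* overlay the original steps (markers) and the averaged values (lines) */
proc sgplot data=Want;
   scatter x=cnt y=survival / group=substructure_cond_060;
   series x=cnt y=avg / group=substructure_cond_060;
run;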
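To check the pairwise average on a small case, here is a sketch that uses made-up data. The five rows, the group labels A and B, and the choice of a character group variable are assumptions for illustration only. They are not taken from the attached dataset.

```sas
/* Hypothetical toy data: two groups, already sorted by group */
data km;
   input substructure_cond_060 $ Survival;
   datalines;
A 1.00
A 0.90
A 0.70
B 1.00
B 0.80
;

data Want;
   set km;
   by substructure_cond_060;
   LagS = lag(Survival);                      /* previous row's value */
   if first.substructure_cond_060 then avg = Survival;
   else avg = mean(LagS, Survival);           /* average of previous and current */
run;

proc print data=Want noobs;
   var substructure_cond_060 Survival avg;
run;
```

For group A the avg column should be 1.00, 0.95, 0.80, and for group B it should be 1.00, 0.90. Every average is at or above the current Survival value, which is why the smoother still sits above the data.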