Hello all,
I am facing a tricky operation. To simplify, let's say I have a dataset with 3 columns: Scenario (getting values 1-4), time, and survival. Where time and survival are a result of the lifetest procedure (survival analysis). For each scenario, I ran the lifetest procedure and sent the results to output.
In addition, I have a specified time point, for example, 730 days (two years, my time is in days). I want to do the following thing. I want to cut the data and to keep only the time points up to 730 days. But not so simply. If I have the exact number (730) in the data, I wish to simply take only observations smaller or equal to it. If I don't (this is the tricky part), I wish the take the closest observations below, and the closest obsetvation above (I mean the survival value), to average them, and to add a new observation, with time=730 and survival = the averaged value.
I have no idea how to do it, can you please assist me ? Many thanks in advance !
* I declare a macro variable to make it easier if the cutoff time changes;
%let cutoff=730;
* sort, just to be sure;
proc sort data=have;
by time;
run;
data want;
set have;
old_sur = lag(survival); * preserve the value of the preceding observation;
output;
* since another output statement follows, this one is necessary to
replace the implicit output that normally occurs at the end of a data step
iteration;
if time >= &cutoff
then do;
if time > &cutoff
then do; * we have to build the additional observation;
time = &cutoff;
survival = (survival + old_sur) / 2;
output;
end;
stop; * end the data step once the cutoff value has been reached;
* we don't want to read the rest of the input dataset for nothing;
end;
drop old_sur;
run;
proc print;
run;
Now, we want to get rid of those two observations that were used for the average:
* I declare a macro variable to make it easier if the cutoff time changes;
%let cutoff=730;
* sort, just to be sure;
proc sort data=have;
by time;
run;
%let time1=x;
%let time2=x;
data want;
set have;
old_time = lag(time);
old_sur = lag(survival); * preserve the value of the preceding observation;
output;
* since another output statement follows, this one is necessary to
replace the implicit output that normally occurs at the end of a data step
iteration;
if time >= &cutoff
then do;
if time > &cutoff
then do; * we have to build the additional observation;
* keep the time values of the obs to be deleted;
call symput('time1',put(old_time,best.));
call symput('time2',put(time,best.));
time = &cutoff;
survival = (survival + old_sur) / 2;
output;
end;
stop; * end the data step once the cutoff value has been reached;
* we don't want to read the rest of the input dataset for nothing;
end;
drop old_sur;
run;
* just to check;
%put time1=&time1;
%put time2=&time2;
%macro cleanup;
%if "&time1" ne "x" %then %do; /* conditionally execute code */
data want;
set want;
if time not in (&time1,&time2);
run;
%end;
%mend;
%cleanup;
proc print;
run;
New output:
Obs Time Survival old_time 1 0 1.000 . 2 150 0.970 0 3 150 0.880 150 4 200 0.930 150 5 310 0.790 200 6 393 0.750 310 7 420 0.680 393 8 568 0.610 420 9 730 0.555 639
How about this:
%let cutoff=730;
proc sort data=have;
by time;
run;
data want;
set have;
old_sur = lag(survival);
output;
if survival >= &cutoff
then do;
time = &cutoff;
if survival > &cutoff
then survival = (survival + old_sur) / 2;
output;
stop;
end;
drop old_sur;
run;
Hello. Thank you for the effort. This code does not do what I need. I will give a numerical example, maybe it will be easier to understand what I need exactly.
data temp;
input Time Survival;
datalines;
0 1
150 0.97
200 0.93
150 0.88
310 0.79
393 0.75
420 0.68
568 0.61
639 0.57
792 0.54
880 0.52
930 0.49
1050 0.47
1200 0.35
;
run;
My cutoff is 730 days. If I had 730 days in this data, I want to delete all observations with time larger than 730 (so 730 remains the largest one). If I don't have 730 days (like in this example), I want to take the survival time of the two closest observations (in this case 639 and 792 days - note: it has to be one of each side of 730) and average them: (0.57+0.54)/2, and then I wish to add an observation: 730, 0.555.
Ok, it's clearer now:
data have;
input Time Survival;
datalines;
0 1
150 0.97
200 0.93
150 0.88
310 0.79
393 0.75
420 0.68
568 0.61
639 0.57
792 0.54
880 0.52
930 0.49
1050 0.47
1200 0.35
;
run;
%let cutoff=730;
proc sort data=have;
by time;
run;
data want;
set have;
old_sur = lag(survival);
output;
if time >= &cutoff
then do;
if time > &cutoff
then do;
time = &cutoff;
survival = (survival + old_sur) / 2;
output;
end;
stop;
end;
drop old_sur;
run;
proc print;
run;
Result:
Obs Time Survival 1 0 1.000 2 150 0.970 3 150 0.880 4 200 0.930 5 310 0.790 6 393 0.750 7 420 0.680 8 568 0.610 9 639 0.570 10 792 0.540 11 730 0.555
* I declare a macro variable to make it easier if the cutoff time changes;
%let cutoff=730;
* sort, just to be sure;
proc sort data=have;
by time;
run;
data want;
set have;
old_sur = lag(survival); * preserve the value of the preceding observation;
output;
* since another output statement follows, this one is necessary to
replace the implicit output that normally occurs at the end of a data step
iteration;
if time >= &cutoff
then do;
if time > &cutoff
then do; * we have to build the additional observation;
time = &cutoff;
survival = (survival + old_sur) / 2;
output;
end;
stop; * end the data step once the cutoff value has been reached;
* we don't want to read the rest of the input dataset for nothing;
end;
drop old_sur;
run;
proc print;
run;
Now, we want to get rid of those two observations that were used for the average:
* I declare a macro variable to make it easier if the cutoff time changes;
%let cutoff=730;
* sort, just to be sure;
proc sort data=have;
by time;
run;
%let time1=x;
%let time2=x;
data want;
set have;
old_time = lag(time);
old_sur = lag(survival); * preserve the value of the preceding observation;
output;
* since another output statement follows, this one is necessary to
replace the implicit output that normally occurs at the end of a data step
iteration;
if time >= &cutoff
then do;
if time > &cutoff
then do; * we have to build the additional observation;
* keep the time values of the obs to be deleted;
call symput('time1',put(old_time,best.));
call symput('time2',put(time,best.));
time = &cutoff;
survival = (survival + old_sur) / 2;
output;
end;
stop; * end the data step once the cutoff value has been reached;
* we don't want to read the rest of the input dataset for nothing;
end;
drop old_sur;
run;
* just to check;
%put time1=&time1;
%put time2=&time2;
%macro cleanup;
%if "&time1" ne "x" %then %do; /* conditionally execute code */
data want;
set want;
if time not in (&time1,&time2);
run;
%end;
%mend;
%cleanup;
proc print;
run;
New output:
Obs Time Survival old_time 1 0 1.000 . 2 150 0.970 0 3 150 0.880 150 4 200 0.930 150 5 310 0.790 200 6 393 0.750 310 7 420 0.680 393 8 568 0.610 420 9 730 0.555 639
Yes. With
if time > &cutoff
I prevented the addition of an observation if time equals cutoff, and I also did not set the two macro variables, so they keep their initial value of x.
This then prevents the execution of the data step in the macro.
Just try it with test data, just to be sure.
The first thing that came to my mind was:
- add a WHERE condition so only a certain index is processed at a time, preferably when sorting (and do that into an intermediate dataset).
- encapsulate the whole shebang in a macro defintion that takes one index value as a parameter (use that macro variable in the WHERE)
- call the macro with the indexes wanted
- if number of indexes is sufficiently large, call the macro with call execute from a data step for further automation.
- add a PROC APPEND step within the macro if you want one consolidated output dataset.
Hello Kurt,
I was thinking about your solution, and I don't think I know how to proceed with it. I expect to have around 40 different scenarios, so this macro will need to be called many times.
Wrap the code into a macro, use the variable parts as parameters in the %macro definition. In the code, replace variable parts with the parameters.
Test manually to verify that the macro works.
Write a data step that reads data internally from CARDS; to store the parameters for scenarios.
Run the macro automatically from the data:
data _null_;
set scenarios;
length command $200;
command = '%macroname(' !! trim(param1) !! ',' !! trim(param2) !! ',' !! trim(param3) !! ');';
call execute(command);
run;
Note that it is necessary to quote the macro call with single quotes, so that the macro is not resolved immediately, but after the data step has finished.
I have done something slightly different.
I wrote a macro, with %do loops, something like:
%do i = 1 %to 5;
%do j = 1 to 3;
data have_treatment;
....
where group = treatment;
run;
data have_control;
....
where group = control;
run;
data want_treatment;
........
%end;
%end;
I am not sure if this will even work, but I don't know what to do with the %let time1 = x; Should I define it again for control ? I think that time1 must have some values in now that should be erased.
Your suggested code is nice, but my SAS skill is not good enough to figure it out, unfortunatelly.
Just to make it clear, I have several scenarios by two variables (i and j) and I have two treatments. Eventaually, want_trt and want_cnt need to be appended somehow
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.