create missing ids and to retain the previous value

I am trying to manipulate the output dataset from a survival analysis (produced with Proc LifeTest). The layout of my output is shown below. It has 13 age groups, survival months (0 to n), and probability values. The survival months column has two problems: (1) some months are missing, e.g. the age <13 group has months 0-5, then 7-9, and so on; (2) n is not the same for every age group.

AGE SURMOS SURVIVAL
G1 0 0.1
G1 1 0.2
G1 3 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 5 0.12
G2 6 0.33
G2 10 0.88

I am trying to create a dataset that (1) has every month from 0 to n (the maximum for that particular age group) with no gaps, for all age groups, and (2) uses the previous row's probability value for each month that was missing. Below is the desired output.

AGE SURMOS SURVIVAL
G1 0 0.1
G1 1 0.2
G1 2 0.2
G1 3 0.6
G1 4 0.6
G1 5 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 2 0.6
G2 3 0.6
G2 4 0.6
G2 5 0.12
G2 6 0.33
G2 7 0.33
G2 8 0.33
G2 9 0.33
G2 10 0.88

Re: create missing ids and to retain the previous value

You could try something like this.

[pre]
data test;
input age $ surmos survival;
datalines;
G1 0 0.1
G1 1 0.2
G1 3 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 5 0.12
G2 6 0.33
G2 10 0.88
;
run;

proc sort data=test;
by age surmos;
run;

data test2 (keep = age surmos survival);
  set test;
  by age;
  /* carry the previous row's month and survival across iterations */
  retain last_surmos last_survival;
  if first.age then do;
    output;
    last_surmos = surmos;
    last_survival = survival;
  end;
  else do;
    /* no gap: write the row as is */
    if ((surmos - last_surmos) LE 1) then do;
      output;
      last_surmos = surmos;
      last_survival = survival;
    end;
    else do;
      current_surmos = surmos;
      current_survival = survival;
      /* fill each missing month with the previous survival value */
      do i = (last_surmos + 1) to (surmos - 1);
        last_surmos = last_surmos + 1;
        surmos = last_surmos;
        survival = last_survival;
        output;
      end;
      /* then write the current row itself */
      surmos = current_surmos;
      survival = current_survival;
      output;
      last_surmos = surmos;
      last_survival = survival;
    end;
  end;
run;

[/pre]
Message was edited by: Daryl

Re: create missing ids and to retain the previous value

Thanks Daryl. Will try it.

Re: create missing ids and to retain the previous value

Hi

Same result, just a slightly different coding approach in the data step:

data have;
input age $ surmos survival;
datalines;
G1 0 0.1
G1 1 0.2
G1 3 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 5 0.12
G2 6 0.33
G2 10 0.88
;
run;

proc sort data=have;
by age surmos;
run;

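data want(drop=HaveSurmos LagSurvival HaveSurvival);
  set have(rename=(surmos=HaveSurmos survival=HaveSurvival));
  by age;

  /* survival value from the previous input row */
  LagSurvival = lag(HaveSurvival);

  /* restart the month counter at the first month of each age group */
  if first.age then Surmos = HaveSurmos;

  /* write one row per month up to the current input month;
     months missing from the input get the previous survival value */
  do while(Surmos le HaveSurmos);
    if Surmos = HaveSurmos then survival = HaveSurvival;
    else survival = LagSurvival;
    output;
    Surmos + 1;
  end;

run;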

proc print data=want;
run;

HTH
Patrick


Re: create missing ids and to retain the previous value

Yes, I get the same result, but Patrick's code is cleverer in how it renames the original variables.
[pre]
data test;
input age $ surmos survival;
datalines;
G1 0 0.1
G1 1 0.2
G1 3 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 5 0.12
G2 6 0.33
G2 10 0.88
;
run;
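data temp;
  set test;
  /* _surmos and _survival hold the previous row's values */
  retain _surmos _survival;
  /* restart the month counter when the age group changes */
  if age ne lag(age) then _surmos = surmos;
  _surmos+1;
  if _surmos lt surmos then do;
    /* write one row per missing month, reusing the previous survival value */
    do while(_surmos lt surmos);
      output;
      _surmos+1;
    end;
  end;
  /* then write the current row under the new variable names */
  _survival=survival; _surmos=surmos;
  drop survival surmos;
  output;
run;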
proc print noobs;
run;
[/pre]



Ksharp

Re: create missing ids and to retain the previous value

Thanks to all who responded. Below is another alternative solution.

data test;
input age $ surmos survival;
datalines;
G1 0 0.1
G1 1 0.2
G1 3 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 5 0.12
G2 6 0.33
G2 10 0.88
;
run;

proc sort data= TEST; by AGE ; run;

/*gives max. survival months for each group variables*/
proc means data=outs noprint max ;
by AGE ;
var surmos ;
where _censor_=0;
output out= maxmos(drop=_type_ _freq_) max = maxmos ;
run ;

/*adds missing months for each level in the group variable*/
data temp1;
  set maxmos;
  if maxmos=0.5 then do;
    surmos=0; output;
    surmos=maxmos; output;
  end;
  else do;
    surmos=0; output;
    surmos=0.5; output;
    do surmos= 1 to maxmos; output;
    end;
  end;
  drop maxmos;
run;

proc sort data=temp1;
by AGE surmos;
run;

data temp2;
  merge temp1 outs;
  by AGE surmos;

  retain s1;
  if surmos=0 then do;
    s1=1; slcl=1; sucl=1;
  end;
  else if survival ne . then do;
    s1=survival;
  end;
run;
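
The steps above read a dataset named outs, which is not created in this post. As a rough sketch of the same idea (find the maximum month per group, build every month from 0 to that maximum, merge the observed rows back on, and carry the last value forward), here is a version that runs against the sample test dataset instead. The names skeleton, filled and last_survival are made up for illustration, and it assumes whole-number months that start at 0 in every group, so the 0.5 row and the _censor_ filter are left out.

```sas
/* Sketch only: assumes the sample dataset TEST shown above,
   whole-number months, and month 0 present in every age group. */
proc sort data=test;
  by age surmos;
run;

/* maximum observed month per age group */
proc means data=test noprint;
  by age;
  var surmos;
  output out=maxmos(drop=_type_ _freq_) max=maxmos;
run;

/* skeleton: one row per month from 0 to the group maximum */
data skeleton;
  set maxmos;
  do surmos = 0 to maxmos;
    output;
  end;
  drop maxmos;
run;

/* merge the observed rows onto the skeleton and carry the
   last non-missing survival value forward within each group */
data filled;
  merge skeleton test;
  by age surmos;
  retain last_survival;
  if first.age then last_survival = .;
  if survival ne . then last_survival = survival;
  else survival = last_survival;
  drop last_survival;
run;

proc print data=filled;
run;
```

With the sample data, filled should have 7 rows for G1 (months 0 to 6) and 11 rows for G2 (months 0 to 10), matching the desired output in the first post.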

Re: create missing ids and to retain the previous value

I think all you need is to look ahead to the next SURMOS.

[pre]
data test;
input age $ surmos survival;
datalines;
G1 0 0.1
G1 1 0.2
G1 3 0.6
G1 6 0.12
G2 0 0.01
G2 1 0.6
G2 5 0.12
G2 6 0.33
G2 10 0.88
;;;;
run;
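data new;
  set test end=eof;
  by age;
  /* look ahead: read the next row's SURMOS as NXS */
  if not eof then set test(firstobs=2 keep=surmos rename=(surmos=nxs));
  drop nxs;
  if not last.age then do;
    /* repeat the current row for every month up to the next observed one */
    do surmos=surmos to nxs-1;
      output;
    end;
  end;
  else output;
run;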
proc print;
run;
[/pre]
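
To confirm that a filled dataset really has no gaps, a quick check along these lines may help. It is only a sketch: it assumes the dataset new from the step above is still in age and month order, and prev_surmos and bad are illustrative names. It writes a line to the log for any age group that does not start at month 0 and for any row whose month is not exactly one more than the previous row's.

```sas
/* Sketch only: NEW is the dataset built above; PREV_SURMOS and BAD
   are illustrative names. */
data _null_;
  set new end=eof;
  by age;
  prev_surmos = lag(surmos);   /* month on the previous row */
  if first.age then do;
    /* each age group is expected to start at month 0 */
    if surmos ne 0 then do;
      bad + 1;
      put "Group does not start at 0: " age= surmos=;
    end;
  end;
  else if surmos ne prev_surmos + 1 then do;
    /* within a group, months should increase by exactly 1 */
    bad + 1;
    put "Gap found: " age= prev_surmos= surmos=;
  end;
  if eof then put "Rows failing the check: " bad;
run;
```

The same step can be pointed at test2, want, or any other result in this thread by changing the dataset name on the set statement (and the month variable name where it differs).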