Hi everyone,
I've got this dataset
data new;
length subjid $8 day $10 value 8;
input subjid day value;
datalines;
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
*week2 will remain missing;
2 week4 8
3 baseline 10
3 week2 3
3 week8 4
;
run;
and i need to imputed the missing data using the last observation carrried forward (LOCF) but the baseline data is not carried forward, i.e if week2 is missing , it will be missing (*).
The final dataset I want:
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week4 8
2 week8 8
2 week12 8
3 baseline 10
3 week2 3
3 week4 3
3 week8 4
3 week12 4
I need some help to modify the below code that Haikuo wrote (baseline carried forward) to considering now this new condition (baseline data is not carried forward), ...
another alternative code can be useful too...thanks.
proc format;
value $seq
'baseline'=1
'week2'=2
'week4'=3
'week8'=4
'week12'=5
;
value seq
1='baseline'
2='week2'
3='week4'
4='week8'
5='week12'
;
run;
data new;
length subjid $8 day $10 value 8;
input subjid day value;
datalines;
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week2 12
2 week4 10
3 baseline 10
3 week2 3
3 week8 4
;
data new1;
set new;
_day=put(day,$seq.);
run;
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
output;
end;
else do _i=_day to 5;
day=put(_i,seq.);
output;
end;
drop _:;
run;
proc print;run;
Thanks.
V
Are you looking for?:
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
/* another line changed */
if not missing(value) then output;
if _day eq 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
output;
end;
drop _:;
run;
I'll look at your code in a couple of minutes, but why not just use data_null_'s suggested code? i.e.,:
data new;
length subjid $8 day $10 value 8;
input subjid day value;
datalines;
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week2 12
2 week4 10
3 baseline 10
3 week2 3
3 week8 4
4 baseline 10
4 week8 4
;;;;
run;
data classdata;
input day $10.;
cards;
baseline
week2
week4
week8
week12
;;;;
run;
proc summary nway data=new classdata=classdata order=data;
by subjid;
class day;
output out=new2(drop=_freq_ _type_) idgroup(out(value)=);
run;
data new3;
update new2(obs=0) new2;
by subjid;
output;
if first(day) eq 'b' then call missing(of _all_);
run;
If week2 in the "expanded" data is missing the delete it. I think that's what the OP is asking.
To use the code you mentioned, I think you only have to add one line. i.e.,
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
output;
/* line to add */
if _day eq 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
output;
end;
drop _:;
run;
Thank you Arthur.
That line is very useful, because in the next step I can remove the record with missing value...not allow in my final dataset.
But, can we put another option rather than your line (call missing (value))..to remove the records?
Your line generate this output:
1 1 baseline 10
2 1 week2 12
3 1 week4 14
4 1 week8 16
5 1 week12 12
6 2 baseline 10
7 2 week2 .
8 2 week4 10
9 2 week8 10
10 2 week12 10
11 3 baseline 10
12 3 week2 3
13 3 week4 3
14 3 week8 4
15 3 week12 4
16 4 baseline 10
17 4 week2 .
18 4 week4 .
19 4 week8 4
20 4 week12 4
but it will be possible remove the record 7, 17 and 18 ...in the Haikuo code?
Many thanks for your help.
V
Are you looking for?:
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
/* another line changed */
if not missing(value) then output;
if _day eq 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
output;
end;
drop _:;
run;
Than you very much Arthur.
This is exactly was I looking for.
V.
Sorry Data_null and Ksharp very interesting your contribution, i will keepo it in my mind, after finish it...but still I want to end it with the Haikuo code....then
Hi Arthur, You did it for the week post baseline, but now I am interesting only in imputed the last visit, the intermediate weeks as well as the post baseline visit, if there are missing,
will remaining missing, only i want to impute the last visit...can you do it in this code?... Thnaks.
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
/* another line changed */
if not missing(value) then output;
if _day eq 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
output;
end;
drop _:;
run;
Initial code, developed to hikuo and moddified for you, missing visit is imputed as the LOCF, but the baseline is not carried forward , i.e week2 if it is missing reamining missing in subject2, and week2, week4 willremaining missing in subject3:
proc format;
value $seq
'baseline'=1
'week2'=2
'week4'=3
'week8'=4
'week12'=5
;
value seq
1='baseline'
2='week2'
3='week4'
4='week8'
5='week12'
;
run;
data new;
length subjid $8 day $10 value 8;
input subjid day value;
datalines;
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week4 12
3 baseline 10
3 week8 4
;
data new1;
set new;
_day=put(day,$seq.);
run;
data new1;
set new;
_day=put(day,$seq.);
run;
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
if not missing(value) then output;
if _day = 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
output;
end;
drop _:;
run;
this is the dataset:
1 1 baseline 10
2 1 week2 12
3 1 week4 14
4 1 week8 16
5 1 week12 12
6 2 baseline 10
7 2 week4 12
8 2 week8 12
9 2 week12 12
10 3 baseline 10
11 3 week8 4
12 3 week12 4
Now, my problem, is that only I am interested to impute the last visit, i.e i need to ridd off week8, because it was a visit missing in the original dataset, in with this condition, because it is not a last visit, it will remain missing...then my finaldataset , modifying the above code , have to be like that:
1 1 baseline 10
2 1 week2 12
3 1 week4 14
4 1 week8 16
5 1 week12 12
6 2 baseline 10
7 2 week4 12
8 2 week12 12
9 3 baseline 10
10 3 week8 4
11 3 week12 4
Thnak you.
I'm SURE this isn't the most efficient way to code this but, keeping your current code, how about:
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
if not missing(value) and (_i eq _day or _i eq __day-1) then output;
if _day = 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
if _i eq _day or _i eq 5 then output;
end;
drop _:;
run;
Thank you brilliant, It works.
In the mean time, I got the easy way, flagging the imputed data, and keeping only the last record
But, yours is much better, thank you again Arthur.
*Flaggin the imput data and keeping the last visit;
proc sort data=new out= newtest;
by subjid day;
run;
proc sort data=want out=wanttest;
by subjid day;
run;
*flagging the imputed data (LOCF);
data locftest;
merge newtest(in=a) wanttest(in=b);
by subjid day;
if b and not a then do;
locf=1;
output;
end;
else do;
locf=0;
output;
end;
run;
data locf;
set locftest;
if upcase(day)='BASELINE' then weekno=0;
else if upcase(day)='WEEK2' then weekno=2;
else if upcase(day)='WEEK4' then weekno=4;
else if upcase(day)='WEEK8' then weekno=8;
else if upcase(day)='WEEK12' then weekno=12;
run;
proc sort data=locf;
by subjid weekno;
run;
data locfok;
set locf;
by subjid weekno;
if locf=1 and not last.subjid then delete;
run;
Hi again!...this problem make me nuts......is the last vistiS, not only the last visit...i Explain:
data new;
length subjid $8 day $10 value 8;
input subjid day value;
datalines;
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week4 12
3 baseline 10
3 week2 4
3 week8 2
;
The new code need to imputed the only the last visits:
i am looking for this want dataset:
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week4 12
2 week8 12
2 week12 12
3 baseline 10
3 week2 4
3 week8 2
3 week12 2
Can you help with this last one request!
Thnaks Arthur.
Both your data and criteria keep changing. Are you sure you know what you are trying to achieve?:
proc format;
value $seq
'baseline'=1
'week2'=2
'week4'=3
'week8'=4
'week12'=5
;
value seq
1='baseline'
2='week2'
3='week4'
4='week8'
5='week12'
;
run;
data new;
length subjid $8 day $10 value 8;
input subjid day value;
datalines;
1 baseline 10
1 week2 12
1 week4 14
1 week8 16
1 week12 12
2 baseline 10
2 week4 12
3 baseline 10
3 week2 4
3 week8 2
;
data new1;
set new;
_day=put(day,$seq.);
run;
data want;
set new1;
by subjid;
set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);
if not last.subjid then do _i=_day to __day-1;
day=put(_i,seq.);
if not missing(value) and (_i eq _day or _i eq 4 or _i eq 5) then output;
*if _day = 1 then call missing(value);
end;
else do _i=_day to 5;
day=put(_i,seq.);
if _i eq _day or _i eq 4 or _i eq 5 then output;
end;
drop _:;
run;
Hi Arthur, your last post did not come out in my email
Yes, this is the last request. Thank you.
I will try later if it works.
Thnak you again for your time.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.