Help using Base SAS procedures

missing date imputed using LOCF (Baseline is not carried forward)

Accepted Solution Solved
Reply
Super Contributor
Posts: 301
Accepted Solution

missing date imputed using LOCF (Baseline is not carried forward)

Hi everyone,

I've got this dataset

data new;

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

                             *week2 will remain missing;

     2   week4    8

     3  baseline  10

     3  week2      3

      3 week8      4

     ;

     run;

and i need to imputed the missing data using the last observation carrried forward (LOCF) but the baseline data is not carried forward, i.e if week2 is missing , it will be missing (*).


The final dataset I want:

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    8

     2   week8     8

     2   week12   8

     3  baseline  10

     3  week2      3

     3 week4     3

      3 week8      4

    3 week12     4


I need some help to modify the below code that  Haikuo wrote (baseline carried forward) to considering now this new condition (baseline data is not carried forward), ...

another alternative code can be useful too...thanks.

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

      run;

  data new;

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week2    12

     2   week4    10

     3  baseline  10

     3  week2      3

      3 week8      4

     ;

      data new1;

        set new;

        _day=put(day,$seq.);

        run;

      data want;

        set new1;

        by subjid;

        set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

           if not last.subjid  then do _i=_day to __day-1;

                day=put(_i,seq.);

                output;

             end;

        else do _i=_day to 5;

                day=put(_i,seq.);

                output;

             end;

             drop _:;

      run;

proc print;run;


Thanks.


V





Accepted Solutions
Solution
‎10-18-2012 06:53 PM
PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

Are you looking for?:

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

I'll look at your code in a couple of minutes, but why not just use data_null_'s suggested code?  i.e.,:

data new;
   length subjid $8 day $10 value 8;
  
input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;;;;
   run;
data classdata;
   input day $10.;
  
cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(value)=);
   run;
data new3;
   update new2(obs=0) new2;
   by subjid;
   output;
  
if first(day) eq 'b' then call missing(of _all_);
   run;

Respected Advisor
Posts: 3,777

Re: missing date imputed using LOCF (Baseline is not carried forward)

If week2 in the "expanded" data is missing the delete it.  I think that's what the OP is asking.

data new;
   length subjid $8 day $10 value 8;
  
input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;;;;
   run;
data classdata;
   input day $10.;
  
cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(_all_)=);
   run;
data new3;
   update new2(obs=0) new2;
   by subjid;
   if day eq 'week2' and missing(value) then delete;
   output;
  
if first(day) eq 'b' then call missing(of _all_);
   run;
PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

To use the code you mentioned, I think you only have to add one line.  i.e.,

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

          output;

/* line to add */

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Thank you Arthur.

That line is very useful, because  in the next step I can remove the record with missing value...not allow in my final dataset.

But, can we put another option rather than your line (call missing (value))..to remove the records?

Your line generate this output:

                                  1      1       baseline      10

                                  2      1       week2         12

                                  3      1       week4         14

                                  4      1       week8         16

                                  5      1       week12        12

                                  6      2       baseline      10

                                  7      2       week2          .

                                  8      2       week4         10

                                  9      2       week8         10

                                 10      2       week12        10

                                 11      3       baseline      10

                                 12      3       week2          3

                                 13      3       week4          3

                                 14      3       week8          4

                                 15      3       week12         4

                                 16      4       baseline      10

                                 17      4       week2          .

                                 18      4       week4          .

                                 19      4       week8          4

                                 20      4       week12         4

but it will be possible remove the record 7, 17 and 18 ...in the Haikuo code?

Many thanks for your help.

V

Solution
‎10-18-2012 06:53 PM
PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

Are you looking for?:

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Than you very much Arthur.

This is exactly was I looking for.

V.

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Sorry Data_null and Ksharp very interesting your contribution, i will keepo it in my mind, after finish it...but still I want to end it with the Haikuo code....then

Hi Arthur, You did it for the week post baseline, but now I am interesting only in imputed the last visit, the intermediate weeks as well as the post baseline visit,  if there are missing,

will remaining missing, only i want to impute the last visit...can you do it in this code?... Thnaks.

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

Not sure what you are asking for.  Given your example data, what result do you want to achieve?

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Initial code, developed to hikuo and moddified for you, missing visit is imputed as the LOCF, but the baseline is not carried forward , i.e week2 if it is missing reamining missing in subject2, and week2, week4 willremaining missing in subject3:

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

      run;

  data new;

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    12

     3  baseline  10

     3 week8      4

     ;




      data new1;

        set new;

        _day=put(day,$seq.);

        run;

data new1;

        set new;

        _day=put(day,$seq.);

        run;


      data want;

        set new1;

        by subjid;

        set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

           if not last.subjid  then do _i=_day to __day-1;

                day=put(_i,seq.);

                 if not missing(value) then output;

          if _day = 1 then call missing(value);

             end;

        else do _i=_day to 5;

                day=put(_i,seq.);

                output;

             end;

             drop _:;

run;

this is the dataset:

                                  1      1       baseline      10

                                  2      1       week2         12

                                  3      1       week4         14

                                  4      1       week8         16

                                  5      1       week12        12

                                  6      2       baseline      10

                                  7      2       week4         12

                                  8      2       week8         12

                                  9      2       week12        12

                                 10      3       baseline      10

                                 11      3       week8          4

                                 12      3       week12         4

Now, my problem, is that only I am interested to impute the last visit, i.e i need to ridd off week8, because it was a visit missing in the original dataset, in with this condition, because it is not a last visit, it will remain missing...then my finaldataset ,  modifying the above code , have to be like that:

                                  1      1       baseline      10 

                                  2      1       week2         12

                                  3      1       week4         14

                                  4      1       week8         16

                                  5      1       week12        12

                                  6      2       baseline      10

                                  7      2       week4         12

                                  8      2       week12        12

                                  9      3       baseline      10

                                 10      3       week8          4

                                 11      3       week12         4

Thnak you.

PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

I'm SURE this isn't the most efficient way to code this but, keeping your current code, how about:

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

  if not last.subjid  then do _i=_day to __day-1;

    day=put(_i,seq.);

    if not missing(value) and (_i eq _day or _i eq __day-1) then output;

    if _day = 1 then call missing(value);

  end;

  else do _i=_day to 5;

    day=put(_i,seq.);

    if _i eq _day or _i eq 5 then output;

  end;

  drop _:;

run;

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Thank you brilliant, It works.

In the mean time, I got the easy way, flagging the imputed data, and keeping only the last record Smiley Happy

But, yours is much better, thank you again Arthur.

*Flaggin the imput data and keeping the last visit;

proc sort data=new out= newtest;
  by subjid day;
run;

proc sort data=want out=wanttest;
by subjid day;
run;

*flagging the imputed data (LOCF);

data locftest;
  merge newtest(in=a) wanttest(in=b);
  by subjid day;
  if b and not a then do;
  locf=1;
  output;
  end;
  else do;
  locf=0;
  output;
  end;
run;

data locf;
  set locftest;
if upcase(day)='BASELINE' then weekno=0;

      else if upcase(day)='WEEK2' then weekno=2;

      else if upcase(day)='WEEK4' then weekno=4;

      else if upcase(day)='WEEK8' then weekno=8;

      else if upcase(day)='WEEK12' then weekno=12;

     

run;

proc sort data=locf;
by subjid weekno;
run;


data locfok;
  set locf;
  by subjid weekno;
  if locf=1 and not last.subjid then delete;

 
run;

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Hi again!...this problem make me nuts......is the last vistiS, not only the last visit...i Explain:

data new; 

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    12

     3  baseline  10

     3  week2      4

     3  week8      2

     ;

The new code need to imputed the only the last visits:

i am looking for this want dataset:

   1   baseline 10 

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    12

     2  week8     12

     2  week12     12

     3  baseline  10

     3  week2      4

     3  week8      2

     3  week12   2

Can you help with this last one request!

Thnaks Arthur.

PROC Star
Posts: 7,363

Re: missing date imputed using LOCF (Baseline is not carried forward)

Both your data and criteria keep changing.  Are you sure you know what you are trying to achieve?:

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

run;

data new;

   length subjid $8 day $10 value 8;

   input subjid day value;

   datalines;

   1   baseline 10

   1   week2   12

   1   week4   14

   1   week8   16

   1   week12  12

   2   baseline 10

   2   week4    12

   3   baseline  10

   3   week2      4

   3   week8      2

   ;

data new1;

  set new;

  _day=put(day,$seq.);

run;

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

  if not last.subjid  then do _i=_day to __day-1;

    day=put(_i,seq.);

    if not missing(value) and (_i  eq _day or _i eq 4 or _i eq 5) then output;

    *if _day = 1 then call missing(value);

  end;

  else do _i=_day to 5;

    day=put(_i,seq.);

    if _i eq _day or _i eq 4 or _i eq 5 then output;

  end;

  drop _:;

run;

Super Contributor
Posts: 301

Re: missing date imputed using LOCF (Baseline is not carried forward)

Hi Arthur, your last post did not come out in my email Smiley Happy

Yes, this is the last request. Thank you.

I will try later if it works.

Thnak you again for your time.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 19 replies
  • 1080 views
  • 7 likes
  • 5 in conversation