BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michtka
Fluorite | Level 6

Hi everyone,

I've got this dataset

data new;

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

                             *week2 will remain missing;

     2   week4    8

     3  baseline  10

     3  week2      3

      3 week8      4

     ;

     run;

and i need to imputed the missing data using the last observation carrried forward (LOCF) but the baseline data is not carried forward, i.e if week2 is missing , it will be missing (*).


The final dataset I want:

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    8

     2   week8     8

     2   week12   8

     3  baseline  10

     3  week2      3

     3 week4     3

      3 week8      4

    3 week12     4


I need some help to modify the below code that  Haikuo wrote (baseline carried forward) to considering now this new condition (baseline data is not carried forward), ...

another alternative code can be useful too...thanks.

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

      run;

  data new;

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week2    12

     2   week4    10

     3  baseline  10

     3  week2      3

      3 week8      4

     ;

      data new1;

        set new;

        _day=put(day,$seq.);

        run;

      data want;

        set new1;

        by subjid;

        set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

           if not last.subjid  then do _i=_day to __day-1;

                day=put(_i,seq.);

                output;

             end;

        else do _i=_day to 5;

                day=put(_i,seq.);

                output;

             end;

             drop _:;

      run;

proc print;run;


Thanks.


V




1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Are you looking for?:

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

View solution in original post

19 REPLIES 19
art297
Opal | Level 21

I'll look at your code in a couple of minutes, but why not just use data_null_'s suggested code?  i.e.,:

data new;
   length subjid $8 day $10 value 8;
  
input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;;;;
   run;
data classdata;
   input day $10.;
  
cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(value)=);
   run;
data new3;
   update new2(obs=0) new2;
   by subjid;
   output;
  
if first(day) eq 'b' then call missing(of _all_);
   run;

data_null__
Jade | Level 19

If week2 in the "expanded" data is missing the delete it.  I think that's what the OP is asking.

data new;
   length subjid $8 day $10 value 8;
  
input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;;;;
   run;
data classdata;
   input day $10.;
  
cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(_all_)=);
   run;
data new3;
   update new2(obs=0) new2;
   by subjid;
   if day eq 'week2' and missing(value) then delete;
   output;
  
if first(day) eq 'b' then call missing(of _all_);
   run;
art297
Opal | Level 21

To use the code you mentioned, I think you only have to add one line.  i.e.,

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

          output;

/* line to add */

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

michtka
Fluorite | Level 6

Thank you Arthur.

That line is very useful, because  in the next step I can remove the record with missing value...not allow in my final dataset.

But, can we put another option rather than your line (call missing (value))..to remove the records?

Your line generate this output:

                                  1      1       baseline      10

                                  2      1       week2         12

                                  3      1       week4         14

                                  4      1       week8         16

                                  5      1       week12        12

                                  6      2       baseline      10

                                  7      2       week2          .

                                  8      2       week4         10

                                  9      2       week8         10

                                 10      2       week12        10

                                 11      3       baseline      10

                                 12      3       week2          3

                                 13      3       week4          3

                                 14      3       week8          4

                                 15      3       week12         4

                                 16      4       baseline      10

                                 17      4       week2          .

                                 18      4       week4          .

                                 19      4       week8          4

                                 20      4       week12         4

but it will be possible remove the record 7, 17 and 18 ...in the Haikuo code?

Many thanks for your help.

V

art297
Opal | Level 21

Are you looking for?:

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

michtka
Fluorite | Level 6

Than you very much Arthur.

This is exactly was I looking for.

V.

michtka
Fluorite | Level 6

Sorry Data_null and Ksharp very interesting your contribution, i will keepo it in my mind, after finish it...but still I want to end it with the Haikuo code....then

Hi Arthur, You did it for the week post baseline, but now I am interesting only in imputed the last visit, the intermediate weeks as well as the post baseline visit,  if there are missing,

will remaining missing, only i want to impute the last visit...can you do it in this code?... Thnaks.

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       drop _:;

run;

art297
Opal | Level 21

Not sure what you are asking for.  Given your example data, what result do you want to achieve?

michtka
Fluorite | Level 6

Initial code, developed to hikuo and moddified for you, missing visit is imputed as the LOCF, but the baseline is not carried forward , i.e week2 if it is missing reamining missing in subject2, and week2, week4 willremaining missing in subject3:

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

      run;

  data new;

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    12

     3  baseline  10

     3 week8      4

     ;




      data new1;

        set new;

        _day=put(day,$seq.);

        run;

data new1;

        set new;

        _day=put(day,$seq.);

        run;


      data want;

        set new1;

        by subjid;

        set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

           if not last.subjid  then do _i=_day to __day-1;

                day=put(_i,seq.);

                 if not missing(value) then output;

          if _day = 1 then call missing(value);

             end;

        else do _i=_day to 5;

                day=put(_i,seq.);

                output;

             end;

             drop _:;

run;

this is the dataset:

                                  1      1       baseline      10

                                  2      1       week2         12

                                  3      1       week4         14

                                  4      1       week8         16

                                  5      1       week12        12

                                  6      2       baseline      10

                                  7      2       week4         12

                                  8      2       week8         12

                                  9      2       week12        12

                                 10      3       baseline      10

                                 11      3       week8          4

                                 12      3       week12         4

Now, my problem, is that only I am interested to impute the last visit, i.e i need to ridd off week8, because it was a visit missing in the original dataset, in with this condition, because it is not a last visit, it will remain missing...then my finaldataset ,  modifying the above code , have to be like that:

                                  1      1       baseline      10 

                                  2      1       week2         12

                                  3      1       week4         14

                                  4      1       week8         16

                                  5      1       week12        12

                                  6      2       baseline      10

                                  7      2       week4         12

                                  8      2       week12        12

                                  9      3       baseline      10

                                 10      3       week8          4

                                 11      3       week12         4

Thnak you.

art297
Opal | Level 21

I'm SURE this isn't the most efficient way to code this but, keeping your current code, how about:

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

  if not last.subjid  then do _i=_day to __day-1;

    day=put(_i,seq.);

    if not missing(value) and (_i eq _day or _i eq __day-1) then output;

    if _day = 1 then call missing(value);

  end;

  else do _i=_day to 5;

    day=put(_i,seq.);

    if _i eq _day or _i eq 5 then output;

  end;

  drop _:;

run;

michtka
Fluorite | Level 6

Thank you brilliant, It works.

In the mean time, I got the easy way, flagging the imputed data, and keeping only the last record Smiley Happy

But, yours is much better, thank you again Arthur.

*Flaggin the imput data and keeping the last visit;

proc sort data=new out= newtest;
  by subjid day;
run;

proc sort data=want out=wanttest;
by subjid day;
run;

*flagging the imputed data (LOCF);

data locftest;
  merge newtest(in=a) wanttest(in=b);
  by subjid day;
  if b and not a then do;
  locf=1;
  output;
  end;
  else do;
  locf=0;
  output;
  end;
run;

data locf;
  set locftest;
if upcase(day)='BASELINE' then weekno=0;

      else if upcase(day)='WEEK2' then weekno=2;

      else if upcase(day)='WEEK4' then weekno=4;

      else if upcase(day)='WEEK8' then weekno=8;

      else if upcase(day)='WEEK12' then weekno=12;

     

run;

proc sort data=locf;
by subjid weekno;
run;


data locfok;
  set locf;
  by subjid weekno;
  if locf=1 and not last.subjid then delete;

 
run;

michtka
Fluorite | Level 6

Hi again!...this problem make me nuts......is the last vistiS, not only the last visit...i Explain:

data new; 

     length subjid $8 day $10 value 8;

     input subjid day value;

     datalines;

     1   baseline 10

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    12

     3  baseline  10

     3  week2      4

     3  week8      2

     ;

The new code need to imputed the only the last visits:

i am looking for this want dataset:

   1   baseline 10 

     1   week2   12

     1   week4   14

     1   week8   16

     1   week12  12

     2   baseline 10

     2   week4    12

     2  week8     12

     2  week12     12

     3  baseline  10

     3  week2      4

     3  week8      2

     3  week12   2

Can you help with this last one request!

Thnaks Arthur.

art297
Opal | Level 21

Both your data and criteria keep changing.  Are you sure you know what you are trying to achieve?:

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

run;

data new;

   length subjid $8 day $10 value 8;

   input subjid day value;

   datalines;

   1   baseline 10

   1   week2   12

   1   week4   14

   1   week8   16

   1   week12  12

   2   baseline 10

   2   week4    12

   3   baseline  10

   3   week2      4

   3   week8      2

   ;

data new1;

  set new;

  _day=put(day,$seq.);

run;

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

  if not last.subjid  then do _i=_day to __day-1;

    day=put(_i,seq.);

    if not missing(value) and (_i  eq _day or _i eq 4 or _i eq 5) then output;

    *if _day = 1 then call missing(value);

  end;

  else do _i=_day to 5;

    day=put(_i,seq.);

    if _i eq _day or _i eq 4 or _i eq 5 then output;

  end;

  drop _:;

run;

michtka
Fluorite | Level 6

Hi Arthur, your last post did not come out in my email Smiley Happy

Yes, this is the last request. Thank you.

I will try later if it works.

Thnak you again for your time.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 2614 views
  • 7 likes
  • 5 in conversation