Help using Base SAS procedures

missing values imputed with 0 values rather than LOCF

Reply
Super Contributor
Posts: 301

missing values imputed with 0 values rather than LOCF

Hi everyone, the code below has been developped by Haikuo, it imputed the day missing with the LOCF value, except the baseline data, it wont be carried forward.

But, now I am interested to give the value 0 to these imputed days...please see the want dataset I am looking for:

proc format;

         value $seq

            'baseline'=1

              'week2'=2

              'week4'=3

              'week8'=4

              'week12'=5

              ;

              value seq

              1='baseline'

              2='week2'

              3='week4'

              4='week8'

              5='week12'

              ;

      run;

data new;
   length subjid $8 day $10 value 8;
   input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;;;;
   run;

data new1;

        set new;

        _day=put(day,$seq.);

        run;

data want;

  set new1;

  by subjid;

  set new1(firstobs=2 keep=_day rename=_day=__day) new1(obs=1 drop=_all_);

     if not last.subjid  then do _i=_day to __day-1;

          day=put(_i,seq.);

/* another line changed */

          if not missing(value) then output;

          if _day eq 1 then call missing(value);
  

       end;

  else do _i=_day to 5;

          day=put(_i,seq.);

          output;

       end;

       /*drop _:;*/

run;

  Result from the Hikuo code, the blodface records are imputed with the LOCF:

Obs subjid day value

1 1 baseline 10

2 1 week2 12

3 1 week4 14

4 1 week8 16

5 1 week12 12

6 2 baseline 10

7 2 week2 12

8 2 week4 10

9 2 week8 10

10 2 week12 10

11 3 baseline 10

12 3 week2 3

13 3 week4 3

14 3 week8 4

15 3 week12 4

16 4 baseline 10

17 4 week8 4

18 4 week12 4

Dataset Want I am looking for, (I want to generate this dataset using the above code)...I think I need to make little changes to get the 0 vales rather the LOCF values.

1  baseline 10

1  week2    12

1  week4    14

1  week8    16

1  week12   12

2  baseline 10

2  week2    12

2  week4    10

2 week8      0

2 week12    0

3  baseline 10

3  week2     3

3 week4      0

3  week8     4

3 week12    0

4  baseline 10

4  week8     4

4 week12    0

Thanks, V.

Respected Advisor
Posts: 3,799

Re: missing values imputed with 0 values rather than LOCF

An easy way to get numeric missing changed to 0 or any other number is to use PROC STDIZE.  Of course you also need a way to EXPAND the data if the observations are also missing.

data new;
   length subjid $8 day 8 value 8;
  
input subjid day value @@;
   datalines;
1  0 10  1  2 12  1  4 14  1  8 16  1  12 12
2  0 10  2  2 12  2  4 10 
3  0 10  3  2  3  3  8  4 
4  0 10  4  8  4
;;;;
   run;
data classdata;
   do day=4,8,12;
    
output;
    
end;
  
run;
proc summary nway data=new classdata=classdata order=internal;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(value)=);
   run;
proc stdize method=sum missing=0 reponly;
  
var value;
   run;
Super Contributor
Posts: 644

Re: missing values imputed with 0 values rather than LOCF

As you no longer require duplication of values to replace missing values the process can be simplified.

Key features:
     create an invalue seq for an input function rather than using put with automatic type conversion
     Create a restricted Cartesian product of subjid and day (normally a bad idea!) to form a template of all  possible subjid and day combinations
     Use a left join to update the template with value from new
     Use the sum function, with zero, to convert missing values to zero

NOTE:  This code generates values for subjid = 4 and day = week2 and week4, which are not in your want table.  If you use this code you may need a datastep to delete these values based on whether the last undeleted row was a baseline, if I understand your data correctly.  I can add that if you require.

proc format;
         invalue seq
            'baseline'=1
              'week2'=2
              'week4'=3
              'week8'=4
              'week12'=5
              ;
quit ;

data new;
   length subjid $8 day $10 value 8;
   input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;

Proc SQL ;

Create Table template As
  Select distinct
    a.subjid
   , b.day
   ,  input (b.day, seq.) as seq
  From new a
   , new b
  Order
   By a.subjid
   , seq
     ;
    
  Create Table  want  (drop = seq) As
   Select
    tmp.*
    , Sum (new.value, 0) As value
   From template tmp
   Left Join
     new   new
    On new.subjid = tmp.subjid
    And new.day  = tmp.day
   Order
    By tmp.subjid
    , tmp.seq
     ;
    
   
  Select *
   from want ;
  
Quit ;


subjid    day         value
------      ---           -----
1          baseline  10
1          week2     12
1          week4     14
1          week8     16
1          week12   12
2          baseline  10
2          week2     12
2          week4     10
2         week8     0
2          week12   0
3          baseline  10
3          week2     3
3          week4     0
3          week8     4
3          week12   0
4          baseline  10
4          week2     0
4         week4     0
4          week8     4
4          week12   0

Regards

Richard in Oz

Super Contributor
Posts: 301

Re: missing values imputed with 0 values rather than LOCF

Posted in reply to RichardinOz


Thank RichardinOz for your great code, will be great, because the baseline data will not be carried forward, it means week2 and week8 , records after baseline if there are missing, they will reminin missing.

Super Contributor
Posts: 301

Re: missing values imputed with 0 values rather than LOCF

week4 , sorry


Respected Advisor
Posts: 3,156

Re: missing values imputed with 0 values rather than LOCF

Hi,

If you read into Ksharp's code, you will find it only needs a minor tweak to meet your new requirement.

data new;

  length subjid $8 day $10 value 8;

  input subjid day value;

  datalines;

1 baseline 10

1 week2 12

1 week4 14

1 week8 16

1 week12 12

2 baseline 10

2 week4 10

2 week8 10

3 baseline 10

3 week2 3

3 week8 4

4 baseline 10

4 week8 4

;;;;

  run;

data want;

merge new new(firstobs=2 keep=day rename=( day=_day));

array a{4} $ _temporary_ ('week2' 'week4' 'week8' 'week12');

output;

if day =: 'week' then do;

  end=whichc(_day,of a{*});

  do i=whichc(day,of a{*})+1 to ifn(end=0,dim(a),end-1);

  day=a{i}; value=0;output;

  end;

end;

drop _day end i;

run;

proc print;run;

The underlined statement is the only one needed to be added.

Haikuo

Ask a Question
Discussion stats
  • 5 replies
  • 364 views
  • 0 likes
  • 4 in conversation