Solved: Fill missing values with the previous or nexy values by id variable

ML86 · Posted 06-02-2022 09:38 AM

Dear community,

I have a problem and I hope you can help me. Image I have the following table

id Var1 Var2 Var3 Var4 Var5 Var6

1 10 . . 0.25 . .

1 . 11 . . 0.3 .

1 . . 41 . . 0.03

2 46 . . 0.02 . .

3 28 . . 0.03 . .

3 . 29 . . 0.02

I want to replace the missing values (.) with the previous or next values by considering the id-variable. That means I want to have the following table

id Var1 Var2 Var3 Var4 Var5 Var6

1 10 11 41 0.25 0.3 0.03

2 46 11 41 0.02 . .

3 28 . 29 0.03 . 0.02

How I can I do this?

Thank you very much and kind regards

novinosrin · Posted 06-02-2022 10:13 AM

or use an UPDATE by ID to fill, and loop the filled for the count of occurrences of ID-


data have;
input id   Var1   Var2   Var3   Var4   Var5   Var6;
cards;
1   10       .         .           0.25   .         .     

1   10      .          .           0.25  .         .

1   .       11         .           .        0.3     .

1   .       11        .           .         0.3    .

1   .      .           41        .        .        0.03

1   .     .           41        .        .        0.03

1  .      .          41        .        .         0.03

2   46  .         .          0.02     .       .

3   28 .        .           0.03     .        .

3   .     .         29        .          .        0.02
;

data want ;
  do _n_ = 1 by 1 until (last.id ) ;
    update have(obs=0) have ;
    by id ;
  end ;
  do _n_ = 1 to _n_ ;
    output ;
  end ;
run ;

View solution in original post

PaigeMiller · Posted 06-02-2022 09:56 AM

From the data you show, a solution which uses the mean value in place of the missings ought to work, and is very simple to do using PROC STDIZE. Would that work for you?

--
Paige Miller

novinosrin · Posted 06-02-2022 10:13 AM

or use an UPDATE by ID to fill, and loop the filled for the count of occurrences of ID-


data have;
input id   Var1   Var2   Var3   Var4   Var5   Var6;
cards;
1   10       .         .           0.25   .         .     

1   10      .          .           0.25  .         .

1   .       11         .           .        0.3     .

1   .       11        .           .         0.3    .

1   .      .           41        .        .        0.03

1   .     .           41        .        .        0.03

1  .      .          41        .        .         0.03

2   46  .         .          0.02     .       .

3   28 .        .           0.03     .        .

3   .     .         29        .          .        0.02
;

data want ;
  do _n_ = 1 by 1 until (last.id ) ;
    update have(obs=0) have ;
    by id ;
  end ;
  do _n_ = 1 to _n_ ;
    output ;
  end ;
run ;

mkeintz · Posted 06-02-2022 11:10 AM

@novinosrin

I like the concept of using UPDATE, followed by a do loop of SET/OUTPUT statements. But it can be cleaned a bit:

data want ;
  update have(obs=0) have ;
  by id ;

  if last.id then do until (last.id);
    set have (keep=id);
    by id;
    output;
  end ;
run ;

I should add that this provides a way to specify a variable list in the "keep=" parameter that allows original values to prevail.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom · Posted 06-02-2022 10:16 AM

You example seems to result in all observations in a group being exactly the same.

What do you want to happen when the non-missing values are not all the same?

Is there another variable that can be used to insure that the observations are in the right order? A sequence number or date perhaps?

Ksharp · Posted 06-02-2022 11:11 PM

If your these variables have the same value within the same group ID .

proc sql;
create table want(drop=dummy) as
select id,var1 as dummy,
max(var1) as var1 ,
max(var2) as var2 ,
max(var3) as var3 ,
max(var4) as var4 ,
max(var5) as var5 
 from have 
  group by id;
quit;

ML86 · Posted 06-03-2022 01:55 AM

Thank you very much for the suggested solutions.

Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Re: Fill missing values with the previous or nexy values by id variable

Registration is open

Registration is open

SAS Training: Just a Click Away