Impute value for missing data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Impute value for missing data

Hi everyone,

I have a sample dataset as follows

id   gender  year   weight

1    F        2009    50.5

1    F        2010       .

1    F        2011       .

1    F        2012    55.0

2    F        2009       .

2    F        2010    48.6

2    F        2011    51.2

2    F        2012       .

3    M       2009       .

3    M       2010       .

3    M       2011       .

3    M       2012    89.0

4    F        2009    51.9

4    F        2010    52.0

4    F        2011    58.1

4    F        2012       .

:

I am trying to impute the values for missing weight values which is based on the previous year.

id   gender  year   weight

1    F        2009    50.5

1    F        2010    50.5

1    F        2011    50.5

1    F        2012    55.0

2    F        2009      .

2    F        2010    48.6

2    F        2011    51.2

2    F        2012    51.2

3    M       2009       .

3    M       2010      .

3    M       2011       .

3    M       2012    89.0

4    F        2009    51.9

4    F        2010    52.0

4    F        2011    58.1

4    F        2012    58.1

data locf;

set weight;

retain weight;

by id year;

lag_weight=lag(weight);

if lag_weight~=. then weight=;

end;

run;

But definitely that is not correct command. Could anyone give me some ideas?

Thanks in advance! Smiley Happy


Accepted Solutions
Solution
‎06-13-2013 08:03 AM
Super Contributor
Posts: 578

Re: Impute value for missing data

I think all you need to do is change to this:

data locf;

set weight;

by id;

lag_weight=lag(weight);

if not first.id then do;

     if weight = . then weight=lag_weight;

end;

run;

View solution in original post


All Replies
Super User
Posts: 5,437

Re: Impute value for missing data

You should chose to use retain or lag, they solve similar problem, but in different ways.

If you use retain, retain a separate "helper" variable instead of an existing variable, that will give you more control.

Data never sleeps
Respected Advisor
Posts: 3,799

Re: Impute value for missing data

Lag is not the right tool for LOCF.  You can use the UPDATE trick.

data weight;
   input (id gender)($)  year weight;
   cards;
1    F        2009    50.5
1    F        2010       .
1    F        2011       .
1    F        2012    55.0
2    F        2009       .
2    F        2010    48.6
2    F        2011    51.2
2    F        2012       .
3    M        2009       .
3    M        2010       .
3    M        2011       .
3    M        2012    89.0
4    F        2009    51.9
4    F        2010    52.0
4    F        2011    58.1
4    F        2012       .
;;;;
   run;
data weight1;
   update weight(obs=0) weight;
   by id;
   output;
  
run;
This will LOCF all the variables. If you need to restrict the LOCFing you can use UPDATE and SET like this...

data weight2;
   if 0 then set weight;
   update weight(obs=0 keep=id weight) weight(keep=id weight);
   by id;
   set weight(keep=gender year);
   output;
  
run;
Occasional Contributor
Posts: 18

Re: Impute value for missing data

Hi Jesssun,

3    M        2009       .

3    M        2010       .

3    M        2011       .

3    M        2012    89.0

please correct me if I have not understood it right.

I above situation you want all to be missing as 2009 is missing.

Occasional Contributor
Posts: 8

Re: Impute value for missing data

Posted in reply to Gaurang_sas

Hi Gaurang,

Yes, you are right as there is no previous value as reference. So I decide to keep those as missing value.

Jess

Occasional Contributor
Posts: 18

Re: Impute value for missing data

Hi Jesssun,

do the following steps. I request you to give your feedback.

Thanks in advance.

proc sort data = weight;

by id gender year;

quit;

data weight_1;

set weight;

by id;

if first.id then new_weight = weight;

weight=new_weight;

run;


Occasional Contributor
Posts: 8

Re: Impute value for missing data

Posted in reply to Gaurang_sas

Hi Gaurang,

I am afraid that is not the right one.

Because there are only two obs in the weight1 and the rest of the obs has gone.

Do you know anything about "lag"?

I prefer to use 'lag'. I think it would be much easier.

Jess

Occasional Contributor
Posts: 18

Re: Impute value for missing data

The solution that I have suggested is presuming that dataset 'weight' does exist.

there is one change in the code that I sent you.

data weight_1 (drop = new_weight);

set weight;

by id;

if first.id then new_weight = weight;

weight=new_weight;

output;

run;

Occasional Contributor
Posts: 18

Re: Impute value for missing data

Hi JessSun,

One more change.  Sorry for missing one statement.

!00% sure this will work and will meet your requirement.

data weight_1 (drop = new_weight);

set weight;

by id;

retain new_weight;

if first.id then new_weight = weight;

weight=new_weight;

run;

Solution
‎06-13-2013 08:03 AM
Super Contributor
Posts: 578

Re: Impute value for missing data

I think all you need to do is change to this:

data locf;

set weight;

by id;

lag_weight=lag(weight);

if not first.id then do;

     if weight = . then weight=lag_weight;

end;

run;

Occasional Contributor
Posts: 8

Re: Impute value for missing data

Thank you so much DBailey:smileylaugh::smileyplus:

Respected Advisor
Posts: 3,156

Re: Impute value for missing data

Not so fast. 's solution does not provide the exact what you asked for, check id = 1, the retained the value only good for next one obs. The following seems to do what you want:

   data want;

     set weight;

    by id ;

    retain _r;

     if first.id then call missing(_r);

  weight=coalesce(weight,_r);

    _r=coalesce(weight,_r);

  run;

Haikuo

Super Contributor
Posts: 578

Re: Impute value for missing data

true...might have to use the retain function when there are multiple missing values.

Occasional Contributor
Posts: 18

Re: Impute value for missing data

Hi JessSun,

As the sample data that you have said is in proper sorted format the following code 1 will work. But if the data is not in proper sorted format then code 2 will work.

  1. data locf;

     set weight;

     by id;

     lag_weight=lag(weight);

     if not first.id then do;

          if weight = . then weight=lag_weight;

     end;

     run;

2.

proc sort data = weight;

by id gender year;

quit;

data weight_1 (drop = new_weight);

set weight;

by id;

retain new_weight;

if first.id then new_weight = weight;

weight=new_weight;

run;


🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 404 views
  • 3 likes
  • 6 in conversation