## Impute value for missing data

Solved
Occasional Contributor
Posts: 8

# Impute value for missing data

Hi everyone,

I have a sample dataset as follows

id   gender  year   weight

1    F        2009    50.5

1    F        2010       .

1    F        2011       .

1    F        2012    55.0

2    F        2009       .

2    F        2010    48.6

2    F        2011    51.2

2    F        2012       .

3    M       2009       .

3    M       2010       .

3    M       2011       .

3    M       2012    89.0

4    F        2009    51.9

4    F        2010    52.0

4    F        2011    58.1

4    F        2012       .

:

I am trying to impute the values for missing weight values which is based on the previous year.

id   gender  year   weight

1    F        2009    50.5

1    F        2010    50.5

1    F        2011    50.5

1    F        2012    55.0

2    F        2009      .

2    F        2010    48.6

2    F        2011    51.2

2    F        2012    51.2

3    M       2009       .

3    M       2010      .

3    M       2011       .

3    M       2012    89.0

4    F        2009    51.9

4    F        2010    52.0

4    F        2011    58.1

4    F        2012    58.1

data locf;

set weight;

retain weight;

by id year;

lag_weight=lag(weight);

if lag_weight~=. then weight=;

end;

run;

But definitely that is not correct command. Could anyone give me some ideas?

Accepted Solutions
Solution
‎06-13-2013 08:03 AM
Super Contributor
Posts: 578

## Re: Impute value for missing data

I think all you need to do is change to this:

data locf;

set weight;

by id;

lag_weight=lag(weight);

if not first.id then do;

if weight = . then weight=lag_weight;

end;

run;

All Replies
Super User
Posts: 5,876

## Re: Impute value for missing data

You should chose to use retain or lag, they solve similar problem, but in different ways.

If you use retain, retain a separate "helper" variable instead of an existing variable, that will give you more control.

Data never sleeps
Posts: 3,852

## Re: Impute value for missing data

Lag is not the right tool for LOCF.  You can use the UPDATE trick.

data weight;
input (id gender)(\$)  year weight;
cards;
1    F        2009    50.5
1    F        2010       .
1    F        2011       .
1    F        2012    55.0
2    F        2009       .
2    F        2010    48.6
2    F        2011    51.2
2    F        2012       .
3    M        2009       .
3    M        2010       .
3    M        2011       .
3    M        2012    89.0
4    F        2009    51.9
4    F        2010    52.0
4    F        2011    58.1
4    F        2012       .
;;;;
run;
data weight1;
update weight(obs=0) weight;
by id;
output;

run;
This will LOCF all the variables. If you need to restrict the LOCFing you can use UPDATE and SET like this...

data weight2;
if 0 then set weight;
update weight(obs=0 keep=id weight) weight(keep=id weight);
by id;
set weight(keep=gender year);
output;

run;
Occasional Contributor
Posts: 18

## Re: Impute value for missing data

Hi Jesssun,

3    M        2009       .

3    M        2010       .

3    M        2011       .

3    M        2012    89.0

please correct me if I have not understood it right.

I above situation you want all to be missing as 2009 is missing.

Occasional Contributor
Posts: 8

## Re: Impute value for missing data

Hi Gaurang,

Yes, you are right as there is no previous value as reference. So I decide to keep those as missing value.

Jess

Occasional Contributor
Posts: 18

## Re: Impute value for missing data

Hi Jesssun,

do the following steps. I request you to give your feedback.

proc sort data = weight;

by id gender year;

quit;

data weight_1;

set weight;

by id;

if first.id then new_weight = weight;

weight=new_weight;

run;

Occasional Contributor
Posts: 8

## Re: Impute value for missing data

Hi Gaurang,

I am afraid that is not the right one.

Because there are only two obs in the weight1 and the rest of the obs has gone.

Do you know anything about "lag"?

I prefer to use 'lag'. I think it would be much easier.

Jess

Occasional Contributor
Posts: 18

## Re: Impute value for missing data

The solution that I have suggested is presuming that dataset 'weight' does exist.

there is one change in the code that I sent you.

data weight_1 (drop = new_weight);

set weight;

by id;

if first.id then new_weight = weight;

weight=new_weight;

output;

run;

Occasional Contributor
Posts: 18

## Re: Impute value for missing data

Hi JessSun,

One more change.  Sorry for missing one statement.

!00% sure this will work and will meet your requirement.

data weight_1 (drop = new_weight);

set weight;

by id;

retain new_weight;

if first.id then new_weight = weight;

weight=new_weight;

run;

Solution
‎06-13-2013 08:03 AM
Super Contributor
Posts: 578

## Re: Impute value for missing data

I think all you need to do is change to this:

data locf;

set weight;

by id;

lag_weight=lag(weight);

if not first.id then do;

if weight = . then weight=lag_weight;

end;

run;

Occasional Contributor
Posts: 8

## Re: Impute value for missing data

Thank you so much DBailey:smileylaugh::smileyplus:

Posts: 3,167

## Re: Impute value for missing data

Not so fast. 's solution does not provide the exact what you asked for, check id = 1, the retained the value only good for next one obs. The following seems to do what you want:

data want;

set weight;

by id ;

retain _r;

if first.id then call missing(_r);

weight=coalesce(weight,_r);

_r=coalesce(weight,_r);

run;

Haikuo

Super Contributor
Posts: 578

## Re: Impute value for missing data

true...might have to use the retain function when there are multiple missing values.

Occasional Contributor
Posts: 18

## Re: Impute value for missing data

Hi JessSun,

As the sample data that you have said is in proper sorted format the following code 1 will work. But if the data is not in proper sorted format then code 2 will work.

1. data locf;

set weight;

by id;

lag_weight=lag(weight);

if not first.id then do;

if weight = . then weight=lag_weight;

end;

run;

2.

proc sort data = weight;

by id gender year;

quit;

data weight_1 (drop = new_weight);

set weight;

by id;

retain new_weight;

if first.id then new_weight = weight;

weight=new_weight;

run;

🔒 This topic is solved and locked.