Re: Fill missing values with proc expand

warnerj · Posted 10-27-2011 01:03 PM

I have a dataset with 37K student monthly obs over 10 months. However, not all the obs were taken for each student so I have random missing values scattered throughout my data. I've been trying to use the proc expand procedure to fill the missing valeus with the previous observed value without success. SAS does not say that my code is incorrect but the putput dataset still has the missing values. I've tried it a few ways below and it hasn't worked.

proc expand data=expand out=expand1 method=step;

convert income max_credit_loss_prob/ observed=beginning;

id trans_ym;

by student;

run;

proc expand data=expand out=expand1;

convert income max_credit_loss_prob/ method=step;

id trans_ym;

by student;

run;

What am I doing wrong?

John

art297 · Posted 10-27-2011 01:22 PM

John, Can you provide some sample data and the file you want to achieve based on that data?

I'm not sure, from your problem statement, if you need to create missing records or only need to fill in missing values.

warnerj · Posted 10-27-2011 01:30 PM

I already have a balanced panel with "." in all columns for missing values. I am trying to convert all the observations currently filled with "." to the previous non missing observation. I hope this helps clarify my question.

Example sample data:

Student_ID Month Monthly_Income

1 1 100

1 2 .

1 3 110

1 4 110

1 5 110

1 6 .

1 7 110

1 8 120

1 9 120

1 10 120

What I am trying to get to:

Student_ID Month Monthly_Income

1 1 100

1 2 100

1 3 110

1 4 110

1 5 110

1 6 110

1 7 110

1 8 120

1 9 120

1 10 120

I feel like it should be so simple using proc expand and yet I jsut can't get it to work. Thanks in advance for your help!

John

art297 · Posted 10-27-2011 01:43 PM

Wouldn't it be even easier, then, just doing it in a datastep. Unless I misunderstand what you are trying to do, doesn't the following accomplish the task?:

data want (drop=Monthly_Income

rename=(Locf=Monthly_Income));

do until (last.Student_ID);

set have;

by Student_ID;

Locf = coalesce(Monthly_Income,locf);

output;

end;

run;

warnerj · Posted 10-27-2011 01:59 PM

It looks like it works. I didn't know the locf syntax so that's something very handy to know. The reason(besides the fact I didn't know locf) I was using PROC EXPAND was the ability to specify multiple variables to do it over using the "convert" statement. In your code it looks like I would just use multiple coalesce statements.

The above sample data was missing several variables that are in the real datastep that I also need to do this for. Thank you so much. One last question about your code. Since it doesn't work on the missing value if it is the first in the series ie first.monthly_income what should I do?

art297 · Posted 10-27-2011 02:13 PM

What value would you want to assign in such a case? The first non-missing value for that student?

warnerj · Posted 10-27-2011 02:16 PM

Yes please!

art297 · Posted 10-27-2011 02:31 PM

One way would be to wrap the same method within a double dow loop. E.g.,:

data want (drop=Monthly_Income

rename=(Locf=Monthly_Income));

do until(last.Student_ID);

set have;

by Student_ID;

if missing(Locf) and not missing(Monthly_Income)

then Locf = Monthly_Income;

end;

do until(last.Student_ID);

set have;

by Student_ID;

Locf = coalesce(Monthly_Income,locf);

output;

end;

run;

warnerj · Posted 10-27-2011 03:55 PM

It worked! Thank you so much!

trich12 · Posted 10-11-2012 10:35 AM

Hi. I had the same missing values problem and used this code and it worked great for me as well. I also tried it on missing character data though, letters and entire words, and I couldn't get it to work. Do you know how I could change this code to work for character data?

Thanks so much!

Haikuo · Posted 10-11-2012 10:44 AM

Maybe because you haven't switched to the right function for character?

Locf = coalesceC(Monthly_Income,locf);

Haikuo

trich12 · Posted 10-11-2012 11:27 AM

Thanks for the suggestions, but I can't get either one to work. I'm sure it's something I am doing wrong. When I try coalesceC, I get errors saying I have invalid numeric data even though I know it's all character data.

When I try the update method, which would be great if I could get it to work because I have several character variables with missing data, it's not filling in the missing data. The first section of that code is just creating the dataset student right? My dataset is called want that contains the missing data and my identifier is cnum, so I just did

data wantLOCF;

update want(obs=0) want;

by cnum;

output;

run;

Is this right? Like I said, I know it's me doing something incorrectly so I really appreciate your help!

data_null__ · Posted 10-11-2012 11:48 AM

Show example of your data.

art297 · Posted 10-11-2012 12:15 PM

DN, I find your approach to LOCF fascinating, but don't understand how it works. Do you have time to provide a brief description of the processing?

TIA,

Art

data_null__ · Posted 10-11-2012 01:17 PM

It works because UPDATE does not update/replace a variable value with missing. As long as the missing value is not ._ and there is the NOMISSINGCHECK option but that not what we want here.

We need OUTPUT because the transaction data is actually the data we want not just the LAST obs from the BY list as UPDATE normally outputs.

Can you run data step debugger?

Registration is open

SAS Training: Just a Click Away