Help using Base SAS procedures

Fill missing values with proc expand

Reply
Occasional Contributor
Posts: 14

Fill missing values with proc expand

I have a dataset with 37K student monthly obs over 10 months.  However, not all the obs were taken for each student so I have random missing values scattered throughout my data.  I've been trying to use the proc expand procedure to fill the missing valeus with the previous observed value without success.  SAS does not say that my code is incorrect but the putput dataset still has the missing values.  I've tried it a few ways below and it hasn't worked.

proc expand data=expand out=expand1 method=step;

convert income max_credit_loss_prob/ observed=beginning;

id trans_ym;

by student;

run;

proc expand data=expand out=expand1;

convert income max_credit_loss_prob/ method=step;

id trans_ym;

by student;

run;

What am I doing wrong?

John

PROC Star
Posts: 7,363

Fill missing values with proc expand

John, Can you provide some sample data and the file you want to achieve based on that data?

I'm not sure, from your problem statement, if you need to create missing records or only need to fill in missing values.

Occasional Contributor
Posts: 14

Fill missing values with proc expand

I already have a balanced panel with "." in all columns for missing values.  I am trying to convert all the  observations currently filled with "." to the previous non missing observation.  I hope this helps clarify my question.

Example sample data:

Student_ID    Month    Monthly_Income

1                    1               100

1                    2               .

1                    3               110

1                    4               110   

1                    5               110    

1                    6               .

1                    7               110

1                    8               120

1                    9               120   

1                    10             120

What I am trying to get to:

Student_ID    Month    Monthly_Income

1                    1               100

1                    2               100

1                    3               110

1                    4               110   

1                    5               110    

1                    6               110

1                    7               110

1                    8               120

1                    9               120   

1                    10             120

I feel like it should be so simple using proc expand and yet I jsut can't get it to work.  Thanks in advance for your help!

John

PROC Star
Posts: 7,363

Fill missing values with proc expand

Wouldn't it be even easier, then, just doing it in a datastep.  Unless I misunderstand what you are trying to do, doesn't the following accomplish the task?:

data want (drop=Monthly_Income

           rename=(Locf=Monthly_Income));

  do until (last.Student_ID);

    set have;

    by Student_ID;

    Locf = coalesce(Monthly_Income,locf);

    output;

  end;

run;

Occasional Contributor
Posts: 14

Fill missing values with proc expand

It looks like it works.  I didn't know the locf syntax so that's something very handy to know.  The reason(besides the fact I didn't know locf) I was using PROC EXPAND was the ability to specify multiple variables to do it over using the "convert" statement.  In your code it looks like I would just use multiple coalesce statements.

The above sample data was missing several variables that are in the real datastep that I also need to do this for. Thank you so much.  One last question about your code.  Since it doesn't work on the missing value if it is the first in the series ie first.monthly_income what should I do?

PROC Star
Posts: 7,363

Fill missing values with proc expand

What value would you want to assign in such a case?  The first non-missing value for that student?

Occasional Contributor
Posts: 14

Fill missing values with proc expand

Yes please!

PROC Star
Posts: 7,363

Fill missing values with proc expand

One way would be to wrap the same method within a double dow loop.  E.g.,:

data want (drop=Monthly_Income

           rename=(Locf=Monthly_Income));

  do until(last.Student_ID);

    set have;

    by Student_ID;

    if missing(Locf) and not missing(Monthly_Income)

      then Locf = Monthly_Income;

  end;

  do until(last.Student_ID);

    set have;

    by Student_ID;

    Locf = coalesce(Monthly_Income,locf);

    output;

  end;

run;

Occasional Contributor
Posts: 14

Fill missing values with proc expand

It worked! Thank you so much!

Contributor
Posts: 23

Re: Fill missing values with proc expand

Hi.  I had the same missing values problem and used this code and it worked great for me as well.  I also tried it on missing character data though, letters and entire words, and I couldn't get it to work.  Do you know how I could change this code to work for character data?

Thanks so much!

Respected Advisor
Posts: 3,124

Re: Fill missing values with proc expand

Maybe because you haven't switched to the right function for character?

Locf = coalesceC(Monthly_Income,locf);

Haikuo

Contributor
Posts: 23

Re: Fill missing values with proc expand

Thanks for the suggestions, but I can't get either one to work. I'm sure it's something I am doing wrong.  When I try coalesceC, I get errors saying I have invalid numeric data even though I know it's all character data. 

When I try the update method, which would be great if I could get it to work because I have several character variables with missing data, it's not filling in the missing data.  The first section of that code is just creating the dataset student right?  My dataset is called want that contains the missing data and my identifier is cnum, so I just did

data wantLOCF;

     update want(obs=0) want;

     by cnum;

     output;

run;

Is this right? Like I said, I know it's me doing something incorrectly so I really appreciate your help!

Respected Advisor
Posts: 3,777

Re: Fill missing values with proc expand

Show example of your data.

PROC Star
Posts: 7,363

Re: Fill missing values with proc expand

DN, I find your approach to LOCF fascinating, but don't understand how it works.  Do you have time to provide a brief description of the processing?

TIA,

Art

Respected Advisor
Posts: 3,777

Re: Fill missing values with proc expand

It works because UPDATE does not update/replace a variable value with missing.  As long as the missing value is not ._ and there is the NOMISSINGCHECK option but that not what we want here.

We need OUTPUT because the transaction data is actually the data we want not just the LAST obs from the BY list as UPDATE normally outputs.

Can you run data step debugger?

Ask a Question
Discussion stats
  • 25 replies
  • 6227 views
  • 4 likes
  • 6 in conversation