BookmarkSubscribeRSS Feed
warnerj
Calcite | Level 5

I have a dataset with 37K student monthly obs over 10 months.  However, not all the obs were taken for each student so I have random missing values scattered throughout my data.  I've been trying to use the proc expand procedure to fill the missing valeus with the previous observed value without success.  SAS does not say that my code is incorrect but the putput dataset still has the missing values.  I've tried it a few ways below and it hasn't worked.

proc expand data=expand out=expand1 method=step;

convert income max_credit_loss_prob/ observed=beginning;

id trans_ym;

by student;

run;

proc expand data=expand out=expand1;

convert income max_credit_loss_prob/ method=step;

id trans_ym;

by student;

run;

What am I doing wrong?

John

25 REPLIES 25
art297
Opal | Level 21

John, Can you provide some sample data and the file you want to achieve based on that data?

I'm not sure, from your problem statement, if you need to create missing records or only need to fill in missing values.

warnerj
Calcite | Level 5

I already have a balanced panel with "." in all columns for missing values.  I am trying to convert all the  observations currently filled with "." to the previous non missing observation.  I hope this helps clarify my question.

Example sample data:

Student_ID    Month    Monthly_Income

1                    1               100

1                    2               .

1                    3               110

1                    4               110   

1                    5               110    

1                    6               .

1                    7               110

1                    8               120

1                    9               120   

1                    10             120

What I am trying to get to:

Student_ID    Month    Monthly_Income

1                    1               100

1                    2               100

1                    3               110

1                    4               110   

1                    5               110    

1                    6               110

1                    7               110

1                    8               120

1                    9               120   

1                    10             120

I feel like it should be so simple using proc expand and yet I jsut can't get it to work.  Thanks in advance for your help!

John

art297
Opal | Level 21

Wouldn't it be even easier, then, just doing it in a datastep.  Unless I misunderstand what you are trying to do, doesn't the following accomplish the task?:

data want (drop=Monthly_Income

           rename=(Locf=Monthly_Income));

  do until (last.Student_ID);

    set have;

    by Student_ID;

    Locf = coalesce(Monthly_Income,locf);

    output;

  end;

run;

warnerj
Calcite | Level 5

It looks like it works.  I didn't know the locf syntax so that's something very handy to know.  The reason(besides the fact I didn't know locf) I was using PROC EXPAND was the ability to specify multiple variables to do it over using the "convert" statement.  In your code it looks like I would just use multiple coalesce statements.

The above sample data was missing several variables that are in the real datastep that I also need to do this for. Thank you so much.  One last question about your code.  Since it doesn't work on the missing value if it is the first in the series ie first.monthly_income what should I do?

art297
Opal | Level 21

What value would you want to assign in such a case?  The first non-missing value for that student?

warnerj
Calcite | Level 5

Yes please!

art297
Opal | Level 21

One way would be to wrap the same method within a double dow loop.  E.g.,:

data want (drop=Monthly_Income

           rename=(Locf=Monthly_Income));

  do until(last.Student_ID);

    set have;

    by Student_ID;

    if missing(Locf) and not missing(Monthly_Income)

      then Locf = Monthly_Income;

  end;

  do until(last.Student_ID);

    set have;

    by Student_ID;

    Locf = coalesce(Monthly_Income,locf);

    output;

  end;

run;

warnerj
Calcite | Level 5

It worked! Thank you so much!

trich12
Calcite | Level 5

Hi.  I had the same missing values problem and used this code and it worked great for me as well.  I also tried it on missing character data though, letters and entire words, and I couldn't get it to work.  Do you know how I could change this code to work for character data?

Thanks so much!

Haikuo
Onyx | Level 15

Maybe because you haven't switched to the right function for character?

Locf = coalesceC(Monthly_Income,locf);

Haikuo

trich12
Calcite | Level 5

Thanks for the suggestions, but I can't get either one to work. I'm sure it's something I am doing wrong.  When I try coalesceC, I get errors saying I have invalid numeric data even though I know it's all character data. 

When I try the update method, which would be great if I could get it to work because I have several character variables with missing data, it's not filling in the missing data.  The first section of that code is just creating the dataset student right?  My dataset is called want that contains the missing data and my identifier is cnum, so I just did

data wantLOCF;

     update want(obs=0) want;

     by cnum;

     output;

run;

Is this right? Like I said, I know it's me doing something incorrectly so I really appreciate your help!

data_null__
Jade | Level 19

Show example of your data.

art297
Opal | Level 21

DN, I find your approach to LOCF fascinating, but don't understand how it works.  Do you have time to provide a brief description of the processing?

TIA,

Art

data_null__
Jade | Level 19

It works because UPDATE does not update/replace a variable value with missing.  As long as the missing value is not ._ and there is the NOMISSINGCHECK option but that not what we want here.

We need OUTPUT because the transaction data is actually the data we want not just the LAST obs from the BY list as UPDATE normally outputs.

Can you run data step debugger?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 11672 views
  • 4 likes
  • 6 in conversation