I have a dataset with 37K student monthly obs over 10 months. However, not all the obs were taken for each student so I have random missing values scattered throughout my data. I've been trying to use the proc expand procedure to fill the missing valeus with the previous observed value without success. SAS does not say that my code is incorrect but the putput dataset still has the missing values. I've tried it a few ways below and it hasn't worked.
proc expand data=expand out=expand1 method=step;
convert income max_credit_loss_prob/ observed=beginning;
id trans_ym;
by student;
run;
proc expand data=expand out=expand1;
convert income max_credit_loss_prob/ method=step;
id trans_ym;
by student;
run;
What am I doing wrong?
John
John, Can you provide some sample data and the file you want to achieve based on that data?
I'm not sure, from your problem statement, if you need to create missing records or only need to fill in missing values.
I already have a balanced panel with "." in all columns for missing values. I am trying to convert all the observations currently filled with "." to the previous non missing observation. I hope this helps clarify my question.
Example sample data:
Student_ID Month Monthly_Income
1 1 100
1 2 .
1 3 110
1 4 110
1 5 110
1 6 .
1 7 110
1 8 120
1 9 120
1 10 120
What I am trying to get to:
Student_ID Month Monthly_Income
1 1 100
1 2 100
1 3 110
1 4 110
1 5 110
1 6 110
1 7 110
1 8 120
1 9 120
1 10 120
I feel like it should be so simple using proc expand and yet I jsut can't get it to work. Thanks in advance for your help!
John
Wouldn't it be even easier, then, just doing it in a datastep. Unless I misunderstand what you are trying to do, doesn't the following accomplish the task?:
data want (drop=Monthly_Income
rename=(Locf=Monthly_Income));
do until (last.Student_ID);
set have;
by Student_ID;
Locf = coalesce(Monthly_Income,locf);
output;
end;
run;
It looks like it works. I didn't know the locf syntax so that's something very handy to know. The reason(besides the fact I didn't know locf) I was using PROC EXPAND was the ability to specify multiple variables to do it over using the "convert" statement. In your code it looks like I would just use multiple coalesce statements.
The above sample data was missing several variables that are in the real datastep that I also need to do this for. Thank you so much. One last question about your code. Since it doesn't work on the missing value if it is the first in the series ie first.monthly_income what should I do?
What value would you want to assign in such a case? The first non-missing value for that student?
Yes please!
One way would be to wrap the same method within a double dow loop. E.g.,:
data want (drop=Monthly_Income
rename=(Locf=Monthly_Income));
do until(last.Student_ID);
set have;
by Student_ID;
if missing(Locf) and not missing(Monthly_Income)
then Locf = Monthly_Income;
end;
do until(last.Student_ID);
set have;
by Student_ID;
Locf = coalesce(Monthly_Income,locf);
output;
end;
run;
It worked! Thank you so much!
Hi. I had the same missing values problem and used this code and it worked great for me as well. I also tried it on missing character data though, letters and entire words, and I couldn't get it to work. Do you know how I could change this code to work for character data?
Thanks so much!
Maybe because you haven't switched to the right function for character?
Locf = coalesceC(Monthly_Income,locf);
Haikuo
Thanks for the suggestions, but I can't get either one to work. I'm sure it's something I am doing wrong. When I try coalesceC, I get errors saying I have invalid numeric data even though I know it's all character data.
When I try the update method, which would be great if I could get it to work because I have several character variables with missing data, it's not filling in the missing data. The first section of that code is just creating the dataset student right? My dataset is called want that contains the missing data and my identifier is cnum, so I just did
data wantLOCF;
update want(obs=0) want;
by cnum;
output;
run;
Is this right? Like I said, I know it's me doing something incorrectly so I really appreciate your help!
Show example of your data.
It works because UPDATE does not update/replace a variable value with missing. As long as the missing value is not ._ and there is the NOMISSINGCHECK option but that not what we want here.
We need OUTPUT because the transaction data is actually the data we want not just the LAST obs from the BY list as UPDATE normally outputs.
Can you run data step debugger?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.