10-11-2017 11:59 AM
Have been unsuccessfully using lag/proc expand/proc sql to accomplish a task and would like your help.
I have a dataset where i have missing values. I would like to create a new variable mean_nonmiss where the value is assigned by mean if non missing, and if it was missing, then the previous nonmissing value will be assigned, within each date, zone group.
I am able to fill in missing values where the previous value is nonmissing (proc sql, used "prev." to obtain previous values). But does not work if there are multiple consecutive values. Any help is appreciated. Thank you!
data have; input date zone mean; datalines; jan1 1 10 jan2 1 12 jan3 1 . jan4 1 8 jan5 1 10 jan1 2 . jan2 2 . jan3 2 9 jan4 2 10 ; run; data want; input date zone mean_nonmiss; datalines; jan1 1 10 jan2 1 12 jan3 1 12 jan4 1 8 jan5 1 10 jan1 2 10 jan2 2 10 jan3 2 9 jan4 2 10 ; run;
10-11-2017 12:04 PM
This sounds like a "hot deck" imputation, which PROC SURVEYIMPUTE will do for you. If so, no need for macros.
10-11-2017 12:14 PM
Here is an example to give you the 'want', I am however confused by your description that 'within same date, zone', as your 'want' does not show it.
data want; set have; retain mean_nonmiss; mean_nonmiss=coalesce(mean,mean_nonmiss); drop mean; run;
10-11-2017 12:42 PM
What I mean by within date, zone is that the previous value should be assigned only within a same date, zone group. The data is sorted by zone, then date. so, every zone has jan1-dec31 values. if there is a missing jan 1 value, it should not pull from the previous zone's dec 31.
Ended up getting it to work with the following:
data want; set have; by zone date; retain mean_nomiss; if first.zone and first.date then mean_nomiss=.; if mean ne . then mean_nomiss=mean; else if mean=. then mean= temp_nomiss; run;
10-11-2017 12:39 PM
The results you illustrate are consistent with this program:
if mean = . then mean = mean_nonmiss;
else mean_nonmiss = mean;
But it's not 100% clear that you intended to allow values to propagate across different ZONEs. They do in your example, but should they? It's an easy tweak to the program either way.
10-11-2017 12:43 PM
whoops, sorry you are correct, i did not make that clear. How would you adjust the program to adjust it? Ultimately, if there is a missing value for Jan 1, I would like to assign it the next value instead of the previous. Am thinking of using some if, then, else statements.
10-11-2017 01:24 PM
Since it's not clear how far ahead you have to search to find a nonmissing value (see ZONE=2, for example, where both JAN1 and JAN2 have missing values), here's what I would propose. This assumes your data set is in order by ZONE DATE:
where mean > .;
keep zone mean;
merge have first_nonmiss (rename=(mean=mean_nonmiss));
if mean=. then mean=mean_nonmiss;