DATA Step, Macro, Functions and more

Create variable that assigns previous non missing value as value

Reply
Contributor
Posts: 39

Create variable that assigns previous non missing value as value

Hi all,


Have been unsuccessfully using lag/proc expand/proc sql to accomplish a task and would like your help. 

 

I have a dataset where i have missing values. I would like to create a new variable mean_nonmiss where the value is assigned by mean if non missing, and if it was missing, then the previous nonmissing value will be assigned, within each date, zone group. 

 

I am able to fill in missing values where the previous value is nonmissing (proc sql, used "prev." to obtain previous values). But does not work if there are multiple consecutive values. Any help is appreciated. Thank you!

 

 

data have; 
input date zone mean; 
datalines; 
jan1 1 10 
jan2 1 12
jan3 1 .
jan4 1 8
jan5 1 10
jan1 2 .
jan2 2 .
jan3 2 9 
jan4 2 10
;
run;  

data want; 
input date zone mean_nonmiss; 
datalines; 
jan1 1 10 
jan2 1 12
jan3 1 12
jan4 1 8
jan5 1 10
jan1 2 10
jan2 2 10
jan3 2 9 
jan4 2 10
;
run;  

 

 

 

 

Respected Advisor
Posts: 2,825

Re: Create variable that assigns previous non missing value as value

Posted in reply to pamplemouse22

This sounds like a "hot deck" imputation, which PROC SURVEYIMPUTE will do for you. If so, no need for macros.

--
Paige Miller
Respected Advisor
Posts: 3,167

Re: Create variable that assigns previous non missing value as value

Posted in reply to pamplemouse22

Here is an example to give you the 'want', I am however confused by your description that 'within same date, zone', as your 'want' does not show it.

 

data want;
set have;
retain mean_nonmiss;
mean_nonmiss=coalesce(mean,mean_nonmiss);
drop mean;
run;
Contributor
Posts: 39

Re: Create variable that assigns previous non missing value as value

Thank you!

What I mean by within date, zone is that the previous value should be assigned only within a same date, zone group. The data is sorted by zone, then date. so, every zone has jan1-dec31 values. if there is a missing jan 1 value, it should not pull from the previous zone's dec 31. 

 

Ended up getting it to work with the following: 

 

data want; set have;
	by zone date; 
	retain mean_nomiss; 
	if first.zone and first.date then  mean_nomiss=.;
	if mean ne . then  mean_nomiss=mean; 
	else if mean=. then mean= temp_nomiss; 
run; 
Super User
Posts: 6,632

Re: Create variable that assigns previous non missing value as value

Posted in reply to pamplemouse22

The results you illustrate are consistent with this program:

 

data want;

set have;

retain mean_nonmiss;

if mean = . then mean = mean_nonmiss;

else mean_nonmiss = mean;

run;

 

But it's not 100% clear that you intended to allow values to propagate across different ZONEs.  They do in your example, but should they?  It's an easy tweak to the program either way.

Contributor
Posts: 39

Re: Create variable that assigns previous non missing value as value

Posted in reply to Astounding

whoops, sorry you are correct, i did not make that clear. How would you adjust the program to adjust it? Ultimately, if there is a missing value for Jan 1, I would like to assign it the next value instead of the previous. Am thinking of using some if, then, else statements. 

Super User
Posts: 6,632

Re: Create variable that assigns previous non missing value as value

Posted in reply to pamplemouse22

Since it's not clear how far ahead you have to search to find a nonmissing value (see ZONE=2, for example, where both JAN1 and JAN2 have missing values), here's what I would propose.  This assumes your data set is in order by ZONE DATE:

 

data first_nonmiss;

set have;

by zone;

where mean > .;

if first.zone;

keep zone mean;

run;

 

data want;

merge have first_nonmiss (rename=(mean=mean_nonmiss));

by zone;

if mean=. then mean=mean_nonmiss;

else mean_nonmiss=mean;

run;

Ask a Question
Discussion stats
  • 6 replies
  • 112 views
  • 0 likes
  • 4 in conversation