BookmarkSubscribeRSS Feed
pamplemouse22
Calcite | Level 5

Hi all,


Have been unsuccessfully using lag/proc expand/proc sql to accomplish a task and would like your help. 

 

I have a dataset where i have missing values. I would like to create a new variable mean_nonmiss where the value is assigned by mean if non missing, and if it was missing, then the previous nonmissing value will be assigned, within each date, zone group. 

 

I am able to fill in missing values where the previous value is nonmissing (proc sql, used "prev." to obtain previous values). But does not work if there are multiple consecutive values. Any help is appreciated. Thank you!

 

 

data have; 
input date zone mean; 
datalines; 
jan1 1 10 
jan2 1 12
jan3 1 .
jan4 1 8
jan5 1 10
jan1 2 .
jan2 2 .
jan3 2 9 
jan4 2 10
;
run;  

data want; 
input date zone mean_nonmiss; 
datalines; 
jan1 1 10 
jan2 1 12
jan3 1 12
jan4 1 8
jan5 1 10
jan1 2 10
jan2 2 10
jan3 2 9 
jan4 2 10
;
run;  

 

 

 

 

6 REPLIES 6
PaigeMiller
Diamond | Level 26

This sounds like a "hot deck" imputation, which PROC SURVEYIMPUTE will do for you. If so, no need for macros.

--
Paige Miller
Haikuo
Onyx | Level 15

Here is an example to give you the 'want', I am however confused by your description that 'within same date, zone', as your 'want' does not show it.

 

data want;
set have;
retain mean_nonmiss;
mean_nonmiss=coalesce(mean,mean_nonmiss);
drop mean;
run;
pamplemouse22
Calcite | Level 5

Thank you!

What I mean by within date, zone is that the previous value should be assigned only within a same date, zone group. The data is sorted by zone, then date. so, every zone has jan1-dec31 values. if there is a missing jan 1 value, it should not pull from the previous zone's dec 31. 

 

Ended up getting it to work with the following: 

 

data want; set have;
	by zone date; 
	retain mean_nomiss; 
	if first.zone and first.date then  mean_nomiss=.;
	if mean ne . then  mean_nomiss=mean; 
	else if mean=. then mean= temp_nomiss; 
run; 
Astounding
PROC Star

The results you illustrate are consistent with this program:

 

data want;

set have;

retain mean_nonmiss;

if mean = . then mean = mean_nonmiss;

else mean_nonmiss = mean;

run;

 

But it's not 100% clear that you intended to allow values to propagate across different ZONEs.  They do in your example, but should they?  It's an easy tweak to the program either way.

pamplemouse22
Calcite | Level 5

whoops, sorry you are correct, i did not make that clear. How would you adjust the program to adjust it? Ultimately, if there is a missing value for Jan 1, I would like to assign it the next value instead of the previous. Am thinking of using some if, then, else statements. 

Astounding
PROC Star

Since it's not clear how far ahead you have to search to find a nonmissing value (see ZONE=2, for example, where both JAN1 and JAN2 have missing values), here's what I would propose.  This assumes your data set is in order by ZONE DATE:

 

data first_nonmiss;

set have;

by zone;

where mean > .;

if first.zone;

keep zone mean;

run;

 

data want;

merge have first_nonmiss (rename=(mean=mean_nonmiss));

by zone;

if mean=. then mean=mean_nonmiss;

else mean_nonmiss=mean;

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2176 views
  • 0 likes
  • 4 in conversation