BookmarkSubscribeRSS Feed
BrahmanandaRao
Lapis Lazuli | Level 10
data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8

;
run;

How to replace missing values with next values for repective variables missings output like below  

A 1 2 3
A 3 2 5
A 8 4 6
A 3 2 6
B 3 2 6
B 3 2 8
C 9 1 8
5 REPLIES 5
Kurt_Bremser
Super User

Why does group change to "A" in every observation?

 


@BrahmanandaRao wrote:
data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8

;
run;

How to replace missing values with next values for repective variables missings output like below  

A 1 2 3
A 3 2 5
A 8 4 6
A 3 2 6
A 3 2 6
A 3 2 8
A 9 1 8

 

ballardw
Super User

When you have a variable like your Group variable it seems unlikely that you should acquire values from a different group to replace missing.

 

Can you describe the purpose of replacing these values? How will you use the data? That may give us some other ideas.

 


@BrahmanandaRao wrote:
data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8

;
run;

How to replace missing values with next values for repective variables missings output like below  

A 1 2 3
A 3 2 5
A 8 4 6
A 3 2 6
B 3 2 6
B 3 2 8
C 9 1 8

 

mkeintz
PROC Star

In group A, your sample input shows the VAR1 sequence to be   1, ., 3,. 8.   

 

Why does your sample output show  1,3,8,3?  Or are you prepared to take the next VAR1 from group B?  And it looks like you are MOVING a future non-missing value back, not just copying it.  Is that right

 

And why does the sample input have 3 group B observations, but the sample output have only 2?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

And additionally to questions already asked: What should happen in if the last row in a group has a missing value? Leave it missing or something else?

mkeintz
PROC Star

In the absence of a reply, here's some code that solves what I presume is the typical "bringing back the future" problem.  For each record, a missing value is replaced by the earliest following non-missing value, within the same group.  This means of course that any group that ends with missing values will not have those values replaced:

 

data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8
run;

data want (drop=_:);
  set ds_missing_values (in=firstpass)
      ds_missing_values (in=secondpass);
  by group;

  array var {3} var1-var3;

  array history {3,20}  _histv1_1-_histv1_20  
                        _histv2_1-_histv2_20  
                        _histv3_1-_histv3_20 ;
  retain _: ;

  if first.group then call missing(_f,_s,of _:);
  _f+firstpass;

  if firstpass then do _v=1 to 3;
    history{_v,_f}=var{_v};
  end;

  if secondpass;
  _s+secondpass;
  call missing(history{1,_s},history{2,_s},history{3,_s});
  if var1=. then var1=coalesce(of _histv1:);
  if var2=. then var2=coalesce(of _histv2:);
  if var3=. then var3=coalesce(of _histv3:);
run;

The program passes through eac group twice: the first time to build a history (up to 20 obs in this case) of values for VAR1, VAR2, and VAR3.  The second pass will tap into that history, replacing a missing value with the earliest following non-missing (that's what the coalesce functiom does).  Also during the second pass, current data in the history array is eliminated, so only future values are exposed to the coalesce functions.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 802 views
  • 1 like
  • 5 in conversation