Re: how to replace missings values with nextvalues

BrahmanandaRao · Posted 08-26-2023 02:05 AM

data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8

;
run;

How to replace missing values with next values for repective variables missings output like below

A	1	2	3
A	3	2	5
A	8	4	6
A	3	2	6
B	3	2	6
B	3	2	8
C	9	1	8

Kurt_Bremser · Posted 08-26-2023 02:36 AM

Why does group change to "A" in every observation?

@BrahmanandaRao wrote:
data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8

;
run;
How to replace missing values with next values for repective variables missings output like below

A 1 2 3

A 3 2 5

A 8 4 6

A 3 2 6

A 3 2 6

A 3 2 8

A 9 1 8

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 08-26-2023 02:51 AM

When you have a variable like your Group variable it seems unlikely that you should acquire values from a different group to replace missing.

Can you describe the purpose of replacing these values? How will you use the data? That may give us some other ideas.

@BrahmanandaRao wrote:
data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8

;
run;
How to replace missing values with next values for repective variables missings output like below

A 1 2 3

A 3 2 5

A 8 4 6

A 3 2 6

B 3 2 6

B 3 2 8

C 9 1 8

mkeintz · Posted 08-26-2023 08:09 PM

In group A, your sample input shows the VAR1 sequence to be 1, ., 3,. 8.

Why does your sample output show 1,3,8,3? Or are you prepared to take the next VAR1 from group B? And it looks like you are MOVING a future non-missing value back, not just copying it. Is that right

And why does the sample input have 3 group B observations, but the sample output have only 2?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Patrick · Posted 08-26-2023 08:54 PM

And additionally to questions already asked: What should happen in if the last row in a group has a missing value? Leave it missing or something else?

mkeintz · Posted 08-27-2023 04:27 PM

In the absence of a reply, here's some code that solves what I presume is the typical "bringing back the future" problem. For each record, a missing value is replaced by the earliest following non-missing value, within the same group. This means of course that any group that ends with missing values will not have those values replaced:

data ds_missing_values;
	infile datalines dlm=',';
	input group $ var1 var2 var3;
	datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8
run;

data want (drop=_:);
  set ds_missing_values (in=firstpass)
      ds_missing_values (in=secondpass);
  by group;

  array var {3} var1-var3;

  array history {3,20}  _histv1_1-_histv1_20  
                        _histv2_1-_histv2_20  
                        _histv3_1-_histv3_20 ;
  retain _: ;

  if first.group then call missing(_f,_s,of _:);
  _f+firstpass;

  if firstpass then do _v=1 to 3;
    history{_v,_f}=var{_v};
  end;

  if secondpass;
  _s+secondpass;
  call missing(history{1,_s},history{2,_s},history{3,_s});
  if var1=. then var1=coalesce(of _histv1:);
  if var2=. then var2=coalesce(of _histv2:);
  if var3=. then var3=coalesce(of _histv3:);
run;

The program passes through eac group twice: the first time to build a history (up to 20 obs in this case) of values for VAR1, VAR2, and VAR3. The second pass will tap into that history, replacing a missing value with the earliest following non-missing (that's what the coalesce functiom does). Also during the second pass, current data in the history array is eliminated, so only future values are exposed to the coalesce functions.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

how to replace missings values with nextvalues