data ds_missing_values;
infile datalines dlm=',';
input group $ var1 var2 var3;
datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8
;
run;
How to replace missing values with next values for repective variables missings output like below
A | 1 | 2 | 3 |
A | 3 | 2 | 5 |
A | 8 | 4 | 6 |
A | 3 | 2 | 6 |
B | 3 | 2 | 6 |
B | 3 | 2 | 8 |
C | 9 | 1 | 8 |
Why does group change to "A" in every observation?
@BrahmanandaRao wrote:
data ds_missing_values; infile datalines dlm=','; input group $ var1 var2 var3; datalines; A, 1, 2, 3 A, ., 2, 5 A, 3, 4, . A, 8, ., . B, ., ., . B, ., 2, 6 B, 3, 1, 8 C, 9, 4, 8 ; run;
How to replace missing values with next values for repective variables missings output like below
A 1 2 3 A 3 2 5 A 8 4 6 A 3 2 6 A 3 2 6 A 3 2 8 A 9 1 8
When you have a variable like your Group variable it seems unlikely that you should acquire values from a different group to replace missing.
Can you describe the purpose of replacing these values? How will you use the data? That may give us some other ideas.
@BrahmanandaRao wrote:
data ds_missing_values; infile datalines dlm=','; input group $ var1 var2 var3; datalines; A, 1, 2, 3 A, ., 2, 5 A, 3, 4, . A, 8, ., . B, ., ., . B, ., 2, 6 B, 3, 1, 8 C, 9, 4, 8 ; run;
How to replace missing values with next values for repective variables missings output like below
A 1 2 3 A 3 2 5 A 8 4 6 A 3 2 6 B 3 2 6 B 3 2 8 C 9 1 8
In group A, your sample input shows the VAR1 sequence to be 1, ., 3,. 8.
Why does your sample output show 1,3,8,3? Or are you prepared to take the next VAR1 from group B? And it looks like you are MOVING a future non-missing value back, not just copying it. Is that right
And why does the sample input have 3 group B observations, but the sample output have only 2?
And additionally to questions already asked: What should happen in if the last row in a group has a missing value? Leave it missing or something else?
In the absence of a reply, here's some code that solves what I presume is the typical "bringing back the future" problem. For each record, a missing value is replaced by the earliest following non-missing value, within the same group. This means of course that any group that ends with missing values will not have those values replaced:
data ds_missing_values;
infile datalines dlm=',';
input group $ var1 var2 var3;
datalines;
A, 1, 2, 3
A, ., 2, 5
A, 3, 4, .
A, 8, ., .
B, ., ., .
B, ., 2, 6
B, 3, 1, 8
C, 9, 4, 8
run;
data want (drop=_:);
set ds_missing_values (in=firstpass)
ds_missing_values (in=secondpass);
by group;
array var {3} var1-var3;
array history {3,20} _histv1_1-_histv1_20
_histv2_1-_histv2_20
_histv3_1-_histv3_20 ;
retain _: ;
if first.group then call missing(_f,_s,of _:);
_f+firstpass;
if firstpass then do _v=1 to 3;
history{_v,_f}=var{_v};
end;
if secondpass;
_s+secondpass;
call missing(history{1,_s},history{2,_s},history{3,_s});
if var1=. then var1=coalesce(of _histv1:);
if var2=. then var2=coalesce(of _histv2:);
if var3=. then var3=coalesce(of _histv3:);
run;
The program passes through eac group twice: the first time to build a history (up to 20 obs in this case) of values for VAR1, VAR2, and VAR3. The second pass will tap into that history, replacing a missing value with the earliest following non-missing (that's what the coalesce functiom does). Also during the second pass, current data in the history array is eliminated, so only future values are exposed to the coalesce functions.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.