Hi,
I have a dataset that each ID has multiple records. Within each group (ID), the variable A and B have only two values, missing and some other value.
data have;
input ID A B;
cards;
1 10 .
1 . 16
1 . 16
1 10 .
2 5 .
2 5 .
2 5 .
2 . 7
What I want to do is to replace the missing value with the non-missing value of the same variable within each group:
1 10 16
1 10 16
1 10 16
1 10 16
2 5 7
2 5 7
2 5 7
2 5 7
I really appreciate any help with this question.
Best regards,
C
Use a double DO loop:
data want;
do until (last.id);
set have;
by id;
_a = max(_a,a);
_b = max(_b,b);
end;
do until (last.id);
set have;
by id;
a = _a;
b = _b;
output;
end;
drop _:;
run;
Untested, posted from my tablet.
PS
PLEASE (and that's a very big PLEASE) use the "little running man" button to post code. Really. I mean it.
I positively hate to have to clean out all the iffy things (empty lines etc) that the main posting window does to code.
BTW, I managed to run the code (after cleaning up your data step, see above) on SAS on Demand, so it's tested now.
If it really is the case that there are non-missing values which are always the same within an ID, or missing values (never two different non-missing values within an ID), then in my opinion, the easiest solution is to use PROC STDIZE with the REPONLY option.
proc stdize data=have reponly method=mean out=want;
by id;
var a b;
run;
data have;
input ID A B;
cards;
1 10 .
1 . 16
1 . 16
1 10 .
2 5 .
2 5 .
2 5 .
2 . 7
;
data want(drop=aa bb);
do _N_ = 1 by 1 until (last.ID);
set have;
by ID;
aa = aa <> a;
bb = bb <> b;
end;
do _N_ = 1 to _N_;
set have;
if a = . then a = aa;
if b = . then b = bb;
output;
end;
run;
Result:
ID A B 1 10 16 1 10 16 1 10 16 1 10 16 2 5 7 2 5 7 2 5 7 2 5 7
Hi @CynthiaWei While it's a privilege to learn from the real "Proc Star- @PaigeMiller " who would make SAS look innocent with his mastery of Procs, with yet another demonstration of a Proc that does with utmost elegance, here is another alternative of Datastep that utilizes the UPDATE construct to fill the missing values.
data have;
input ID A B;
cards;
1 10 .
1 . 16
1 . 16
1 10 .
2 5 .
2 5 .
2 5 .
2 . 7
;
data want;
do _n_=1 by 1 until(last.id);
update have(obs=0) have;
by id;
end;
do _n_=1 to _n_;
output;
end;
run;
The above technique might be a bit of learning curve when you intend to do further processing i.e. should the need is to go beyond after fill in a situation that would involve look up/matrix programming(hashes/multi dim arrays) among others for further processing.
Alternatively, the MAX operator/function is indeed the traditional way as it works for both SQL/Datastep. For one, the syntax being convenient and many users are quite familiar with SQL as is often known as "readymeals" solution,
proc sql;
create table want as
select a.id,A,B
from have(keep=id) a left join
(select id, max(a) as A, max(b) as B from have group by id) b
on a.id=b.id;
quit;
If you are familiar with automatic remerge functionality of Proc SQL, you might be tempted to attempt that auto fill, albeit the SQL processor wouldn't see the inclusion of a non summary list variable to force or trigger that. Therefore, you would need an explicit self JOIN.
Fun stuff! Have a great day and Kind regards 🙂
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.