Hello,
I want to get each id with non-missing y and z. but if I have highest x with non-missing y or z, then I will keep them.
My approach is if I can replace missing x and y to the previous no-missing value, then at the end, I can just keep last.id. However, I'm having difficulty to replace these missing value. I try lag, but doesn't work well.
I appreciate any help.
data temp;
input id x y z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
;
run;
The result I expect is:
id x y z;
1 4 5 7
2 7 9 8
3 7 4 6
4 3 4 2
data HAVE;
input ID X Y Z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
run;
data WANT;
merge HAVE(keep=ID X Y where=(Y ne .))
HAVE(keep=ID X Z where=(Z ne .));
by ID ;
if last.ID;
putlog ID X Y Z;
run;
1 4 5 7
2 7 9 8
3 7 4 6
4 3 4 2
proc sql;
select id, max(x) as x,max(y) as y,max(z) as z from temp group by id;
quit;
sIchen -- Thanks for your reply. I'm not look for the max value on each column. I am looking for the non-missing value for y and z that has the latest x.
Dear Belle,
You can try this code:
data have;
set temp;
if y ne . and z ne . then output;
run; *build a new data set with non-missing of y an z.*
proc sql;
select id max(x) as x, y, z *select the highest x from the data set*
from have
group by id;
quit;
proc print;
run;
HarryLiu,
I appreciate your help, I am not trying to get no missing fields. I want to replace these missing fields with previous value, so I can get the last.id information. But your information may help my future cases.
Hi Belle,
To replace missing value, you can use retain statement.
The sample code is as following:
data want;
set temp;
retain new_y;
if missing (y) then new_y=x;
else new_y=y;
output;
run;
proc print;
run;
Best,
Harry Liu
data want(drop=i a:);
set temp;
by id;
retain a1-a3;
array t(*) a1-a3;
array v(*) x y z;
do i=1 to dim(v);
if v(i) ne . then t(i)=v(i);
v(i)=t(i);
end;
if last.id;
run;
Thanks stat@sas, your answer is pretty close, except for id2, x should be 7 instead of 9. if I delete missing y and z at the beginning, then it works.
data HAVE;
input ID X Y Z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
run;
data WANT;
merge HAVE(keep=ID X Y where=(Y ne .))
HAVE(keep=ID X Z where=(Z ne .));
by ID ;
if last.ID;
putlog ID X Y Z;
run;
1 4 5 7
2 7 9 8
3 7 4 6
4 3 4 2
Thanks Chris, your answer works well.
No worries. It s the first time I see SAS's merge behaviour with multiple keys actually be useful!
You can do this way:
data want;
set have;
retain Y_new;
by ID;
if ID=lag(ID) then do;
if Y=. then Y=y_new;
else y_new=y;
end;
if last.ID then output want;
run;
viveklanka, thanks for you help.
data HAVE;
input ID X Y Z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
run;
data want;
set have;
by id;
retain _x _y _z;
if first.id then call missing(_x,_y,_z);
if not missing(x) then _x=x;
if not missing(y) then _y=y;
if not missing(z) then _z=z;
if last.id;
drop x y z;
run;
Hi xia keshan,
Thanks for your helpful answer. your answer is same as stat@sas, problem on id2. But I can fix it by eliminate missing y and z first.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.