Hello,
I want to get each id with non-missing y and z. but if I have highest x with non-missing y or z, then I will keep them.
My approach is if I can replace missing x and y to the previous no-missing value, then at the end, I can just keep last.id. However, I'm having difficulty to replace these missing value. I try lag, but doesn't work well.
I appreciate any help.
data temp;
input id x y z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
;
run;
The result I expect is:
id x y z;
1 4 5 7
2 7 9 8
3 7 4 6
4 3 4 2
data HAVE;
input ID X Y Z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
run;
data WANT;
merge HAVE(keep=ID X Y where=(Y ne .))
HAVE(keep=ID X Z where=(Z ne .));
by ID ;
if last.ID;
putlog ID X Y Z;
run;
1 4 5 7
2 7 9 8
3 7 4 6
4 3 4 2
proc sql;
select id, max(x) as x,max(y) as y,max(z) as z from temp group by id;
quit;
sIchen -- Thanks for your reply. I'm not look for the max value on each column. I am looking for the non-missing value for y and z that has the latest x.
Dear Belle,
You can try this code:
data have;
set temp;
if y ne . and z ne . then output;
run; *build a new data set with non-missing of y an z.*
proc sql;
select id max(x) as x, y, z *select the highest x from the data set*
from have
group by id;
quit;
proc print;
run;
HarryLiu,
I appreciate your help, I am not trying to get no missing fields. I want to replace these missing fields with previous value, so I can get the last.id information. But your information may help my future cases.
Hi Belle,
To replace missing value, you can use retain statement.
The sample code is as following:
data want;
set temp;
retain new_y;
if missing (y) then new_y=x;
else new_y=y;
output;
run;
proc print;
run;
Best,
Harry Liu
data want(drop=i a:);
set temp;
by id;
retain a1-a3;
array t(*) a1-a3;
array v(*) x y z;
do i=1 to dim(v);
if v(i) ne . then t(i)=v(i);
v(i)=t(i);
end;
if last.id;
run;
Thanks stat@sas, your answer is pretty close, except for id2, x should be 7 instead of 9. if I delete missing y and z at the beginning, then it works.
data HAVE;
input ID X Y Z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
run;
data WANT;
merge HAVE(keep=ID X Y where=(Y ne .))
HAVE(keep=ID X Z where=(Z ne .));
by ID ;
if last.ID;
putlog ID X Y Z;
run;
1 4 5 7
2 7 9 8
3 7 4 6
4 3 4 2
Thanks Chris, your answer works well.
No worries. It s the first time I see SAS's merge behaviour with multiple keys actually be useful!
You can do this way:
data want;
set have;
retain Y_new;
by ID;
if ID=lag(ID) then do;
if Y=. then Y=y_new;
else y_new=y;
end;
if last.ID then output want;
run;
viveklanka, thanks for you help.
data HAVE;
input ID X Y Z;
cards;
1 2 3 2
1 4 5 .
1 4 . 4
1 4 . 7
2 6 4 2
2 6 8 .
2 7 . 8
2 7 9 .
2 9 . .
3 4 2 3
3 4 4 .
3 7 . 9
3 7 . 6
4 2 . 1
4 3 4 2
run;
data want;
set have;
by id;
retain _x _y _z;
if first.id then call missing(_x,_y,_z);
if not missing(x) then _x=x;
if not missing(y) then _y=y;
if not missing(z) then _z=z;
if last.id;
drop x y z;
run;
Hi xia keshan,
Thanks for your helpful answer. your answer is same as stat@sas, problem on id2. But I can fix it by eliminate missing y and z first.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.