I want to delete the entire row for all IDs if that ID is missing a certain value in another variable. See my example data below-- I want to delete the rows containing that ID only if they are missing location 'd' or 'e'. I think this can probably be handled easily with proc sql, but my knowledge of proc sql is pretty limited and my google results when I tried to search for answers were no help!
If your standard is that there must be both "d" and "e", then:
data want;
do until (last.id);
set have;
by id;
if location='d' then d=1;
else if location='e' then e=1;
end;
do until(last.id);
set have;
by id;
if d=e=1 then output;
end;
drop d e;
run;
This solution assumes that your data set is sorted by ID. If not, run PROC SORT first.
Simplest method I could think of:
data have; input id $ location $ @@; datalines; 1 a 1 c 1 d 1 e 2 a 2 b 2 c 2 d 2 e 3 a 3 b 3 c 3 d 4 a 4 b 4 c 4 e ; run; proc sql; create table want as select * from have where id in (select id from have where location in ("d","e")); quit;
Hello,
data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;
proc sql;
CREATE TABLE want AS
SELECT *
FROM have
GROUP BY id
HAVING sum(location='d')>0 AND sum(location='e')>0
ORDER BY id, location
;
quit;
data want2;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e
;
run;
proc compare data=want compare=want2;
run;
If your standard is that there must be both "d" and "e", then:
data want;
do until (last.id);
set have;
by id;
if location='d' then d=1;
else if location='e' then e=1;
end;
do until(last.id);
set have;
by id;
if d=e=1 then output;
end;
drop d e;
run;
This solution assumes that your data set is sorted by ID. If not, run PROC SORT first.
data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;
proc sql;
create table want as
select *
from have
group by id
having sum(location in ('d','e'))>1;
quit;
another datastep:
data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e
;
run;
data _null_;
if _n_=1 then do;
dcl hash H (dataset:'have',multidata:'y',ordered:'y') ;
h.definekey ("id") ;
h.definedata ("id","location") ;
h.definedone () ;
end;
do until(last.id);
set have end=lr;
by id;
if location in ('d','e') then _iorc_+1;
if last.id ;
if _iorc_<2 then rc=h.remove();
_iorc_=0;
end;
if lr then h.output(dataset:'want');
run;
What if location d or e appears more than once for a given id? I think the condition
having max(location='d') and max(location='e')
would be better.
fully agree Sir and always. Jealous of your extreme intelligence
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.