I want to delete the entire row for all IDs if that ID is missing a certain value in another variable. See my example data below-- I want to delete the rows containing that ID only if they are missing location 'd' or 'e'. I think this can probably be handled easily with proc sql, but my knowledge of proc sql is pretty limited and my google results when I tried to search for answers were no help!
If your standard is that there must be both "d" and "e", then:
data want;
do until (last.id);
set have;
by id;
if location='d' then d=1;
else if location='e' then e=1;
end;
do until(last.id);
set have;
by id;
if d=e=1 then output;
end;
drop d e;
run;
This solution assumes that your data set is sorted by ID. If not, run PROC SORT first.
Simplest method I could think of:
data have; input id $ location $ @@; datalines; 1 a 1 c 1 d 1 e 2 a 2 b 2 c 2 d 2 e 3 a 3 b 3 c 3 d 4 a 4 b 4 c 4 e ; run; proc sql; create table want as select * from have where id in (select id from have where location in ("d","e")); quit;
Hello,
data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;
proc sql;
CREATE TABLE want AS
SELECT *
FROM have
GROUP BY id
HAVING sum(location='d')>0 AND sum(location='e')>0
ORDER BY id, location
;
quit;
data want2;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e
;
run;
proc compare data=want compare=want2;
run;
If your standard is that there must be both "d" and "e", then:
data want;
do until (last.id);
set have;
by id;
if location='d' then d=1;
else if location='e' then e=1;
end;
do until(last.id);
set have;
by id;
if d=e=1 then output;
end;
drop d e;
run;
This solution assumes that your data set is sorted by ID. If not, run PROC SORT first.
data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;
proc sql;
create table want as
select *
from have
group by id
having sum(location in ('d','e'))>1;
quit;
another datastep:
data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e
;
run;
data _null_;
if _n_=1 then do;
dcl hash H (dataset:'have',multidata:'y',ordered:'y') ;
h.definekey ("id") ;
h.definedata ("id","location") ;
h.definedone () ;
end;
do until(last.id);
set have end=lr;
by id;
if location in ('d','e') then _iorc_+1;
if last.id ;
if _iorc_<2 then rc=h.remove();
_iorc_=0;
end;
if lr then h.output(dataset:'want');
run;
What if location d or e appears more than once for a given id? I think the condition
having max(location='d') and max(location='e')
would be better.
fully agree Sir and always. Jealous of your extreme intelligence
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.