Solved: Re: Delete ids based on another condition

gadnuk · Posted 09-13-2018 09:34 AM

I want to delete the entire row for all IDs if that ID is missing a certain value in another variable. See my example data below-- I want to delete the rows containing that ID only if they are missing location 'd' or 'e'. I think this can probably be handled easily with proc sql, but my knowledge of proc sql is pretty limited and my google results when I tried to search for answers were no help!

data have;

input id $ location $ @@;

datalines;

1 a 1 c 1 d 1 e 2 a 2 b

2 c 2 d 2 e 3 a 3 b 3 c 3 d

4 a 4 b 4 c 4 e

;

run;

data want;

input id $ location $ @@;

datalines;

1 a 1 c 1 d 1 e 2 a 2 b

2 c 2 d 2 e

;

run;

Thanks in advance!

Astounding · Posted 09-13-2018 10:02 AM

If your standard is that there must be both "d" and "e", then:

data want;

do until (last.id);

set have;

by id;

if location='d' then d=1;

else if location='e' then e=1;

end;

do until(last.id);

set have;

by id;

if d=e=1 then output;

end;

drop d e;

run;

This solution assumes that your data set is sorted by ID. If not, run PROC SORT first.

View solution in original post

RW9 · Posted 09-13-2018 09:59 AM

Simplest method I could think of:

data have;
  input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;

proc sql;
  create table want as
  select * 
  from   have
  where  id in (select id from have where location in ("d","e"));
quit;

gamotte · Posted 09-13-2018 10:02 AM

Hello,

data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;

proc sql;
	CREATE TABLE want AS
	SELECT *
	FROM have
	GROUP BY id
	HAVING sum(location='d')>0 AND sum(location='e')>0
	ORDER BY id, location
;
quit;

data want2;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 
;
run;

proc compare data=want compare=want2;
run;

Astounding · Posted 09-13-2018 10:02 AM

If your standard is that there must be both "d" and "e", then:

data want;

do until (last.id);

set have;

by id;

if location='d' then d=1;

else if location='e' then e=1;

end;

do until(last.id);

set have;

by id;

if d=e=1 then output;

end;

drop d e;

run;

This solution assumes that your data set is sorted by ID. If not, run PROC SORT first.

novinosrin · Posted 09-13-2018 10:27 AM

data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 3 a 3 b 3 c 3 d
4 a 4 b 4 c 4 e
;
run;

proc sql;
create table want as
select *
from have
group by id
having sum(location in ('d','e'))>1;
quit;

novinosrin · Posted 09-13-2018 01:07 PM

another datastep:

data have;
input id $ location $ @@;
datalines;
1 a 1 c 1 d 1 e 2 a 2 b
2 c 2 d 2 e 
;
run;
data _null_;
if _n_=1 then do;
  dcl hash H (dataset:'have',multidata:'y',ordered:'y') ;
   h.definekey  ("id") ;
   h.definedata ("id","location") ;
   h.definedone () ;
end;
do until(last.id);
set have end=lr;
by id;
if  location in ('d','e') then _iorc_+1;
if last.id ;
if _iorc_<2 then  rc=h.remove();
_iorc_=0;
end;
if lr then h.output(dataset:'want');
run;

PGStats · Posted 09-13-2018 11:46 PM

What if location d or e appears more than once for a given id? I think the condition

having max(location='d') and max(location='e')

would be better.

PG

novinosrin · Posted 09-13-2018 11:57 PM

fully agree Sir and always. Jealous of your extreme intelligence

Registration is open

SAS Training: Just a Click Away