I have some data with an ID (grouping variable) and I want to keep observations in a group if any number from column "current" shows up in column "from". Or vice versa.
DATA have;
input ID current from;
datalines;
1 100 .
1 200 .
1 300 100
1 400 100
1 200 .
1 500 300
;
So in this case I would drop the 2nd and 5th observation as it's only contained within one column.
Hi @MB_Analyst,
Try this:
data want;
if _n_=1 then do;
dcl hash h1();
h1.definekey('current');
h1.definedone();
dcl hash h2();
h2.definekey('from');
h2.definedone();
end;
do until(last.id);
set have;
by id;
if current>. then h1.ref();
if from>. then h2.ref();
end;
do until(last.id);
set have;
by id;
if h1.check(key:from)=0 | h2.check(key:current)=0 then output;
end;
h1.clear();
h2.clear();
run;
Do you just have 2 columns to check or many. if it is just two columns, very straight forward. Can you confirm plz
And what about 400,500 obs 4 and 6. Those are also contained in only current column???
I only have two columns to check. The 4th and 6th column would stay in because:
OBS 4 - 100 is within the first column
OBS 6 - 300 is within the first column
I mean the 400, 500
Yes, when the "current" column is 400 or 500, the "from" column is 100 and 300. Because both 100 and 300 appear in the first column, both observations are retained.
Hi @MB_Analyst,
Try this:
data want;
if _n_=1 then do;
dcl hash h1();
h1.definekey('current');
h1.definedone();
dcl hash h2();
h2.definekey('from');
h2.definedone();
end;
do until(last.id);
set have;
by id;
if current>. then h1.ref();
if from>. then h2.ref();
end;
do until(last.id);
set have;
by id;
if h1.check(key:from)=0 | h2.check(key:current)=0 then output;
end;
h1.clear();
h2.clear();
run;
OK. assume I understood your question. And there was only one ID/group
DATA x;
input ID current from;
datalines;
1 100 .
1 200 .
1 300 100
1 400 100
1 200 .
1 500 300
;
data have;
set x;
drop id;
rename current=to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table key as
select * from want
group by household
having count(*)=1;
create table final_want as
select * from x
where current not in (select node from key);
quit;
I have more than one ID group, but this does work for the one ID. Thank you.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.