- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have some data with an ID (grouping variable) and I want to keep observations in a group if any number from column "current" shows up in column "from". Or vice versa.
DATA have;
input ID current from;
datalines;
1 100 .
1 200 .
1 300 100
1 400 100
1 200 .
1 500 300
;
So in this case I would drop the 2nd and 5th observation as it's only contained within one column.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @MB_Analyst,
Try this:
data want;
if _n_=1 then do;
dcl hash h1();
h1.definekey('current');
h1.definedone();
dcl hash h2();
h2.definekey('from');
h2.definedone();
end;
do until(last.id);
set have;
by id;
if current>. then h1.ref();
if from>. then h2.ref();
end;
do until(last.id);
set have;
by id;
if h1.check(key:from)=0 | h2.check(key:current)=0 then output;
end;
h1.clear();
h2.clear();
run;
- In the first DO-END block, which is executed only in the first iteration of the DATA step, two temporary look-up tables (hash objects) H1 and H2 are declared. They will store the distinct non-missing CURRENT and FROM values, respectively, for a single ID (one ID per iteration of the DATA step).
- The second DO-END block populates H1 and H2 with the values described above. It is assumed that HAVE is sorted by ID.
- The third DO-END block reads the same BY group again and applies the selection criterion. The CHECK method returns 0 if the key value is found (i.e. the FROM value in the table of CURRENT values or vice versa). If at least one of the two look-ups yields a match, the observation is written to dataset WANT. Note that observations with missing values for CURRENT and FROM are not written to the output dataset.
- After each BY group the look-up tables are cleared.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you just have 2 columns to check or many. if it is just two columns, very straight forward. Can you confirm plz
And what about 400,500 obs 4 and 6. Those are also contained in only current column???
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I only have two columns to check. The 4th and 6th column would stay in because:
OBS 4 - 100 is within the first column
OBS 6 - 300 is within the first column
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I mean the 400, 500
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, when the "current" column is 400 or 500, the "from" column is 100 and 300. Because both 100 and 300 appear in the first column, both observations are retained.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @MB_Analyst,
Try this:
data want;
if _n_=1 then do;
dcl hash h1();
h1.definekey('current');
h1.definedone();
dcl hash h2();
h2.definekey('from');
h2.definedone();
end;
do until(last.id);
set have;
by id;
if current>. then h1.ref();
if from>. then h2.ref();
end;
do until(last.id);
set have;
by id;
if h1.check(key:from)=0 | h2.check(key:current)=0 then output;
end;
h1.clear();
h2.clear();
run;
- In the first DO-END block, which is executed only in the first iteration of the DATA step, two temporary look-up tables (hash objects) H1 and H2 are declared. They will store the distinct non-missing CURRENT and FROM values, respectively, for a single ID (one ID per iteration of the DATA step).
- The second DO-END block populates H1 and H2 with the values described above. It is assumed that HAVE is sorted by ID.
- The third DO-END block reads the same BY group again and applies the selection criterion. The CHECK method returns 0 if the key value is found (i.e. the FROM value in the table of CURRENT values or vice versa). If at least one of the two look-ups yields a match, the observation is written to dataset WANT. Note that observations with missing values for CURRENT and FROM are not written to the output dataset.
- After each BY group the look-up tables are cleared.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK. assume I understood your question. And there was only one ID/group
DATA x;
input ID current from;
datalines;
1 100 .
1 200 .
1 300 100
1 400 100
1 200 .
1 500 300
;
data have;
set x;
drop id;
rename current=to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table key as
select * from want
group by household
having count(*)=1;
create table final_want as
select * from x
where current not in (select node from key);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have more than one ID group, but this does work for the one ID. Thank you.