BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MB_Analyst
Obsidian | Level 7

 

I have some data with an ID (grouping variable) and I want to keep observations in a group if any number from column "current" shows up in column "from". Or vice versa. 

 

DATA have;
    input ID current from;
    datalines;
1     100    .
1     200    .
1     300   100
1     400   100
1     200    .
1     500    300
;

So in this case I would drop the 2nd and 5th observation as it's only contained within one column. 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @MB_Analyst,

 

Try this:

 

data want;
if _n_=1 then do;
  dcl hash h1();
  h1.definekey('current');
  h1.definedone();
  dcl hash h2();
  h2.definekey('from');
  h2.definedone();
end;
do until(last.id);
  set have;
  by id;
  if current>. then h1.ref();
  if from>. then h2.ref();
end;
do until(last.id);
  set have;
  by id;
  if h1.check(key:from)=0 | h2.check(key:current)=0 then output;
end;
h1.clear();
h2.clear();
run;
  • In the first DO-END block, which is executed only in the first iteration of the DATA step, two temporary look-up tables (hash objects) H1 and H2 are declared. They will store the distinct non-missing CURRENT and FROM values, respectively, for a single ID (one ID per iteration of the DATA step).
  • The second DO-END block populates H1 and H2 with the values described above. It is assumed that HAVE is sorted by ID.
  • The third DO-END block reads the same BY group again and applies the selection criterion. The CHECK method returns 0 if the key value is found (i.e. the FROM value in the table of CURRENT values or vice versa). If at least one of the two look-ups yields a match, the observation is written to dataset WANT. Note that observations with missing values for CURRENT and FROM are not written to the output dataset.
  • After each BY group the look-up tables are cleared.

 

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Do you just have 2 columns to check or many. if it is just two columns, very straight forward. Can you confirm plz

 

And what about 400,500 obs 4 and 6. Those are also contained in only current column???

MB_Analyst
Obsidian | Level 7

I only have two columns to check. The 4th and 6th column would stay in because:

 

OBS 4 - 100 is within the first column

OBS 6 - 300 is within the first column 

novinosrin
Tourmaline | Level 20

I mean the 400, 500

MB_Analyst
Obsidian | Level 7

Yes, when the "current" column is 400 or 500, the "from" column is 100 and 300. Because both 100 and 300 appear in the first column, both observations are retained. 

FreelanceReinh
Jade | Level 19

Hi @MB_Analyst,

 

Try this:

 

data want;
if _n_=1 then do;
  dcl hash h1();
  h1.definekey('current');
  h1.definedone();
  dcl hash h2();
  h2.definekey('from');
  h2.definedone();
end;
do until(last.id);
  set have;
  by id;
  if current>. then h1.ref();
  if from>. then h2.ref();
end;
do until(last.id);
  set have;
  by id;
  if h1.check(key:from)=0 | h2.check(key:current)=0 then output;
end;
h1.clear();
h2.clear();
run;
  • In the first DO-END block, which is executed only in the first iteration of the DATA step, two temporary look-up tables (hash objects) H1 and H2 are declared. They will store the distinct non-missing CURRENT and FROM values, respectively, for a single ID (one ID per iteration of the DATA step).
  • The second DO-END block populates H1 and H2 with the values described above. It is assumed that HAVE is sorted by ID.
  • The third DO-END block reads the same BY group again and applies the selection criterion. The CHECK method returns 0 if the key value is found (i.e. the FROM value in the table of CURRENT values or vice versa). If at least one of the two look-ups yields a match, the observation is written to dataset WANT. Note that observations with missing values for CURRENT and FROM are not written to the output dataset.
  • After each BY group the look-up tables are cleared.

 

Ksharp
Super User

OK.  assume I understood your question. And there was only one ID/group

 

DATA x;
    input ID current from;
    datalines;
1     100    .
1     200    .
1     300   100
1     400   100
1     200    .
1     500    300
;
data have;
 set x;
 	drop id;
	rename current=to;
run;


data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;


proc sql;
create table key as
 select * from want
  group by household
   having count(*)=1;

create table final_want as
 select * from x
  where  current not in (select node from key);
quit;

  

MB_Analyst
Obsidian | Level 7

I have more than one ID group, but this does work for the one ID. Thank you.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1663 views
  • 2 likes
  • 4 in conversation