DATA Step, Macro, Functions and more

Assigining same id by group with additional rule

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Assigining same id by group with additional rule

Hello. I have a trouble in assiging same id by group.

 

I have a dataset that looks like this.

 

obsvar1var2
1xa
2xb
3yc
4yd
5ye
6yb
7zf
8zg
9zh
10zi

 

I want to assign codes based on the var1. So obs 1, 2 should be assigned to the same id because they share "x" value on var1.

However, I also have another rule which is assigning same id if observations share same value of var2. For example, following this rule, obs 1-6 should be assigned same id because obs 6 has "b" on var2 which is same as that of obs2.

 

Could anybody help me find a code to solve the problem?

Thanks for reading the thread.


Accepted Solutions
Solution
‎03-12-2018 10:29 AM
Super User
Posts: 10,850

Re: Assigining same id by group with additional rule

If you do not have SAS/OR. Try my code .

 

data have ;
infile cards expandtabs truncover;
input obs	from $ to $;
drop obs;
cards;
1	a	x
2	a	y
3	b	x
4	b	 
5	b	z
6	c	u
7	c	v
8	c	w
9	d	 
10	d	m
;
run;




data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table key as
 select * from want
  where node in (select from from have);
create table final_want as
 select a.*,b.household from have as a left join key as b
  on a.from=key.node;
quit;

proc print;run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,399

Re: Assigining same id by group with additional rule

You have stated an example of a rule that says:

  1. obs 1 and 2 get ID=1 since they both have var1=X
  2. obs 6 gets ID=1 since it shares var2=b with obs 2

then does the rule also imply that obs 3,4, and 5 also get ID=1 since they share var1=Y with obs 6?

 

How recursive is your linking rule?  Is an ID group comprised of all obs that share either var1 or var2 with at least one other group member?

Contributor
Posts: 22

Re: Assigining same id by group with additional rule

The way you understood is absoulely true. Rule 1 and 2 imply that obs 1 through 6 should be assigned the same code.

Also, you can summarize that members of a group that has been assigned the same code share either same var1 or var2 wih at least one of the members.

Trusted Advisor
Posts: 1,399

Re: Assigining same id by group with additional rule

[ Edited ]

Do you have SAS/OR  /ETS?  PROC OPTNET, which is in sas/or, has a solution for this, using the CONCOMP (connected components) statement.

 

data have;
  input var1 :$1. var2 :$1. @@;
datalines;
x a  x b  y c  y d  y e  y b  z f  z g  z h  z i
run;

data need;
  set have;
  from=cats('_',var1);
  to=cats('-',var2);
run;

proc optnet data_links=need out_nodes=ON (index=(node));
  concomp;
run;

data want ;
  set have ;
  node=cats('_',var1);
  set on key=node / unique;
  drop node;
run;

 

Data set NEED is created to prefix '_' to all var1 values, and prefix '-' to all var2 values, so that otherwise identical "node" values in var1 and var2 are not seen as the same node.  The CONCOMP statement in proc optnet assign a connected-component id - take a look at the ON dataset.

 

But I suspect you need to work through this problem using the data step.  What have you tried so far?

Esteemed Advisor
Posts: 5,626

Re: Assigining same id by group with additional rule

Your problem amounts to finding all connected components in a network (or a graph). This problem is addressed by proc optnet. If you don't have a SAS/OR licence, you can use my subgraphs macro available here to find the components. Here's how to do it:

 

data have;
input obs var1 $ var2 $; 
datalines;
1 x a 
2 x b 
3 y c 
4 y d 
5 y e 
6 y b 
7 z f 
8 z g 
9 z h 
10 z i 
11 u v
;

proc sql;
create table arcs as
select 
    a.obs as from, 
    b.obs as to
from 
    have as a inner join 
    have as b on a.obs <= b.obs and (a.var1=b.var1 or a.var2=b.var2);
quit; 

/* get the subgraphs macro from 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-find-all-connected-components-in-a-graph/ta-p/231539

then include the macro definition */

%include "&sasforum.\subgraphsmacro.sas";

/* Call the macro */
%subgraphs(arcs, from=from, to=to, out=sets);

/* Rename and reorder output */
proc sql;
create table newIds as
select 
    clust as id,
    node as obs
from sets
order by id, obs;
select * from newIds;
quit;

 

 

PG
Contributor
Posts: 22

Re: Assigining same id by group with additional rule

I can't thank you enough.

After few adjustment, I could get the result that I wanted almost perfectly.

I have some missing values in var2 and I don't want those missing values to be regarded as the same value which will make them to be assiigned one signle id according to the rule nubmer 2. Instead, I want those cases to be an exception for the rule number 2.

For example, if I have a dataset like this.

 

obsvar1var2
1ax
2ay
3bx
4b 
5bz
6cu
7cv
8cw
9d 
10dm

I want a group id for obs 1-5 and another id for obs 6-8 and the other for obs 9-10.

Thank you again for the great reply you gave me before.

I tried to understand the code you gave me and adjust it as I wanted but I failed to do so.

It will be wonderful if you give me an extra help.

Thank you.

Super User
Posts: 10,850

Re: Assigining same id by group with additional rule

If you have SAS/OR . Base on @mkeintz

 

data have missing;
infile cards expandtabs truncover;
input obs	var1 $ var2 $;
if not missing(var2) then output have;
 else output missing;
drop obs;
cards;
1	a	x
2	a	y
3	b	x
4	b	 
5	b	z
6	c	u
7	c	v
8	c	w
9	d	 
10	d	m
;
run;

data need;
  set have;
  from=cats('_',var1);
  to=cats('-',var2);
run;

proc optnet data_links=need out_nodes=ON (index=(node));
  concomp;
run;

data temp ;
  set have ;
  node=cats('_',var1);
  set on key=node / unique;
  drop node;
run;
data temp1;
 set temp missing;
 by var1;
run;
proc sort data=temp1 out=temp2;
 by var1 descending concomp;
run;
data want;
 set temp2;
 by var1;
 retain new;
 if first.var1 then call missing(new);
 if not missing(concomp) then new=concomp;
 drop concomp;
run;
 
proc print;run;
Solution
‎03-12-2018 10:29 AM
Super User
Posts: 10,850

Re: Assigining same id by group with additional rule

If you do not have SAS/OR. Try my code .

 

data have ;
infile cards expandtabs truncover;
input obs	from $ to $;
drop obs;
cards;
1	a	x
2	a	y
3	b	x
4	b	 
5	b	z
6	c	u
7	c	v
8	c	w
9	d	 
10	d	m
;
run;




data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table key as
 select * from want
  where node in (select from from have);
create table final_want as
 select a.*,b.household from have as a left join key as b
  on a.from=key.node;
quit;

proc print;run;
Contributor
Posts: 22

Re: Assigining same id by group with additional rule

It worked surprisingly well.

I am extremely grateful for your help.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 208 views
  • 3 likes
  • 4 in conversation