BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sejin
Obsidian | Level 7

Hello. I have a trouble in assiging same id by group.

 

I have a dataset that looks like this.

 

obsvar1var2
1xa
2xb
3yc
4yd
5ye
6yb
7zf
8zg
9zh
10zi

 

I want to assign codes based on the var1. So obs 1, 2 should be assigned to the same id because they share "x" value on var1.

However, I also have another rule which is assigning same id if observations share same value of var2. For example, following this rule, obs 1-6 should be assigned same id because obs 6 has "b" on var2 which is same as that of obs2.

 

Could anybody help me find a code to solve the problem?

Thanks for reading the thread.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

If you do not have SAS/OR. Try my code .

 

data have ;
infile cards expandtabs truncover;
input obs	from $ to $;
drop obs;
cards;
1	a	x
2	a	y
3	b	x
4	b	 
5	b	z
6	c	u
7	c	v
8	c	w
9	d	 
10	d	m
;
run;




data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table key as
 select * from want
  where node in (select from from have);
create table final_want as
 select a.*,b.household from have as a left join key as b
  on a.from=key.node;
quit;

proc print;run;

View solution in original post

8 REPLIES 8
mkeintz
PROC Star

You have stated an example of a rule that says:

  1. obs 1 and 2 get ID=1 since they both have var1=X
  2. obs 6 gets ID=1 since it shares var2=b with obs 2

then does the rule also imply that obs 3,4, and 5 also get ID=1 since they share var1=Y with obs 6?

 

How recursive is your linking rule?  Is an ID group comprised of all obs that share either var1 or var2 with at least one other group member?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Sejin
Obsidian | Level 7

The way you understood is absoulely true. Rule 1 and 2 imply that obs 1 through 6 should be assigned the same code.

Also, you can summarize that members of a group that has been assigned the same code share either same var1 or var2 wih at least one of the members.

mkeintz
PROC Star

Do you have SAS/OR  /ETS?  PROC OPTNET, which is in sas/or, has a solution for this, using the CONCOMP (connected components) statement.

 

data have;
  input var1 :$1. var2 :$1. @@;
datalines;
x a  x b  y c  y d  y e  y b  z f  z g  z h  z i
run;

data need;
  set have;
  from=cats('_',var1);
  to=cats('-',var2);
run;

proc optnet data_links=need out_nodes=ON (index=(node));
  concomp;
run;

data want ;
  set have ;
  node=cats('_',var1);
  set on key=node / unique;
  drop node;
run;

 

Data set NEED is created to prefix '_' to all var1 values, and prefix '-' to all var2 values, so that otherwise identical "node" values in var1 and var2 are not seen as the same node.  The CONCOMP statement in proc optnet assign a connected-component id - take a look at the ON dataset.

 

But I suspect you need to work through this problem using the data step.  What have you tried so far?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

Your problem amounts to finding all connected components in a network (or a graph). This problem is addressed by proc optnet. If you don't have a SAS/OR licence, you can use my subgraphs macro available here to find the components. Here's how to do it:

 

data have;
input obs var1 $ var2 $; 
datalines;
1 x a 
2 x b 
3 y c 
4 y d 
5 y e 
6 y b 
7 z f 
8 z g 
9 z h 
10 z i 
11 u v
;

proc sql;
create table arcs as
select 
    a.obs as from, 
    b.obs as to
from 
    have as a inner join 
    have as b on a.obs <= b.obs and (a.var1=b.var1 or a.var2=b.var2);
quit; 

/* get the subgraphs macro from 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-find-all-connected-components-in-a-graph/ta-p/231539

then include the macro definition */

%include "&sasforum.\subgraphsmacro.sas";

/* Call the macro */
%subgraphs(arcs, from=from, to=to, out=sets);

/* Rename and reorder output */
proc sql;
create table newIds as
select 
    clust as id,
    node as obs
from sets
order by id, obs;
select * from newIds;
quit;

 

 

PG
Sejin
Obsidian | Level 7

I can't thank you enough.

After few adjustment, I could get the result that I wanted almost perfectly.

I have some missing values in var2 and I don't want those missing values to be regarded as the same value which will make them to be assiigned one signle id according to the rule nubmer 2. Instead, I want those cases to be an exception for the rule number 2.

For example, if I have a dataset like this.

 

obsvar1var2
1ax
2ay
3bx
4b 
5bz
6cu
7cv
8cw
9d 
10dm

I want a group id for obs 1-5 and another id for obs 6-8 and the other for obs 9-10.

Thank you again for the great reply you gave me before.

I tried to understand the code you gave me and adjust it as I wanted but I failed to do so.

It will be wonderful if you give me an extra help.

Thank you.

Ksharp
Super User

If you have SAS/OR . Base on @mkeintz

 

data have missing;
infile cards expandtabs truncover;
input obs	var1 $ var2 $;
if not missing(var2) then output have;
 else output missing;
drop obs;
cards;
1	a	x
2	a	y
3	b	x
4	b	 
5	b	z
6	c	u
7	c	v
8	c	w
9	d	 
10	d	m
;
run;

data need;
  set have;
  from=cats('_',var1);
  to=cats('-',var2);
run;

proc optnet data_links=need out_nodes=ON (index=(node));
  concomp;
run;

data temp ;
  set have ;
  node=cats('_',var1);
  set on key=node / unique;
  drop node;
run;
data temp1;
 set temp missing;
 by var1;
run;
proc sort data=temp1 out=temp2;
 by var1 descending concomp;
run;
data want;
 set temp2;
 by var1;
 retain new;
 if first.var1 then call missing(new);
 if not missing(concomp) then new=concomp;
 drop concomp;
run;
 
proc print;run;
Ksharp
Super User

If you do not have SAS/OR. Try my code .

 

data have ;
infile cards expandtabs truncover;
input obs	from $ to $;
drop obs;
cards;
1	a	x
2	a	y
3	b	x
4	b	 
5	b	z
6	c	u
7	c	v
8	c	w
9	d	 
10	d	m
;
run;




data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table key as
 select * from want
  where node in (select from from have);
create table final_want as
 select a.*,b.household from have as a left join key as b
  on a.from=key.node;
quit;

proc print;run;
Sejin
Obsidian | Level 7

It worked surprisingly well.

I am extremely grateful for your help.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1018 views
  • 3 likes
  • 4 in conversation