BookmarkSubscribeRSS Feed
jhzsquared
Calcite | Level 5

Hi all,

 

I'm at a loss on how to merge/combine rows effectively when there are multiple overlapping connections. 

Essentially explained verbally:  what I am trying to do is to take A and B and rename them by the first of the A's in the group. The problem is, A1= B1 and B2 but B2= A1 and A2. Thus, the result should just say A1= the names of B1 B2 and A2. These connections occur over multiple rows. 

 

In an example: 

I have: 

ID1_AID2_BName
11alpha
12bravo
13charlie
22bravo
23delta
34echo

 

I want:  

ID_combinedName_combined
1alpha bravo charlie bravo delta
3echo

 

thanks! I'm on SAS university edition

9 REPLIES 9
Ksharp
Super User

I don't know why A3 wouldn't be include ,since

1	3	charlie
3	4	echo


1->3->4
jhzsquared
Calcite | Level 5

A3 is not included because the two IDs are seperate from each other. If A3 had equaled B1, B2, or B3 then it would have been included in the grouping. However, since B4 is a new ID, it is not included. 

 

Another example would be: 

ID1_AID2_BName
redapplealpha
redstrawberrybravo
greenapplecharlie
greenstrawberrybravo
greenbananadelta
purpleeggplantecho

 

becomes: 

ID_combinedName_combined
redalpha bravo charlie bravo delta
purpleecho
art297
Opal | Level 21

There may be a more direct way, but the following appears to meet your specifications:

 

data have;
  infile cards dlm='09'x;
  input ID1_A	ID2_B	Name $;
  cards;
1	1	alpha
1	2	bravo
1	3	charlie
2	2	bravo
2	3	delta
3	4	echo
;

data need1;
  set have;
  recnum=_n_;
run;

proc sql;
  create table need2 as
    select recnum,count(*) as count
      from need1
        group by id2_b
          order by recnum
  ;
quit;

data need2 (rename=(id1_a=ID_combined));
  merge need1 need2;
  by recnum;
  retain hold_id1_a;
  if _n_ eq 1 then hold_id1_a=id1_a;
  else if count gt 1 then id1_a=hold_id1_a;
run;

proc transpose data=need2 out=want (drop=_:);
  by ID_combined;
  var Name;
run;

data want (drop=col:);
  set want;
  name_combined=catx(' ',of col:);
run;

Art, CEO, AnalystFinder.com

jhzsquared
Calcite | Level 5

That does work! Thank you!

Do you know how you could make it into kind of a decoder key, so that you would know that IDA equals the following IDBs and IDAs that were clustered into that group? For instance with your data input it would become: 

Grouped ID AB
111
122
1 3
   
art297
Opal | Level 21

I think you have to more clearly describe what you want. Might just need to create a format. However, I'd pose the question as a new thread to increase the number of people responding.

 

Art, CEO, AnalystFinder.com

mkeintz
PROC Star

This is a good task to take advantage of the default attributes of hash object (hash table).

 

The data step below declares a hash table with 2 variables (ID1_A and ID2_B), keyed on ID1_A.  Because the default mode of hash tables is to keep only one item (i.e. one row) per key (one per ID2_B in your case), and that one item defaults to the first one encountered, you can just output the hash object to a sas dataset to get what you want, namely each ID2_B and the earliest associated ID1A.

 

data _null_;

  declare hash h (dataset:'have (keep=id1_a id2_b)');

    h.definekey('id2_b');

    h.definedata(all:'Y');

  rc=h.output(dataset:'decoder');

run;

 

proc sort data=decoder;

  by id1_a id2_b;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

I think you want

 

  1. To form a group, identified by ID1_A, of all the ID2_B's that are linked to that ID1_A, but only if the ID2_B has not been already linked with an earlier ID1_A.
  2. For that group of ID2_B's, get a list of all unique names to assoicate with the group identifier (ID1_A):

 

data have;
   input ID1_A ID2_B Name :$8.;
datalines;
1 1 alpha 
1 2 bravo 
1 3 charlie 
2 2 bravo 
2 3 delta 
3 4 echo 
run;

data need (keep=id1_a name);
  set have;

  if _n_=1 then do;
    declare hash names();  /* Running table of names encountered*/
      names.definekey('name');
      names.definedone();

    declare hash b_to_a(); /* Map from ID2_B to first linked ID1_A*/
      b_to_a.definekey('id2_b');
      b_to_a.definedata('id1_a');
      b_to_a.definedone();
  end;

  rc=names.find();
  if rc=0 then delete;

  /* Only new names get to this part*/
  rc=names.add();                /* Add it to the table of names */
  rc=b_to_a.find();              /* Find previously associated ID1_a if any*/
  if rc^=0 then rc=b_to_a.add(); /*Add new ID2_B link if not already there */
run;

proc sort; by id_1a;run;

data want;
  set need;
  by id1_a;

  length name_list $200;
  retain name_list;

  if  first.id1_a then name_list=name;
  else name_list=catx(' ',name_list,name);
  if last.id1_a ;
run;

 

The idea here is to maintain a hash lookup table linking ID2_B to it's first linked ID1_A, and only that ID1_A.  Then it's easy to make a dataset of ID1_A and each corresponding name.  Sort them and compile into a list.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

It is really really not easy. After got the result (final_want table), it is easy to get the ID1_A you want .

 

data x;
   input ID1_A ID2_B Name $;
datalines;
1 1 alpha 
1 2 bravo 
1 3 charlie 
2 2 bravo 
2 3 delta 
2 5 xx
2 6 zz
3 4 echo 
4 6 yy
5 11 ee
5 22 rr
6 22 gg
6 44 uu
7 88 ii
7 99 oo
;
run;
data key;
array x{99999} _temporary_;
do i=1 by 1 until(last.id1_a);
 set x;
 by id1_a;
 x{i}=id2_b;
end;

do m=1 to i;
 from=x{m};
 do n=m+1 to i;
  to=x{n};output;
 end;
end;
keep from to;
run;

proc sort data=key out=have nodupkey;
 by from to;
run;



data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
call symputx('n',household);
stop;
run;
%put &n;
data want1;
 if _n_=1 then do;
  if 0 then set want;
  declare hash h(dataset:'want');
  h.definekey('node');
  h.definedata('household');
  h.definedone();
 end;
retain n &n;
set x;
call missing(household);
if h.find(key:id2_b) ne 0 then do;n+1;household=n;end;
keep name household;
run;
proc sort data=want1 out=want2;
 by household;
run;
data final_want;
length want $ 200;
do until(last.household);
 set want2;
 by household;
 want=catx(' ',want,name);
end;
drop household name;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 660 views
  • 1 like
  • 5 in conversation