Hi all,
I'm at a loss on how to merge/combine rows effectively when there are multiple overlapping connections.
Essentially explained verbally: what I am trying to do is to take A and B and rename them by the first of the A's in the group. The problem is, A1= B1 and B2 but B2= A1 and A2. Thus, the result should just say A1= the names of B1 B2 and A2. These connections occur over multiple rows.
In an example:
I have:
ID1_A | ID2_B | Name |
1 | 1 | alpha |
1 | 2 | bravo |
1 | 3 | charlie |
2 | 2 | bravo |
2 | 3 | delta |
3 | 4 | echo |
I want:
ID_combined | Name_combined |
1 | alpha bravo charlie bravo delta |
3 | echo |
thanks! I'm on SAS university edition
What constitutes a group? At the moment I have no idea why you combine the first 5 observations, but start a new group with observation 6.
I don't know why A3 wouldn't be include ,since 1 3 charlie 3 4 echo 1->3->4
A3 is not included because the two IDs are seperate from each other. If A3 had equaled B1, B2, or B3 then it would have been included in the grouping. However, since B4 is a new ID, it is not included.
Another example would be:
ID1_A | ID2_B | Name |
red | apple | alpha |
red | strawberry | bravo |
green | apple | charlie |
green | strawberry | bravo |
green | banana | delta |
purple | eggplant | echo |
becomes:
ID_combined | Name_combined |
red | alpha bravo charlie bravo delta |
purple | echo |
There may be a more direct way, but the following appears to meet your specifications:
data have; infile cards dlm='09'x; input ID1_A ID2_B Name $; cards; 1 1 alpha 1 2 bravo 1 3 charlie 2 2 bravo 2 3 delta 3 4 echo ; data need1; set have; recnum=_n_; run; proc sql; create table need2 as select recnum,count(*) as count from need1 group by id2_b order by recnum ; quit; data need2 (rename=(id1_a=ID_combined)); merge need1 need2; by recnum; retain hold_id1_a; if _n_ eq 1 then hold_id1_a=id1_a; else if count gt 1 then id1_a=hold_id1_a; run; proc transpose data=need2 out=want (drop=_:); by ID_combined; var Name; run; data want (drop=col:); set want; name_combined=catx(' ',of col:); run;
Art, CEO, AnalystFinder.com
That does work! Thank you!
Do you know how you could make it into kind of a decoder key, so that you would know that IDA equals the following IDBs and IDAs that were clustered into that group? For instance with your data input it would become:
Grouped ID | A | B |
1 | 1 | 1 |
1 | 2 | 2 |
1 | 3 | |
I think you have to more clearly describe what you want. Might just need to create a format. However, I'd pose the question as a new thread to increase the number of people responding.
Art, CEO, AnalystFinder.com
This is a good task to take advantage of the default attributes of hash object (hash table).
The data step below declares a hash table with 2 variables (ID1_A and ID2_B), keyed on ID1_A. Because the default mode of hash tables is to keep only one item (i.e. one row) per key (one per ID2_B in your case), and that one item defaults to the first one encountered, you can just output the hash object to a sas dataset to get what you want, namely each ID2_B and the earliest associated ID1A.
data _null_;
declare hash h (dataset:'have (keep=id1_a id2_b)');
h.definekey('id2_b');
h.definedata(all:'Y');
rc=h.output(dataset:'decoder');
run;
proc sort data=decoder;
by id1_a id2_b;
run;
I think you want
data have;
input ID1_A ID2_B Name :$8.;
datalines;
1 1 alpha
1 2 bravo
1 3 charlie
2 2 bravo
2 3 delta
3 4 echo
run;
data need (keep=id1_a name);
set have;
if _n_=1 then do;
declare hash names(); /* Running table of names encountered*/
names.definekey('name');
names.definedone();
declare hash b_to_a(); /* Map from ID2_B to first linked ID1_A*/
b_to_a.definekey('id2_b');
b_to_a.definedata('id1_a');
b_to_a.definedone();
end;
rc=names.find();
if rc=0 then delete;
/* Only new names get to this part*/
rc=names.add(); /* Add it to the table of names */
rc=b_to_a.find(); /* Find previously associated ID1_a if any*/
if rc^=0 then rc=b_to_a.add(); /*Add new ID2_B link if not already there */
run;
proc sort; by id_1a;run;
data want;
set need;
by id1_a;
length name_list $200;
retain name_list;
if first.id1_a then name_list=name;
else name_list=catx(' ',name_list,name);
if last.id1_a ;
run;
The idea here is to maintain a hash lookup table linking ID2_B to it's first linked ID1_A, and only that ID1_A. Then it's easy to make a dataset of ID1_A and each corresponding name. Sort them and compile into a list.
It is really really not easy. After got the result (final_want table), it is easy to get the ID1_A you want .
data x;
input ID1_A ID2_B Name $;
datalines;
1 1 alpha
1 2 bravo
1 3 charlie
2 2 bravo
2 3 delta
2 5 xx
2 6 zz
3 4 echo
4 6 yy
5 11 ee
5 22 rr
6 22 gg
6 44 uu
7 88 ii
7 99 oo
;
run;
data key;
array x{99999} _temporary_;
do i=1 by 1 until(last.id1_a);
set x;
by id1_a;
x{i}=id2_b;
end;
do m=1 to i;
from=x{m};
do n=m+1 to i;
to=x{n};output;
end;
end;
keep from to;
run;
proc sort data=key out=have nodupkey;
by from to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
call symputx('n',household);
stop;
run;
%put &n;
data want1;
if _n_=1 then do;
if 0 then set want;
declare hash h(dataset:'want');
h.definekey('node');
h.definedata('household');
h.definedone();
end;
retain n &n;
set x;
call missing(household);
if h.find(key:id2_b) ne 0 then do;n+1;household=n;end;
keep name household;
run;
proc sort data=want1 out=want2;
by household;
run;
data final_want;
length want $ 200;
do until(last.household);
set want2;
by household;
want=catx(' ',want,name);
end;
drop household name;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.