I have a table consisting of two columns (X,Y) that represent correlations between observations like below.
X Y
1 2
2 3
3 4
A B
B C
I want a create new column that represent the relation between observation. 1 become 2, 2 become 3, 3 become 4. So i wanna show this variables in same group(1,2,3,4 are belong to same group). The table should be like below.
X Y Z
1 2 Group 1
2 3 Group 1
3 4 Group 1
A B Group 2
B C Group 2
I can not sort the variables because observation dates random. I am using SAS Enterprise Guide. I need the logic or sas code.
Note: I have no additional information except this table.
If there is no clear information please notify me.
Any help would be appreciated,
Thanks,
Sorry for the confusion 😊
Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...
I think that @Reeza 's answer will correspond to what you're looking for:
As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30
Wish I could be able to write such a program ☹️.
Anyway, I have tried to adapt it to your case:
data have;
input from $ to $;
datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;
%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');
do newClust = 1 by 1 while(not endLoop);
set &arcs end=endLoop;
call missing(clust); node = &from;
if 0^=nodes.find() then nodes.add();
fromClust = clust;
call missing(clust); node = &to;
if 0^=nodes.find() then nodes.add();
toClust = clust;
if n(fromClust, toClust) = 0 then do;
nodes.replace(key:&from, data:&from, data:newClust);
nodes.replace(key:&to, data:&to, data:newClust);
end;
else if missing(toClust) then
nodes.replace(key:&to, data:&to, data:fromClust);
else if missing(fromClust) then
nodes.replace(key:&from, data:&from, data:toClust);
else if fromClust ne toClust then do;
rc = nodeList.first();
do while (rc = 0);
if clust = fromClust then
nodes.replace(key:node, data:node, data:toClust);
rc = nodeList.next();
end;
end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;
options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);
proc sql;
create table ref as
select clust as _group, node as x
from want;
quit;
proc sql;
create table have2 as
select a.*, b._group
from have as a left join ref as b
on a.from = b.x
order by _group;
run;
data want;
set have2;
by _group;
if first._group then group+1;
run;
In the same thread, you can also have a look at @novinosrin's solution 🙂
Hope this helps,
Best,
Do you mean that if X is equal to the previous value of Y, they belong to the same group?
If so, please try this:
data want;
set have;
_lag = lag(Y);
if x ne _lag then group+1;
drop _lag;
run;
Best,
Thank you for quick response. But my data is not sorted. Sorry for my misleading. My data look like as below.
X Y
1 2
A B
3 4
B C
2 3
And i wanna do this;
X Y Z
1 2 Group 1
A B Group 2
3 4 Group 1
2 3 Group 1
B C Group 2
Best,
Sorry for the confusion 😊
Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...
I think that @Reeza 's answer will correspond to what you're looking for:
As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30
Wish I could be able to write such a program ☹️.
Anyway, I have tried to adapt it to your case:
data have;
input from $ to $;
datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;
%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');
do newClust = 1 by 1 while(not endLoop);
set &arcs end=endLoop;
call missing(clust); node = &from;
if 0^=nodes.find() then nodes.add();
fromClust = clust;
call missing(clust); node = &to;
if 0^=nodes.find() then nodes.add();
toClust = clust;
if n(fromClust, toClust) = 0 then do;
nodes.replace(key:&from, data:&from, data:newClust);
nodes.replace(key:&to, data:&to, data:newClust);
end;
else if missing(toClust) then
nodes.replace(key:&to, data:&to, data:fromClust);
else if missing(fromClust) then
nodes.replace(key:&from, data:&from, data:toClust);
else if fromClust ne toClust then do;
rc = nodeList.first();
do while (rc = 0);
if clust = fromClust then
nodes.replace(key:node, data:node, data:toClust);
rc = nodeList.next();
end;
end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;
options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);
proc sql;
create table ref as
select clust as _group, node as x
from want;
quit;
proc sql;
create table have2 as
select a.*, b._group
from have as a left join ref as b
on a.from = b.x
order by _group;
run;
data want;
set have2;
by _group;
if first._group then group+1;
run;
In the same thread, you can also have a look at @novinosrin's solution 🙂
Hope this helps,
Best,
Hi,
if we can assume that groups are linear (one parent one child, i.e 1->2 2->3 but not 1->2 1->3) then following mix of formats and arrays may help.
data have;
input X $ Y $;
cards;
1 2
2 3
A B
3 4
B C
;
run;
/* first convert your data to integers */
data first(keep = start);
set have;
start = x;
output;
start = y;
output;
run;
proc sort data = first out = second nodupkeys;
by start;
run;
data third;
retain fmtname 'toInteger' type 'i';
set second curobs = curobs;
label = curobs;
run;
proc format library=work cntlin=third;
run;
/* use modified input data to work on array */
data have1;
set have nobs=nobs end = EOF;
x1 = input(x, toInteger.);
y1 = input(y, toInteger.);
keep x1 y1;
output;
if EOF then call symputX("N", 2*nobs);
run;
data fourth;
array rel[&N.] _temporary_;
retain fmtname 'toGroup' type 'i';
/* populate array */
do until(eof);
set have1 end=EOF;
rel[x1] = y1;
end;
/* traverse array to find "lists" of related pairs */
LABEL = 0;
do _N_ = 1 to dim(rel);
if rel[_N_] > .z then
do;
_I_ = _N_;
LABEL + 1;
do while(rel[_I_] > .z);
START = _I_;
output;
_T_ = _I_;
_I_ = rel[_I_];
rel[_T_] = .;
end;
end;
end;
keep fmtname type LABEL START;
run;
proc format library=work cntlin=fourth;
run;
/* assign groups */
data want;
set have nobs=nobs end = EOF;
group = input(input(x, toInteger.), toGroup.);
output;
run;
Bart
Hello again @ed_sas_member , @yabwon ,
Your both solutions seems worked fine. I am gonna see the codes in detail. If anythings come up new, I'll let you now.
Thank you for your time and contributions.
Best,
Uğur,
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.