I have a table consisting of two columns (X,Y) that represent correlations between observations like below.
X Y
1 2
2 3
3 4
A B
B C
I want a create new column that represent the relation between observation. 1 become 2, 2 become 3, 3 become 4. So i wanna show this variables in same group(1,2,3,4 are belong to same group). The table should be like below.
X Y Z
1 2 Group 1
2 3 Group 1
3 4 Group 1
A B Group 2
B C Group 2
I can not sort the variables because observation dates random. I am using SAS Enterprise Guide. I need the logic or sas code.
Note: I have no additional information except this table.
If there is no clear information please notify me.
Any help would be appreciated,
Thanks,
Sorry for the confusion 😊
Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...
I think that @Reeza 's answer will correspond to what you're looking for:
As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30
Wish I could be able to write such a program ☹️.
Anyway, I have tried to adapt it to your case:
data have;
input from $ to $;
datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;
%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');
do newClust = 1 by 1 while(not endLoop);
set &arcs end=endLoop;
call missing(clust); node = &from;
if 0^=nodes.find() then nodes.add();
fromClust = clust;
call missing(clust); node = &to;
if 0^=nodes.find() then nodes.add();
toClust = clust;
if n(fromClust, toClust) = 0 then do;
nodes.replace(key:&from, data:&from, data:newClust);
nodes.replace(key:&to, data:&to, data:newClust);
end;
else if missing(toClust) then
nodes.replace(key:&to, data:&to, data:fromClust);
else if missing(fromClust) then
nodes.replace(key:&from, data:&from, data:toClust);
else if fromClust ne toClust then do;
rc = nodeList.first();
do while (rc = 0);
if clust = fromClust then
nodes.replace(key:node, data:node, data:toClust);
rc = nodeList.next();
end;
end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;
options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);
proc sql;
create table ref as
select clust as _group, node as x
from want;
quit;
proc sql;
create table have2 as
select a.*, b._group
from have as a left join ref as b
on a.from = b.x
order by _group;
run;
data want;
set have2;
by _group;
if first._group then group+1;
run;
In the same thread, you can also have a look at @novinosrin's solution 🙂
Hope this helps,
Best,
Do you mean that if X is equal to the previous value of Y, they belong to the same group?
If so, please try this:
data want;
set have;
_lag = lag(Y);
if x ne _lag then group+1;
drop _lag;
run;
Best,
Thank you for quick response. But my data is not sorted. Sorry for my misleading. My data look like as below.
X Y
1 2
A B
3 4
B C
2 3
And i wanna do this;
X Y Z
1 2 Group 1
A B Group 2
3 4 Group 1
2 3 Group 1
B C Group 2
Best,
Sorry for the confusion 😊
Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...
I think that @Reeza 's answer will correspond to what you're looking for:
As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30
Wish I could be able to write such a program ☹️.
Anyway, I have tried to adapt it to your case:
data have;
input from $ to $;
datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;
%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');
do newClust = 1 by 1 while(not endLoop);
set &arcs end=endLoop;
call missing(clust); node = &from;
if 0^=nodes.find() then nodes.add();
fromClust = clust;
call missing(clust); node = &to;
if 0^=nodes.find() then nodes.add();
toClust = clust;
if n(fromClust, toClust) = 0 then do;
nodes.replace(key:&from, data:&from, data:newClust);
nodes.replace(key:&to, data:&to, data:newClust);
end;
else if missing(toClust) then
nodes.replace(key:&to, data:&to, data:fromClust);
else if missing(fromClust) then
nodes.replace(key:&from, data:&from, data:toClust);
else if fromClust ne toClust then do;
rc = nodeList.first();
do while (rc = 0);
if clust = fromClust then
nodes.replace(key:node, data:node, data:toClust);
rc = nodeList.next();
end;
end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;
options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);
proc sql;
create table ref as
select clust as _group, node as x
from want;
quit;
proc sql;
create table have2 as
select a.*, b._group
from have as a left join ref as b
on a.from = b.x
order by _group;
run;
data want;
set have2;
by _group;
if first._group then group+1;
run;
In the same thread, you can also have a look at @novinosrin's solution 🙂
Hope this helps,
Best,
Hi,
if we can assume that groups are linear (one parent one child, i.e 1->2 2->3 but not 1->2 1->3) then following mix of formats and arrays may help.
data have;
input X $ Y $;
cards;
1 2
2 3
A B
3 4
B C
;
run;
/* first convert your data to integers */
data first(keep = start);
set have;
start = x;
output;
start = y;
output;
run;
proc sort data = first out = second nodupkeys;
by start;
run;
data third;
retain fmtname 'toInteger' type 'i';
set second curobs = curobs;
label = curobs;
run;
proc format library=work cntlin=third;
run;
/* use modified input data to work on array */
data have1;
set have nobs=nobs end = EOF;
x1 = input(x, toInteger.);
y1 = input(y, toInteger.);
keep x1 y1;
output;
if EOF then call symputX("N", 2*nobs);
run;
data fourth;
array rel[&N.] _temporary_;
retain fmtname 'toGroup' type 'i';
/* populate array */
do until(eof);
set have1 end=EOF;
rel[x1] = y1;
end;
/* traverse array to find "lists" of related pairs */
LABEL = 0;
do _N_ = 1 to dim(rel);
if rel[_N_] > .z then
do;
_I_ = _N_;
LABEL + 1;
do while(rel[_I_] > .z);
START = _I_;
output;
_T_ = _I_;
_I_ = rel[_I_];
rel[_T_] = .;
end;
end;
end;
keep fmtname type LABEL START;
run;
proc format library=work cntlin=fourth;
run;
/* assign groups */
data want;
set have nobs=nobs end = EOF;
group = input(input(x, toInteger.), toGroup.);
output;
run;
Bart
Hello again @ed_sas_member , @yabwon ,
Your both solutions seems worked fine. I am gonna see the codes in detail. If anythings come up new, I'll let you now.
Thank you for your time and contributions.
Best,
Uğur,
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.