Solved: Re: Specifying the observation as in same group

uguraltuntas67 · Posted 05-14-2020 03:31 AM

I have a table consisting of two columns (X,Y) that represent correlations between observations like below.

X Y
1 2
2 3
3 4
A B
B C

I want a create new column that represent the relation between observation. 1 become 2, 2 become 3, 3 become 4. So i wanna show this variables in same group(1,2,3,4 are belong to same group). The table should be like below.

X Y Z
1 2 Group 1
2 3 Group 1
3 4 Group 1
A B Group 2
B C Group 2

I can not sort the variables because observation dates random. I am using SAS Enterprise Guide. I need the logic or sas code.

Note: I have no additional information except this table.

If there is no clear information please notify me.

Any help would be appreciated,

Thanks,

ed_sas_member · Posted 05-14-2020 04:37 AM

Hi @uguraltuntas67

Sorry for the confusion 😊

Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...

I think that @Reeza 's answer will correspond to what you're looking for:

As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

Wish I could be able to write such a program ☹️.

Anyway, I have tried to adapt it to your case:

data have;
	input from $ to $;
	datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;


options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);

proc sql;
	create table ref as
	select clust as _group, node as x
	from want;
quit;

proc sql;
	create table have2 as
	select a.*, b._group
	from have as a left join ref as b
	on a.from = b.x
	order by _group;
run;

data want;
	set have2;
	by _group;
	if first._group then group+1;
run;

In the same thread, you can also have a look at @novinosrin's solution 🙂

Hope this helps,

Best,

View solution in original post

ed_sas_member · Posted 05-14-2020 03:38 AM

Hi @uguraltuntas67

Do you mean that if X is equal to the previous value of Y, they belong to the same group?

If so, please try this:

data want;
	set have;
	_lag = lag(Y);
	if x ne _lag then group+1;
	drop _lag;
run;

Best,

uguraltuntas67 · Posted 05-14-2020 04:02 AM

Hi @ed_sas_member

Thank you for quick response. But my data is not sorted. Sorry for my misleading. My data look like as below.

X Y
1 2

A B

3 4

B C
2 3

And i wanna do this;

X Y Z
1 2 Group 1

A B Group 2

3 4 Group 1

2 3 Group 1

B C Group 2

Best,

ed_sas_member · Posted 05-14-2020 04:37 AM

Hi @uguraltuntas67

Sorry for the confusion 😊

Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...

I think that @Reeza 's answer will correspond to what you're looking for:

As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

Wish I could be able to write such a program ☹️.

Anyway, I have tried to adapt it to your case:

data have;
	input from $ to $;
	datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;


options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);

proc sql;
	create table ref as
	select clust as _group, node as x
	from want;
quit;

proc sql;
	create table have2 as
	select a.*, b._group
	from have as a left join ref as b
	on a.from = b.x
	order by _group;
run;

data want;
	set have2;
	by _group;
	if first._group then group+1;
run;

In the same thread, you can also have a look at @novinosrin's solution 🙂

Hope this helps,

Best,

yabwon · Posted 05-14-2020 05:25 AM

Hi,

if we can assume that groups are linear (one parent one child, i.e 1->2 2->3 but not 1->2 1->3) then following mix of formats and arrays may help.


data have;
input X $ Y $;
cards;
1 2
2 3
A B
3 4
B C
;
run;


/* first convert your data to integers */
data first(keep = start);
  set have;
  start = x;
  output;
  start = y;
  output;
run;

proc sort data = first out = second nodupkeys;
  by start;
run;

data third;
  retain fmtname 'toInteger' type 'i'; 
  set second curobs = curobs;
  label = curobs;
run;

proc format library=work cntlin=third;
run;


/* use modified input data to work on array */
data have1;
 set have nobs=nobs end = EOF;
 x1 = input(x, toInteger.);
 y1 = input(y, toInteger.);
 keep x1 y1;
 output;
 if EOF then call symputX("N", 2*nobs);
run;


data fourth;
  array rel[&N.] _temporary_;
  retain fmtname 'toGroup' type 'i';

  /* populate array */
  do until(eof);
    set have1 end=EOF;
    rel[x1] = y1;
  end;

  /* traverse array to find "lists" of related pairs */
  LABEL = 0;
  do _N_ = 1 to dim(rel);
    if rel[_N_] > .z then
      do;
        _I_ = _N_;
        LABEL + 1;
        do while(rel[_I_] > .z);
          START = _I_;
          output;
          _T_ = _I_;
          _I_ = rel[_I_];
          rel[_T_] = .; 
        end;
      end;
  end;
  keep fmtname type LABEL START;
run;

proc format library=work cntlin=fourth;
run;

/* assign groups */
data want;
 set have nobs=nobs end = EOF;
 group = input(input(x, toInteger.), toGroup.);
 output;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

uguraltuntas67 · Posted 05-14-2020 06:16 AM

Hello again @ed_sas_member , @yabwon ,

Your both solutions seems worked fine. I am gonna see the codes in detail. If anythings come up new, I'll let you now.

Thank you for your time and contributions.

Best,

Uğur,

ed_sas_member · Posted 05-14-2020 06:20 AM

You're welcome @uguraltuntas67!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away