BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
uguraltuntas67
Fluorite | Level 6

I have a table consisting of two columns (X,Y) that represent correlations between observations like below.

X Y
1 2
2 3
3 4
A B
B C

I want a create new column that represent the relation between observation. 1 become 2, 2 become 3, 3 become 4. So i wanna show this variables in same group(1,2,3,4 are belong to same group). The table should be like below.

X Y Z
1 2 Group 1
2 3 Group 1
3 4 Group 1
A B Group 2
B C Group 2

I can not sort the variables because observation dates random. I am using SAS Enterprise Guide. I need the logic or sas code.

Note: I have no additional information except this table.

If there is no clear information please notify me.

 

Any help would be appreciated,

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @uguraltuntas67 

 

Sorry for the confusion 😊

Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...

I think that @Reeza 's answer will correspond to what you're looking for:

Capture d’écran 2020-05-14 à 10.24.30.png

As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

Wish I could be able to write such a program ☹️

Anyway, I have tried to adapt it to your case:

data have;
	input from $ to $;
	datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;


options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);

proc sql;
	create table ref as
	select clust as _group, node as x
	from want;
quit;

proc sql;
	create table have2 as
	select a.*, b._group
	from have as a left join ref as b
	on a.from = b.x
	order by _group;
run;

data want;
	set have2;
	by _group;
	if first._group then group+1;
run;

 

In the same thread, you can also have a look at @novinosrin's solution 🙂

 

Hope this helps,

Best,

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @uguraltuntas67 

 

Do you mean that if X is equal to the previous value of Y, they belong to the same group?

If so, please try this:

data want;
	set have;
	_lag = lag(Y);
	if x ne _lag then group+1;
	drop _lag;
run;

Best,

uguraltuntas67
Fluorite | Level 6

 

Hi @ed_sas_member 

 

Thank you for quick response. But my data is not sorted. Sorry for my misleading. My data look like as below.

X Y
1 2

A B

3 4

B C
2 3

And i wanna do this;

 

X Y Z
1 2 Group 1

A B Group 2

3 4 Group 1

2 3 Group 1

B C Group 2

 

Best,

ed_sas_member
Meteorite | Level 14

Hi @uguraltuntas67 

 

Sorry for the confusion 😊

Here is a similar thread : https://communities.sas.com/t5/SAS-Programming/Creating-a-grouper-column-from-multiple-columns-diago...

I think that @Reeza 's answer will correspond to what you're looking for:

Capture d’écran 2020-05-14 à 10.24.30.png

As mentioned in his post, here is the link to this great hash solution: https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

Wish I could be able to write such a program ☹️

Anyway, I have tried to adapt it to your case:

data have;
	input from $ to $;
	datalines;
1 2
A B
3 4
B C
2 3
4 5
7 8
;
run;

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;


options mprint symbolgen;
%SubGraphs(have,out=want,exp=1);

proc sql;
	create table ref as
	select clust as _group, node as x
	from want;
quit;

proc sql;
	create table have2 as
	select a.*, b._group
	from have as a left join ref as b
	on a.from = b.x
	order by _group;
run;

data want;
	set have2;
	by _group;
	if first._group then group+1;
run;

 

In the same thread, you can also have a look at @novinosrin's solution 🙂

 

Hope this helps,

Best,

yabwon
Onyx | Level 15

Hi,

 

if we can assume that groups are linear (one parent one child, i.e 1->2 2->3 but not 1->2 1->3) then following mix of formats and arrays may help.


data have;
input X $ Y $;
cards;
1 2
2 3
A B
3 4
B C
;
run;


/* first convert your data to integers */
data first(keep = start);
  set have;
  start = x;
  output;
  start = y;
  output;
run;

proc sort data = first out = second nodupkeys;
  by start;
run;

data third;
  retain fmtname 'toInteger' type 'i'; 
  set second curobs = curobs;
  label = curobs;
run;

proc format library=work cntlin=third;
run;


/* use modified input data to work on array */
data have1;
 set have nobs=nobs end = EOF;
 x1 = input(x, toInteger.);
 y1 = input(y, toInteger.);
 keep x1 y1;
 output;
 if EOF then call symputX("N", 2*nobs);
run;


data fourth;
  array rel[&N.] _temporary_;
  retain fmtname 'toGroup' type 'i';

  /* populate array */
  do until(eof);
    set have1 end=EOF;
    rel[x1] = y1;
  end;

  /* traverse array to find "lists" of related pairs */
  LABEL = 0;
  do _N_ = 1 to dim(rel);
    if rel[_N_] > .z then
      do;
        _I_ = _N_;
        LABEL + 1;
        do while(rel[_I_] > .z);
          START = _I_;
          output;
          _T_ = _I_;
          _I_ = rel[_I_];
          rel[_T_] = .; 
        end;
      end;
  end;
  keep fmtname type LABEL START;
run;

proc format library=work cntlin=fourth;
run;

/* assign groups */
data want;
 set have nobs=nobs end = EOF;
 group = input(input(x, toInteger.), toGroup.);
 output;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



uguraltuntas67
Fluorite | Level 6

Hello again @ed_sas_member , @yabwon ,

 

Your both solutions seems worked fine. I am gonna  see the codes in detail. If anythings come up new, I'll let you now.

Thank you for your time and contributions.

 

Best,

Uğur,

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1108 views
  • 3 likes
  • 3 in conversation