BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
New_SAS_user76
Fluorite | Level 6

I want to use the Common Key to form a new group. This task was very complicated for me and hope anyone can help me to solve it.

If there is a common key then I want to assign a new group name and group it.

 

 

Data Source
CompanyEmployee
AADavid
AAWinnie
AATom
AADaisy
BBDavid
BBPeter
CCMary
CCPeter
CCTony
CCApple
DDLemon
DDSky
EECara
EEPeter
EE
Susan
  
  
Expected Output File Format
IDNew_Grouping
AAGP01
BBGP01
CCGP01
EEGP01
DavidGP01
WinnieGP01
TomGP01
DaisyGP01
PeterGP01
MaryGP01
TonyGP01
AppleGP01
CaraGP01
SusanGP01
DDGP02
LemonGP02
SkyGP02
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You may use my general purpose subGraphs macro (attached)

 

data have;
input Company $ Employee $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
;

data arcs;
length from to $12;
do until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
keep from to;
run;

%include "&sasforum.\subGraphsmacro.sas";

%subgraphs(arcs, out=groups);

proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;

data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;

proc sort data=fullGroups out=want nodupkey; by grouping id; run;
PG

View solution in original post

10 REPLIES 10
PGStats
Opal | Level 21

You may use my general purpose subGraphs macro (attached)

 

data have;
input Company $ Employee $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
;

data arcs;
length from to $12;
do until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
keep from to;
run;

%include "&sasforum.\subGraphsmacro.sas";

%subgraphs(arcs, out=groups);

proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;

data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;

proc sort data=fullGroups out=want nodupkey; by grouping id; run;
PG
New_SAS_user76
Fluorite | Level 6

Thanks for your help! You help me a lot!

But I still have one question, would you please explore! Maybe my question was not clear before: Let me explain details again, I have added Company FF to HH. If the Company is unique then I would like to assign it a new group no. but now all unique companies which group in one group.

Data Source

 
CompanyEmployee
AADavid
AAWinnie
AATom
AADaisy
BBDavid
BBPeter
CCMary
CCPeter
CCTony
CCApple
DDLemon
DDSky
EECara
EEPeter
EESusan
FFMark
GGLuck
HHLeon
  
Expected Output File Format 
IDNew_Grouping
AAGP01
BBGP01
CCGP01
EEGP01
DavidGP01
WinnieGP01
TomGP01
DaisyGP01
PeterGP01
MaryGP01
TonyGP01
AppleGP01
CaraGP01
SusanGP01
DDGP02
LemonGP02
SkyGP02
MarkGP03
LuckGP04
LeonGP05
PGStats
Opal | Level 21

Slight modif takes care of single cases

 

data have;
input Company $ Employee $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
FF Mark 
GG Luck 
HH Leon 
;

data arcs;
length from to $12;
do i = 0 by 1 until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
if not i then output;
keep from to;
run;

%include "&sasforum.\subGraphsmacro.sas";

%subgraphs(arcs, out=groups);

proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;

data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;

proc sort data=fullGroups out=want nodupkey; by grouping id; run;

proc print data=want noobs; run;
                                 grouping    id

                                   GP1       AA
                                   GP1       Apple
                                   GP1       BB
                                   GP1       CC
                                   GP1       Cara
                                   GP1       Daisy
                                   GP1       David
                                   GP1       EE
                                   GP1       Mary
                                   GP1       Peter
                                   GP1       Susan
                                   GP1       Tom
                                   GP1       Tony
                                   GP1       Winnie
                                   GP11      FF
                                   GP11      Mark
                                   GP12      GG
                                   GP12      Luck
                                   GP13      HH
                                   GP13      Leon
                                   GP8       DD
                                   GP8       Lemon
                                   GP8       Sky
PG
New_SAS_user76
Fluorite | Level 6

When I run the below program then I found the record which is unique will be disappear!

I want to keep all the record even unique or duplicate! How to amend it? Thanks!

 

data arcs;
length from to $12;
do until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
keep from to;
run;

Ksharp
Super User

Hash Table could get you faster.

 

 

data have;
input from $ to $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
New_SAS_user76
Fluorite | Level 6

Hi Thanks for your solution!

But the output was not my expected. Maybe my question was not clear!

 

 

 

 

Ksharp
Super User

OK. The following is what you are looking for ?

 

 

data have;
input from $ to $;
datalines;
AA	David
AA	Winnie
AA	Tom
AA	Daisy
BB	David
BB	Peter
CC	Mary
CC	Peter
CC	Tony
CC	Apple
DD	Lemon
DD	Sky
EE	Cara
EE	Peter
EE	Susan
FF	Mark
GG	Luck
HH	Leon
;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table unique as
 select * from have
  group by from
   having count(*)=1;

create table final_want as
 select *
  from want where node not in (select from from unique);
quit;
Ksharp
Super User

OK. This is what you are looking for ?

 

 

data have;
input from $ to $;
datalines;
AA	David
AA	Winnie
AA	Tom
AA	Daisy
BB	David
BB	Peter
CC	Mary
CC	Peter
CC	Tony
CC	Apple
DD	Lemon
DD	Sky
EE	Cara
EE	Peter
EE	Susan
FF	Mark
GG	Luck
HH	Leon
;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table unique as
 select * from have
  group by from
   having count(*)=1;

create table final_want as
 select *
  from want where node not in (select from from unique);
quit;
New_SAS_user76
Fluorite | Level 6

Sorry I can't get the output! Would you please tell me which variable should I define "Company" and "Employee"?

 

 

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;

Ksharp
Super User
Rename your variables into FROM and TO. 
And running the code after it.

data have(rename=(Company=from  Employee=to));
input Company $ Employee $;
datalines;
AA	David
AA	Winnie


hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2824 views
  • 1 like
  • 3 in conversation