I want to use the Common Key to form a new group. This task was very complicated for me and hope anyone can help me to solve it.
If there is a common key then I want to assign a new group name and group it.
| Data Source | ||
| Company | Employee | |
| AA | David | |
| AA | Winnie | |
| AA | Tom | |
| AA | Daisy | |
| BB | David | |
| BB | Peter | |
| CC | Mary | |
| CC | Peter | |
| CC | Tony | |
| CC | Apple | |
| DD | Lemon | |
| DD | Sky | |
| EE | Cara | |
| EE | Peter | |
| Susan | |
| Expected Output File Format | ||
| ID | New_Grouping | |
| AA | GP01 | |
| BB | GP01 | |
| CC | GP01 | |
| EE | GP01 | |
| David | GP01 | |
| Winnie | GP01 | |
| Tom | GP01 | |
| Daisy | GP01 | |
| Peter | GP01 | |
| Mary | GP01 | |
| Tony | GP01 | |
| Apple | GP01 | |
| Cara | GP01 | |
| Susan | GP01 | |
| DD | GP02 | |
| Lemon | GP02 | |
| Sky | GP02 |
You may use my general purpose subGraphs macro (attached)
data have;
input Company $ Employee $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
;
data arcs;
length from to $12;
do until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
keep from to;
run;
%include "&sasforum.\subGraphsmacro.sas";
%subgraphs(arcs, out=groups);
proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;
data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;
proc sort data=fullGroups out=want nodupkey; by grouping id; run;
You may use my general purpose subGraphs macro (attached)
data have;
input Company $ Employee $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
;
data arcs;
length from to $12;
do until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
keep from to;
run;
%include "&sasforum.\subGraphsmacro.sas";
%subgraphs(arcs, out=groups);
proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;
data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;
proc sort data=fullGroups out=want nodupkey; by grouping id; run;
Thanks for your help! You help me a lot!
But I still have one question, would you please explore! Maybe my question was not clear before: Let me explain details again, I have added Company FF to HH. If the Company is unique then I would like to assign it a new group no. but now all unique companies which group in one group.
Data Source | |
| Company | Employee |
| AA | David |
| AA | Winnie |
| AA | Tom |
| AA | Daisy |
| BB | David |
| BB | Peter |
| CC | Mary |
| CC | Peter |
| CC | Tony |
| CC | Apple |
| DD | Lemon |
| DD | Sky |
| EE | Cara |
| EE | Peter |
| EE | Susan |
| FF | Mark |
| GG | Luck |
| HH | Leon |
| Expected Output File Format | |
| ID | New_Grouping |
| AA | GP01 |
| BB | GP01 |
| CC | GP01 |
| EE | GP01 |
| David | GP01 |
| Winnie | GP01 |
| Tom | GP01 |
| Daisy | GP01 |
| Peter | GP01 |
| Mary | GP01 |
| Tony | GP01 |
| Apple | GP01 |
| Cara | GP01 |
| Susan | GP01 |
| DD | GP02 |
| Lemon | GP02 |
| Sky | GP02 |
| Mark | GP03 |
| Luck | GP04 |
| Leon | GP05 |
Slight modif takes care of single cases
data have;
input Company $ Employee $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
FF Mark
GG Luck
HH Leon
;
data arcs;
length from to $12;
do i = 0 by 1 until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
if not i then output;
keep from to;
run;
%include "&sasforum.\subGraphsmacro.sas";
%subgraphs(arcs, out=groups);
proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;
data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;
proc sort data=fullGroups out=want nodupkey; by grouping id; run;
proc print data=want noobs; run;
grouping id
GP1 AA
GP1 Apple
GP1 BB
GP1 CC
GP1 Cara
GP1 Daisy
GP1 David
GP1 EE
GP1 Mary
GP1 Peter
GP1 Susan
GP1 Tom
GP1 Tony
GP1 Winnie
GP11 FF
GP11 Mark
GP12 GG
GP12 Luck
GP13 HH
GP13 Leon
GP8 DD
GP8 Lemon
GP8 Sky
When I run the below program then I found the record which is unique will be disappear!
I want to keep all the record even unique or duplicate! How to amend it? Thanks!
data arcs;
length from to $12;
do until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
keep from to;
run;
Hash Table could get you faster.
data have;
input from $ to $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Hi Thanks for your solution!
But the output was not my expected. Maybe my question was not clear!
OK. The following is what you are looking for ?
data have;
input from $ to $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
FF Mark
GG Luck
HH Leon
;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table unique as
select * from have
group by from
having count(*)=1;
create table final_want as
select *
from want where node not in (select from from unique);
quit;
OK. This is what you are looking for ?
data have;
input from $ to $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
FF Mark
GG Luck
HH Leon
;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table unique as
select * from have
group by from
having count(*)=1;
create table final_want as
select *
from want where node not in (select from from unique);
quit;
Sorry I can't get the output! Would you please tell me which variable should I define "Company" and "Employee"?
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
Rename your variables into FROM and TO. And running the code after it. data have(rename=(Company=from Employee=to)); input Company $ Employee $; datalines; AA David AA Winnie
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.