I want to use the Common Key to form a new group. This task was very complicated for me and hope anyone can help me to solve it.
If there is a common key then I want to assign a new group name and group it.
Data Source | ||
Company | Employee | |
AA | David | |
AA | Winnie | |
AA | Tom | |
AA | Daisy | |
BB | David | |
BB | Peter | |
CC | Mary | |
CC | Peter | |
CC | Tony | |
CC | Apple | |
DD | Lemon | |
DD | Sky | |
EE | Cara | |
EE | Peter | |
| Susan | |
Expected Output File Format | ||
ID | New_Grouping | |
AA | GP01 | |
BB | GP01 | |
CC | GP01 | |
EE | GP01 | |
David | GP01 | |
Winnie | GP01 | |
Tom | GP01 | |
Daisy | GP01 | |
Peter | GP01 | |
Mary | GP01 | |
Tony | GP01 | |
Apple | GP01 | |
Cara | GP01 | |
Susan | GP01 | |
DD | GP02 | |
Lemon | GP02 | |
Sky | GP02 |
You may use my general purpose subGraphs macro (attached)
data have;
input Company $ Employee $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
;
data arcs;
length from to $12;
do until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
keep from to;
run;
%include "&sasforum.\subGraphsmacro.sas";
%subgraphs(arcs, out=groups);
proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;
data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;
proc sort data=fullGroups out=want nodupkey; by grouping id; run;
You may use my general purpose subGraphs macro (attached)
data have;
input Company $ Employee $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
;
data arcs;
length from to $12;
do until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
keep from to;
run;
%include "&sasforum.\subGraphsmacro.sas";
%subgraphs(arcs, out=groups);
proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;
data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;
proc sort data=fullGroups out=want nodupkey; by grouping id; run;
Thanks for your help! You help me a lot!
But I still have one question, would you please explore! Maybe my question was not clear before: Let me explain details again, I have added Company FF to HH. If the Company is unique then I would like to assign it a new group no. but now all unique companies which group in one group.
Data Source | |
Company | Employee |
AA | David |
AA | Winnie |
AA | Tom |
AA | Daisy |
BB | David |
BB | Peter |
CC | Mary |
CC | Peter |
CC | Tony |
CC | Apple |
DD | Lemon |
DD | Sky |
EE | Cara |
EE | Peter |
EE | Susan |
FF | Mark |
GG | Luck |
HH | Leon |
Expected Output File Format | |
ID | New_Grouping |
AA | GP01 |
BB | GP01 |
CC | GP01 |
EE | GP01 |
David | GP01 |
Winnie | GP01 |
Tom | GP01 |
Daisy | GP01 |
Peter | GP01 |
Mary | GP01 |
Tony | GP01 |
Apple | GP01 |
Cara | GP01 |
Susan | GP01 |
DD | GP02 |
Lemon | GP02 |
Sky | GP02 |
Mark | GP03 |
Luck | GP04 |
Leon | GP05 |
Slight modif takes care of single cases
data have;
input Company $ Employee $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
FF Mark
GG Luck
HH Leon
;
data arcs;
length from to $12;
do i = 0 by 1 until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
if not i then output;
keep from to;
run;
%include "&sasforum.\subGraphsmacro.sas";
%subgraphs(arcs, out=groups);
proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;
data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;
proc sort data=fullGroups out=want nodupkey; by grouping id; run;
proc print data=want noobs; run;
grouping id GP1 AA GP1 Apple GP1 BB GP1 CC GP1 Cara GP1 Daisy GP1 David GP1 EE GP1 Mary GP1 Peter GP1 Susan GP1 Tom GP1 Tony GP1 Winnie GP11 FF GP11 Mark GP12 GG GP12 Luck GP13 HH GP13 Leon GP8 DD GP8 Lemon GP8 Sky
When I run the below program then I found the record which is unique will be disappear!
I want to keep all the record even unique or duplicate! How to amend it? Thanks!
data arcs;
length from to $12;
do until(last.company);
set have; by company notsorted;
to = Employee;
if not missing(from) then output;
from = to;
end;
keep from to;
run;
Hash Table could get you faster.
data have;
input from $ to $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Hi Thanks for your solution!
But the output was not my expected. Maybe my question was not clear!
OK. The following is what you are looking for ?
data have;
input from $ to $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
FF Mark
GG Luck
HH Leon
;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table unique as
select * from have
group by from
having count(*)=1;
create table final_want as
select *
from want where node not in (select from from unique);
quit;
OK. This is what you are looking for ?
data have;
input from $ to $;
datalines;
AA David
AA Winnie
AA Tom
AA Daisy
BB David
BB Peter
CC Mary
CC Peter
CC Tony
CC Apple
DD Lemon
DD Sky
EE Cara
EE Peter
EE Susan
FF Mark
GG Luck
HH Leon
;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table unique as
select * from have
group by from
having count(*)=1;
create table final_want as
select *
from want where node not in (select from from unique);
quit;
Sorry I can't get the output! Would you please tell me which variable should I define "Company" and "Employee"?
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
Rename your variables into FROM and TO. And running the code after it. data have(rename=(Company=from Employee=to)); input Company $ Employee $; datalines; AA David AA Winnie
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.