Solved: Re-mapping

New_SAS_user76 · Posted 09-19-2018 11:45 PM

I want to use the Common Key to form a new group. This task was very complicated for me and hope anyone can help me to solve it.

If there is a common key then I want to assign a new group name and group it.

Data Source

Company

Employee

AA

David

AA

Winnie

AA

Tom

AA

Daisy

BB

David

BB

Peter

CC

Mary

CC

Peter

CC

Tony

CC

Apple

DD

Lemon

DD

Sky

EE

Cara

EE

Peter

EE

Susan

Expected Output File Format

ID

New_Grouping

AA

GP01

BB

GP01

CC

GP01

EE

GP01

David

GP01

Winnie

GP01

Tom

GP01

Daisy

GP01

Peter

GP01

Mary

GP01

Tony

GP01

Apple

GP01

Cara

GP01

Susan

GP01

DD

GP02

Lemon

GP02

Sky

GP02

PGStats · Posted 09-20-2018 12:40 AM

You may use my general purpose subGraphs macro (attached)

data have;
input Company $ Employee $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
;

data arcs;
length from to $12;
do until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
keep from to;
run;

%include "&sasforum.\subGraphsmacro.sas";

%subgraphs(arcs, out=groups);

proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;

data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;

proc sort data=fullGroups out=want nodupkey; by grouping id; run;

PG

View solution in original post

PGStats · Posted 09-20-2018 12:40 AM

You may use my general purpose subGraphs macro (attached)

data have;
input Company $ Employee $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
;

data arcs;
length from to $12;
do until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
keep from to;
run;

%include "&sasforum.\subGraphsmacro.sas";

%subgraphs(arcs, out=groups);

proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;

data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;

proc sort data=fullGroups out=want nodupkey; by grouping id; run;

PG

New_SAS_user76 · Posted 09-26-2018 02:53 AM

Thanks for your help! You help me a lot!

But I still have one question, would you please explore! Maybe my question was not clear before: Let me explain details again, I have added Company FF to HH. If the Company is unique then I would like to assign it a new group no. but now all unique companies which group in one group.

Data Source
Company	Employee
AA	David
AA	Winnie
AA	Tom
AA	Daisy
BB	David
BB	Peter
CC	Mary
CC	Peter
CC	Tony
CC	Apple
DD	Lemon
DD	Sky
EE	Cara
EE	Peter
EE	Susan
FF	Mark
GG	Luck
HH	Leon

Expected Output File Format
ID	New_Grouping
AA	GP01
BB	GP01
CC	GP01
EE	GP01
David	GP01
Winnie	GP01
Tom	GP01
Daisy	GP01
Peter	GP01
Mary	GP01
Tony	GP01
Apple	GP01
Cara	GP01
Susan	GP01
DD	GP02
Lemon	GP02
Sky	GP02
Mark	GP03
Luck	GP04
Leon	GP05

PGStats · Posted 09-27-2018 12:28 AM

Slight modif takes care of single cases

data have;
input Company $ Employee $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
FF Mark 
GG Luck 
HH Leon 
;

data arcs;
length from to $12;
do i = 0 by 1 until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
if not i then output;
keep from to;
run;

%include "&sasforum.\subGraphsmacro.sas";

%subgraphs(arcs, out=groups);

proc sql;
create table haveGroups as
select a.*, b.clust
from have as a inner join groups as b on a.employee=b.node
order by clust;
quit;

data fullGroups;
set haveGroups;
grouping = cats("GP",clust);
id = company; output;
id = employee; output;
keep id grouping;
run;

proc sort data=fullGroups out=want nodupkey; by grouping id; run;

proc print data=want noobs; run;

                                 grouping    id

                                   GP1       AA
                                   GP1       Apple
                                   GP1       BB
                                   GP1       CC
                                   GP1       Cara
                                   GP1       Daisy
                                   GP1       David
                                   GP1       EE
                                   GP1       Mary
                                   GP1       Peter
                                   GP1       Susan
                                   GP1       Tom
                                   GP1       Tony
                                   GP1       Winnie
                                   GP11      FF
                                   GP11      Mark
                                   GP12      GG
                                   GP12      Luck
                                   GP13      HH
                                   GP13      Leon
                                   GP8       DD
                                   GP8       Lemon
                                   GP8       Sky

PG

New_SAS_user76 · Posted 09-26-2018 05:27 AM

When I run the below program then I found the record which is unique will be disappear!

I want to keep all the record even unique or duplicate! How to amend it? Thanks!

data arcs;
length from to $12;
do until(last.company);
    set have; by company notsorted;
    to = Employee;
    if not missing(from) then output;
    from = to;
    end;
keep from to;
run;

Ksharp · Posted 09-20-2018 08:56 AM

Hash Table could get you faster.

data have;
input from $ to $;
datalines;
AA David 
AA Winnie 
AA Tom 
AA Daisy 
BB David 
BB Peter 
CC Mary 
CC Peter 
CC Tony 
CC Apple 
DD Lemon 
DD Sky 
EE Cara 
EE Peter 
EE Susan 
;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

New_SAS_user76 · Posted 09-26-2018 04:25 AM

Hi Thanks for your solution!

But the output was not my expected. Maybe my question was not clear!

Ksharp · Posted 09-26-2018 08:54 AM

OK. The following is what you are looking for ?

data have;
input from $ to $;
datalines;
AA	David
AA	Winnie
AA	Tom
AA	Daisy
BB	David
BB	Peter
CC	Mary
CC	Peter
CC	Tony
CC	Apple
DD	Lemon
DD	Sky
EE	Cara
EE	Peter
EE	Susan
FF	Mark
GG	Luck
HH	Leon
;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table unique as
 select * from have
  group by from
   having count(*)=1;

create table final_want as
 select *
  from want where node not in (select from from unique);
quit;

Ksharp · Posted 09-26-2018 08:56 AM

OK. This is what you are looking for ?

data have;
input from $ to $;
datalines;
AA	David
AA	Winnie
AA	Tom
AA	Daisy
BB	David
BB	Peter
CC	Mary
CC	Peter
CC	Tony
CC	Apple
DD	Lemon
DD	Sky
EE	Cara
EE	Peter
EE	Susan
FF	Mark
GG	Luck
HH	Leon
;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

proc sql;
create table unique as
 select * from have
  group by from
   having count(*)=1;

create table final_want as
 select *
  from want where node not in (select from from unique);
quit;

New_SAS_user76 · Posted 09-26-2018 09:51 PM

Sorry I can't get the output! Would you please tell me which variable should I define "Company" and "Employee"?

data full;
set have end=last;
if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;

Ksharp · Posted 09-27-2018 08:21 AM

Rename your variables into FROM and TO. 
And running the code after it.

data have(rename=(Company=from  Employee=to));
input Company $ Employee $;
datalines;
AA	David
AA	Winnie

Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Re: Re-mapping

Registration is open

SAS Training: Just a Click Away