Solved: Re: Creating a unique group identifier for paired observations

JayVD305 · Posted 05-05-2020 11:32 AM

Hello All,

I have a dataset of paired companies. The pairs appear in a pattern of the same groups, but I do not have a unique identifier for each group.

How can I create a unique identifier for each group of pairs in SAS? Companies do repeat themselves in the sample - for instance they could have multiple pairs and then appear again as the pair for another company. The groups are consistent throughout the sample year. Hopefully this will make more sense after reviewing the sample data below. Please let me know if more information is needed!

Here's what I have and the Group identifier that I'd like to create:

Company	Paired Company	Group
A	C	1
A	E	1
A	F	1
A	G	1
B	H	2
B	I	2
B	J	2
C	A	1
C	E	1
C	F	1
C	G	1
D	K	3
D	L	3
E	A	1
E	C	1
E	F	1
E	G	1

Jagadishkatam · Posted 05-05-2020 12:12 PM

Please try the below code and check the want2 dataset where you will see the expected output,

data have;
input Company$ Paired_Company$;
cards;	
A C  
A E  
A F  
A G  
B H  
B I  
B J  
C A  
C E  
C F  
C G  
D K  
D L  
E A  
E C  
E F  
E G
;


proc sort data=have;
by Paired_Company Company;
run;

proc sql;
create table want as select case when count(Paired_Company)>1 then 0 else count(Paired_Company) end as count, case when count(Paired_Company)>1 then 'Sample' else Company end as comp, 
* from have group by Paired_Company;
quit;

proc sort data=want;
by count comp;
run;

data want2;
set want;
by count comp;
retain group;
if first.comp then group+1;
drop count comp;
run;

proc sort data=want2;
by Company Paired_Company ;
run;

Thanks,
Jag

View solution in original post

Jagadishkatam · Posted 05-05-2020 12:12 PM

Please try the below code and check the want2 dataset where you will see the expected output,

data have;
input Company$ Paired_Company$;
cards;	
A C  
A E  
A F  
A G  
B H  
B I  
B J  
C A  
C E  
C F  
C G  
D K  
D L  
E A  
E C  
E F  
E G
;


proc sort data=have;
by Paired_Company Company;
run;

proc sql;
create table want as select case when count(Paired_Company)>1 then 0 else count(Paired_Company) end as count, case when count(Paired_Company)>1 then 'Sample' else Company end as comp, 
* from have group by Paired_Company;
quit;

proc sort data=want;
by count comp;
run;

data want2;
set want;
by count comp;
retain group;
if first.comp then group+1;
drop count comp;
run;

proc sort data=want2;
by Company Paired_Company ;
run;

Thanks,
Jag

JayVD305 · Posted 05-05-2020 09:05 PM

Hi Jag,

Thank you for replying so quickly...

When I run this code, I get a unique identifier for one of the pairs in each group, but the others are all assigned a value of 1. Any idea what I could be doing wrong?

In bold below, I have highlighted the issue. The B grouping should all have a Group value of 2 and the D grouping a value of 3. Essentially I'm creating an industry identifier for each company based on its pairs. All companies in a particular industry should have the same Group value.

Company	Paired_Company	Group	Expected Value
A	C	1
A	G	1
A	F	1
A	E	1
B	J	2
B	I	1	2
B	H	1	2
C	G	1
C	F	1
C	E	1
C	A	1
D	L	3
D	K	1	3
E	C	1
E	A	1
E	G	1
E	F	1

PGStats · Posted 05-05-2020 01:02 PM

proc optnet will find the connected companies:

data test;
input Company$ PairedCompany$;
datalines;
A   C 
A   E 
A   F 
A   G 
B   H 
B   I 
B   J 
C   A 
C   E 
C   F 
C   G 
D   K 
D   L 
E   A 
E   C 
E   F 
E   G 
;

proc optnet data_links=test out_nodes=groups GRAPH_DIRECTION=UNDIRECTED;
data_links_var from=company to=pairedcompany;
concomp;
run;

proc sql;
create table want as
select a.*, b.concomp as group
from 
	test as a left join
	groups as b on a.company=b.node;
quit;

Data want:

Company 	PairedCompany 	group
A 	C 	1
A 	G 	1
A 	F 	1
A 	E 	1
B 	J 	2
B 	I 	2
B 	H 	2
C 	G 	1
C 	F 	1
C 	E 	1
C 	A 	1
D 	L 	3
D 	K 	3
E 	C 	1
E 	A 	1
E 	G 	1
E 	F 	1

PG

Ksharp · Posted 05-06-2020 08:00 AM

If you do not have SAS/OR .

data test;
input Company$ PairedCompany$;
datalines;
A   C 
A   E 
A   F 
A   G 
B   H 
B   I 
B   J 
C   A 
C   E 
C   F 
C   G 
D   K 
D   L 
E   A 
E   C 
E   F 
E   G 
;


data have;
set test;
rename  Company=from PairedCompany=to;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

data final_want;
 if _n_=1 then do;
  if 0 then set want;
  declare hash h(dataset:'want');
  h.definekey('node');
  h.definedata('household');
  h.definedone();
 end;
set test;
h.find(key:company);
drop node;
run;

Creating a unique group identifier for paired observations

Re: Creating a unique group identifier for paired observations

Re: Creating a unique group identifier for paired observations

Re: Creating a unique group identifier for paired observations

Re: Creating a unique group identifier for paired observations

Re: Creating a unique group identifier for paired observations

Registration is open

SAS Training: Just a Click Away