Hello All,
I have a dataset of paired companies. The pairs appear in a pattern of the same groups, but I do not have a unique identifier for each group.
How can I create a unique identifier for each group of pairs in SAS? Companies do repeat themselves in the sample - for instance they could have multiple pairs and then appear again as the pair for another company. The groups are consistent throughout the sample year. Hopefully this will make more sense after reviewing the sample data below. Please let me know if more information is needed!
Here's what I have and the Group identifier that I'd like to create:
Company | Paired Company | Group |
A | C | 1 |
A | E | 1 |
A | F | 1 |
A | G | 1 |
B | H | 2 |
B | I | 2 |
B | J | 2 |
C | A | 1 |
C | E | 1 |
C | F | 1 |
C | G | 1 |
D | K | 3 |
D | L | 3 |
E | A | 1 |
E | C | 1 |
E | F | 1 |
E | G | 1 |
Please try the below code and check the want2 dataset where you will see the expected output,
data have;
input Company$ Paired_Company$;
cards;
A C
A E
A F
A G
B H
B I
B J
C A
C E
C F
C G
D K
D L
E A
E C
E F
E G
;
proc sort data=have;
by Paired_Company Company;
run;
proc sql;
create table want as select case when count(Paired_Company)>1 then 0 else count(Paired_Company) end as count, case when count(Paired_Company)>1 then 'Sample' else Company end as comp,
* from have group by Paired_Company;
quit;
proc sort data=want;
by count comp;
run;
data want2;
set want;
by count comp;
retain group;
if first.comp then group+1;
drop count comp;
run;
proc sort data=want2;
by Company Paired_Company ;
run;
Please try the below code and check the want2 dataset where you will see the expected output,
data have;
input Company$ Paired_Company$;
cards;
A C
A E
A F
A G
B H
B I
B J
C A
C E
C F
C G
D K
D L
E A
E C
E F
E G
;
proc sort data=have;
by Paired_Company Company;
run;
proc sql;
create table want as select case when count(Paired_Company)>1 then 0 else count(Paired_Company) end as count, case when count(Paired_Company)>1 then 'Sample' else Company end as comp,
* from have group by Paired_Company;
quit;
proc sort data=want;
by count comp;
run;
data want2;
set want;
by count comp;
retain group;
if first.comp then group+1;
drop count comp;
run;
proc sort data=want2;
by Company Paired_Company ;
run;
Hi Jag,
Thank you for replying so quickly...
When I run this code, I get a unique identifier for one of the pairs in each group, but the others are all assigned a value of 1. Any idea what I could be doing wrong?
In bold below, I have highlighted the issue. The B grouping should all have a Group value of 2 and the D grouping a value of 3. Essentially I'm creating an industry identifier for each company based on its pairs. All companies in a particular industry should have the same Group value.
Company | Paired_Company | Group | Expected Value |
A | C | 1 | |
A | G | 1 | |
A | F | 1 | |
A | E | 1 | |
B | J | 2 | |
B | I | 1 | 2 |
B | H | 1 | 2 |
C | G | 1 | |
C | F | 1 | |
C | E | 1 | |
C | A | 1 | |
D | L | 3 | |
D | K | 1 | 3 |
E | C | 1 | |
E | A | 1 | |
E | G | 1 | |
E | F | 1 |
proc optnet will find the connected companies:
data test;
input Company$ PairedCompany$;
datalines;
A C
A E
A F
A G
B H
B I
B J
C A
C E
C F
C G
D K
D L
E A
E C
E F
E G
;
proc optnet data_links=test out_nodes=groups GRAPH_DIRECTION=UNDIRECTED;
data_links_var from=company to=pairedcompany;
concomp;
run;
proc sql;
create table want as
select a.*, b.concomp as group
from
test as a left join
groups as b on a.company=b.node;
quit;
Data want:
Company PairedCompany group A C 1 A G 1 A F 1 A E 1 B J 2 B I 2 B H 2 C G 1 C F 1 C E 1 C A 1 D L 3 D K 3 E C 1 E A 1 E G 1 E F 1
If you do not have SAS/OR .
data test;
input Company$ PairedCompany$;
datalines;
A C
A E
A F
A G
B H
B I
B J
C A
C E
C F
C G
D K
D L
E A
E C
E F
E G
;
data have;
set test;
rename Company=from PairedCompany=to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
data final_want;
if _n_=1 then do;
if 0 then set want;
declare hash h(dataset:'want');
h.definekey('node');
h.definedata('household');
h.definedone();
end;
set test;
h.find(key:company);
drop node;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.