Hi all,
For some network analysis, I want to check whether two different firms are connected by a mutual person.
For example in the following table, A and C are connected through person X.
Firm Person
A x
B y
C x
Up to know, I have written this code:
proc sql;
create table want as
select a.firm as firm_a, b.firm as firm_b
from have as a, have as b,
where
a.person=b.person and a.firm ~=b.firm;
quit;
My problem is that in this case, I will get two observations, although they are only one because SAS will match every observation twice so I will get:
Firm_a Firm_b
A C
C A
Obviously they are not duplicate, but redundant anyways, and unfortunately I could not get rid of them easily. How can I remove these multiple observations? Should i alter my code or change it after the new table?
I will appreciate it if you can help me,
Thanks,
Just add a step:
data have;
input Firm $ Person $;
datalines;
A x
E y
C x
D z
B y
F x
;
proc sql;
create table long as
select person, firm, count(distinct firm) as nbFirms
from have
group by person
having count(distinct firm) > 1
order by person, firm;
quit;
data want;
length firms $200;
do until(last.person);
set long; by person;
firms = catx(",", firms, firm);
end;
keep person firms;
run;
proc print data=want noobs; run;
PG
Try instead to count the number of distinct firms that share the same person
proc sql;
create table want as
select person, firm, count(distinct firm) as nbFirms
from have
group by person
having count(distinct firm) > 1;
quit;
PG
Thank you very much, PG.
But if I'm not wrong, in this way I won't get the pairs of firms and I'll get the firms individually. I guess for my further work it would be more convenient if I get the paids, like A-C, A-B, etc. Is it possible to change into that?
Thanks again,
Just add a step:
data have;
input Firm $ Person $;
datalines;
A x
E y
C x
D z
B y
F x
;
proc sql;
create table long as
select person, firm, count(distinct firm) as nbFirms
from have
group by person
having count(distinct firm) > 1
order by person, firm;
quit;
data want;
length firms $200;
do until(last.person);
set long; by person;
firms = catx(",", firms, firm);
end;
keep person firms;
run;
proc print data=want noobs; run;
PG
Thanks a lot, PG. That's perfect!
For a one-step solution, this code uses some undocumented feature (monotonic()) that some may frown upon:
data haVE;
input Firm$ Person$;
cards;
A x
B y
C x
;
proc sql;
create table want (drop=pair) as
select a.firm as firm_a, b.firm as firm_b, ifc(a.firm<b.firm, cats(a.firm,b.firm), cats(b.firm,a.firm) )as pair
from have as a, have as b
where
group by pair
having monotonic()=min(monotonic());
quit;
You can always break it into 2-steps.
Haikuo
if I understand what you mean , why not :
call sortc(Firm_a ,Firm_b);
then
proc sort nodupkey; by Firm_a Firm_b;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.