Hi all,
For some network analysis, I want to check whether two different firms are connected by a mutual person.
For example in the following table, A and C are connected through person X.
Firm Person
A x
B y
C x
Up to know, I have written this code:
proc sql;
create table want as
select a.firm as firm_a, b.firm as firm_b
from have as a, have as b,
where
a.person=b.person and a.firm ~=b.firm;
quit;
My problem is that in this case, I will get two observations, although they are only one because SAS will match every observation twice so I will get:
Firm_a Firm_b
A C
C A
Obviously they are not duplicate, but redundant anyways, and unfortunately I could not get rid of them easily. How can I remove these multiple observations? Should i alter my code or change it after the new table?
I will appreciate it if you can help me,
Thanks,
Just add a step:
data have;
input Firm $ Person $;
datalines;
A x
E y
C x
D z
B y
F x
;
proc sql;
create table long as
select person, firm, count(distinct firm) as nbFirms
from have
group by person
having count(distinct firm) > 1
order by person, firm;
quit;
data want;
length firms $200;
do until(last.person);
set long; by person;
firms = catx(",", firms, firm);
end;
keep person firms;
run;
proc print data=want noobs; run;
PG
Try instead to count the number of distinct firms that share the same person
proc sql;
create table want as
select person, firm, count(distinct firm) as nbFirms
from have
group by person
having count(distinct firm) > 1;
quit;
PG
Thank you very much, PG.
But if I'm not wrong, in this way I won't get the pairs of firms and I'll get the firms individually. I guess for my further work it would be more convenient if I get the paids, like A-C, A-B, etc. Is it possible to change into that?
Thanks again,
Just add a step:
data have;
input Firm $ Person $;
datalines;
A x
E y
C x
D z
B y
F x
;
proc sql;
create table long as
select person, firm, count(distinct firm) as nbFirms
from have
group by person
having count(distinct firm) > 1
order by person, firm;
quit;
data want;
length firms $200;
do until(last.person);
set long; by person;
firms = catx(",", firms, firm);
end;
keep person firms;
run;
proc print data=want noobs; run;
PG
Thanks a lot, PG. That's perfect!
For a one-step solution, this code uses some undocumented feature (monotonic()) that some may frown upon:
data haVE;
input Firm$ Person$;
cards;
A x
B y
C x
;
proc sql;
create table want (drop=pair) as
select a.firm as firm_a, b.firm as firm_b, ifc(a.firm<b.firm, cats(a.firm,b.firm), cats(b.firm,a.firm) )as pair
from have as a, have as b
where
group by pair
having monotonic()=min(monotonic());
quit;
You can always break it into 2-steps.
Haikuo
if I understand what you mean , why not :
call sortc(Firm_a ,Firm_b);
then
proc sort nodupkey; by Firm_a Firm_b;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.