Solved: Re: How to omitt some redundant observations

Shayan2012 · Posted 07-20-2014 05:01 PM

Hi all,

For some network analysis, I want to check whether two different firms are connected by a mutual person.

For example in the following table, A and C are connected through person X.

Firm Person

A x

B y

C x

Up to know, I have written this code:

proc sql;

create table want as

select a.firm as firm_a, b.firm as firm_b

from have as a, have as b,

where

a.person=b.person and a.firm ~=b.firm;

quit;

My problem is that in this case, I will get two observations, although they are only one because SAS will match every observation twice so I will get:

Firm_a Firm_b

A C

C A

Obviously they are not duplicate, but redundant anyways, and unfortunately I could not get rid of them easily. How can I remove these multiple observations? Should i alter my code or change it after the new table?

I will appreciate it if you can help me,

Thanks,

PGStats · Posted 07-20-2014 09:18 PM

Just add a step:

data have;

input Firm $ Person $;

datalines;

A x

E y

C x

D z

B y

F x

;

proc sql;

create table long as

select person, firm, count(distinct firm) as nbFirms

from have

group by person

having count(distinct firm) > 1

order by person, firm;

quit;

data want;

length firms $200;

do until(last.person);

set long; by person;

firms = catx(",", firms, firm);

end;

keep person firms;

run;

proc print data=want noobs; run;

PG

View solution in original post

PGStats · Posted 07-20-2014 06:08 PM

Try instead to count the number of distinct firms that share the same person

proc sql;

create table want as

select person, firm, count(distinct firm) as nbFirms

from have

group by person

having count(distinct firm) > 1;

quit;

PG

Shayan2012 · Posted 07-20-2014 06:31 PM

Thank you very much, PG.

But if I'm not wrong, in this way I won't get the pairs of firms and I'll get the firms individually. I guess for my further work it would be more convenient if I get the paids, like A-C, A-B, etc. Is it possible to change into that?

Thanks again,

PGStats · Posted 07-20-2014 09:18 PM

Just add a step:

data have;

input Firm $ Person $;

datalines;

A x

E y

C x

D z

B y

F x

;

proc sql;

create table long as

select person, firm, count(distinct firm) as nbFirms

from have

group by person

having count(distinct firm) > 1

order by person, firm;

quit;

data want;

length firms $200;

do until(last.person);

set long; by person;

firms = catx(",", firms, firm);

end;

keep person firms;

run;

proc print data=want noobs; run;

PG

Shayan2012 · Posted 07-21-2014 10:51 AM

Thanks a lot, PG. That's perfect!

Haikuo · Posted 07-20-2014 08:14 PM

For a one-step solution, this code uses some undocumented feature (monotonic()) that some may frown upon:

data haVE;

input Firm$ Person$;

cards;

A x

B y

C x

;

proc sql;

create table want (drop=pair) as

select a.firm as firm_a, b.firm as firm_b, ifc(a.firm<b.firm, cats(a.firm,b.firm), cats(b.firm,a.firm) )as pair

from have as a, have as b

where

a.person=b.person and a.firm ~=b.firm

group by pair

having monotonic()=min(monotonic());

quit;

You can always break it into 2-steps.

Haikuo

Ksharp · Posted 07-21-2014 09:36 AM

if I understand what you mean , why not :

call sortc(Firm_a ,Firm_b);

then

proc sort nodupkey; by Firm_a Firm_b;run;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away