Quartz | Level 8

## How to count for links within the data

Hi All,

I've been struggling to write a neat code for some task but gets very cumbersome and complex each time.

I have a data set like below.

 NAME Company INDUSTRY John A Finance John B finance John C information John D Insurance Mark … … Mark … …

Whenever a person is present in two companies, there is a link between these two companies. What I am trying to compute is the number of links between a company with a company from same or another industry. For example, in the above, as long as John is present in Company A and B of finance, C of information and D of Insurance , I want to count 2 outside links and 1 inside link for A. So, my result should be something like this:

 Company Links outside industry Links inside industry A 2 1 B 2 1 C 1 0 D 1 0

I feel like I am missing some procs that might get my work to be done easier. Would some one please help me with the basic idea and Procs? Thanks a lot!

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: How to count for links within the data

You need to be clearer on how to count.  Try this.  It will combine the data with itself and generate 0/1 flags based on whether the industry names match. (note you might need to clean your data, I made the records for Finance match).

data have;

input NAME \$ Company \$ INDUSTRY \$20.;

cards;

John A Finance

John B Finance

John C Information

John D Insurance

run;

proc sql noprint ;

create table xx as

select a.name

, a.industry=b.industry as inside

, a.industry^=b.industry as outside

, a.company as companya

, b.company as companyb

, a.industry as industrya

, b.industry as industryb

from have a

, have b

where a.name = b.name

and not (a.industry = b.industry and a.company=b.company)

;

quit;

proc print;

run;

proc sql noprint ;

create table want as

select companya , sum(inside) as num_inside , sum(outside) as num_outside

from xx

group by 1

order by 1

;

quit;

proc print;

run;

num_       num_

Obs    companya    inside    outside

1        A           1         2

2        B           1         2

3        C           0         3

4        D           0         3

6 REPLIES 6
Super User

## Re: How to count for links within the data

You need to be clearer on how to count.  Try this.  It will combine the data with itself and generate 0/1 flags based on whether the industry names match. (note you might need to clean your data, I made the records for Finance match).

data have;

input NAME \$ Company \$ INDUSTRY \$20.;

cards;

John A Finance

John B Finance

John C Information

John D Insurance

run;

proc sql noprint ;

create table xx as

select a.name

, a.industry=b.industry as inside

, a.industry^=b.industry as outside

, a.company as companya

, b.company as companyb

, a.industry as industrya

, b.industry as industryb

from have a

, have b

where a.name = b.name

and not (a.industry = b.industry and a.company=b.company)

;

quit;

proc print;

run;

proc sql noprint ;

create table want as

select companya , sum(inside) as num_inside , sum(outside) as num_outside

from xx

group by 1

order by 1

;

quit;

proc print;

run;

num_       num_

Obs    companya    inside    outside

1        A           1         2

2        B           1         2

3        C           0         3

4        D           0         3

Opal | Level 21

## Re: How to count for links within the data

I agree with Tom, SQL is your friend here, and it prefers clean data. I asumed that links provided by more than one name count only once and that companies associated with more than one industry count only once :

data have;
length Name Company Industry \$24;
informat Name Company Industry \$upcase.;
input Name Company Industry;
datalines;
John A Finance
John B finance
John C information
John D Insurance
Henry   A   FINANCE
Henry   B   finance
Henry   C   Information
Henry   D   Banking
;

proc sql;
select
Company,
from (
select distinct
a.Company as Company,
b.Company as otherCompany,
a.Industry=b.Industry as sameIndustry
from
have as a left join
have as b on a.Name=b.Name and a.Company ne b.Company
)
group by Company;

quit;

___________________________________________________
A                                   1             2
B                                   1             2
C                                   0             3
D                                   0             3

PG

PG
Onyx | Level 15

## Re: How to count for links within the data

I concur with Tom and PG. You need to clarify your rules further to rationalize your output. Tom and PG’s interpretation is clear and consistent, therefore programmatic doable. I always believe that for any one of Proc SQL approaches, you can find its counterpart in data step Hash(), and this one seems no exception:

data have;

input NAME \$ Company \$ INDUSTRY \$20.;

cards;

John A Finance

John B Finance

John C Information

John D Insurance

run;

data want;

if _n_=1 then do;

if 0 then set have(rename=(company=_com industry=_ind));

declare hash h(dataset:'have(rename=(company=_com industry=_ind))', multidata:'y');

h.definekey('name');

h.definedata(all:'y');

h.definedone();

end;

set have;

inside=0;outside=0;

do rc=h.find() by 0 while (rc=0);

if industry=_ind and company ne _com then inside+1;

if industry ne _ind and company ne _com then outside+1;

rc=h.find_next();

end;

drop rc _:;

run;

Haikuo

Super User

## Re: How to count for links within the data

Your post is not clear , why A=2 while B=0 not 2 too ?

```  data have;
length Name Company Industry \$24;
informat Name Company Industry \$upcase.;
input Name Company Industry;
datalines;
John A finance
John B finance
John C information
John D Insurance
Henry   A   finance
Henry   B   finance
Henry   C   Information
Henry   D   Banking
;

proc sql;
select *,(select count(*) from have where have.name=a.name and have.industry ne a.industry ) as outside,
(select count(*)-1 from have where have.name=a.name and have.industry eq a.industry ) as inside
from have as a;
quit;

```

Ksharp

Message was edited by: xia keshan

Quartz | Level 8

## Re: How to count for links within the data

You are absolutely correct. ksharp! . I corrected the table for future references.

Quartz | Level 8

## Re: How to count for links within the data

Wow. Thanks a lot guys. Your codes were super helpful to me. Actually, I was writing the code in matlab with lots of for loops and it was horrible. Guess need to learn a lot of sql!

Discussion stats
• 6 replies
• 827 views
• 6 likes
• 5 in conversation