Help using Base SAS procedures

How to count for links within the data

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

How to count for links within the data

Hi All,

I've been struggling to write a neat code for some task but gets very cumbersome and complex each time.

I have a data set like below.

NAMECompanyINDUSTRY
JohnAFinance
JohnBfinance
JohnCinformation
JohnDInsurance
Mark
Mark

Whenever a person is present in two companies, there is a link between these two companies. What I am trying to compute is the number of links between a company with a company from same or another industry. For example, in the above, as long as John is present in Company A and B of finance, C of information and D of Insurance , I want to count 2 outside links and 1 inside link for A. So, my result should be something like this:

CompanyLinks outside industryLinks inside industry
A21
B21
C10
D10

I feel like I am missing some procs that might get my work to be done easier. Would some one please help me with the basic idea and Procs? Thanks a lot!


Accepted Solutions
Solution
‎06-23-2013 11:16 AM
Super User
Super User
Posts: 7,042

Re: How to count for links within the data

Posted in reply to Shayan2012

You need to be clearer on how to count.  Try this.  It will combine the data with itself and generate 0/1 flags based on whether the industry names match. (note you might need to clean your data, I made the records for Finance match). 

data have;

input NAME $ Company $ INDUSTRY $20.;

cards;

John A Finance

John B Finance

John C Information

John D Insurance

run;

proc sql noprint ;

  create table xx as

  select a.name

       , a.industry=b.industry as inside

       , a.industry^=b.industry as outside

         , a.company as companya

         , b.company as companyb

         , a.industry as industrya

         , b.industry as industryb

  from have a

     , have b

  where a.name = b.name

    and not (a.industry = b.industry and a.company=b.company)

  ;

quit;

proc print;

run;

proc sql noprint ;

  create table want as

  select companya , sum(inside) as num_inside , sum(outside) as num_outside

  from xx

  group by 1

  order by 1

  ;

quit;

proc print;

run;

                    num_       num_

Obs    companya    inside    outside

1        A           1         2

2        B           1         2

3        C           0         3

4        D           0         3

View solution in original post


All Replies
Solution
‎06-23-2013 11:16 AM
Super User
Super User
Posts: 7,042

Re: How to count for links within the data

Posted in reply to Shayan2012

You need to be clearer on how to count.  Try this.  It will combine the data with itself and generate 0/1 flags based on whether the industry names match. (note you might need to clean your data, I made the records for Finance match). 

data have;

input NAME $ Company $ INDUSTRY $20.;

cards;

John A Finance

John B Finance

John C Information

John D Insurance

run;

proc sql noprint ;

  create table xx as

  select a.name

       , a.industry=b.industry as inside

       , a.industry^=b.industry as outside

         , a.company as companya

         , b.company as companyb

         , a.industry as industrya

         , b.industry as industryb

  from have a

     , have b

  where a.name = b.name

    and not (a.industry = b.industry and a.company=b.company)

  ;

quit;

proc print;

run;

proc sql noprint ;

  create table want as

  select companya , sum(inside) as num_inside , sum(outside) as num_outside

  from xx

  group by 1

  order by 1

  ;

quit;

proc print;

run;

                    num_       num_

Obs    companya    inside    outside

1        A           1         2

2        B           1         2

3        C           0         3

4        D           0         3

Respected Advisor
Posts: 4,920

Re: How to count for links within the data

Posted in reply to Shayan2012

I agree with Tom, SQL is your friend here, and it prefers clean data. I asumed that links provided by more than one name count only once and that companies associated with more than one industry count only once :

data have;
length Name Company Industry $24;
informat Name Company Industry $upcase.;
input Name Company Industry;
datalines;
John A Finance
John B finance
John C information
John D Insurance
Henry   A   FINANCE
Henry   B   finance
Henry   C   Information
Henry   D   Banking
;

proc sql;
create table links as
select
     Company,
     sum(sameIndustry) as insideLinks,
     sum(not sameIndustry) as outsideLinks
from (
     select distinct
          a.Company as Company,
          b.Company as otherCompany,
          a.Industry=b.Industry as sameIndustry
     from
          have as a left join
          have as b on a.Name=b.Name and a.Company ne b.Company
     )
group by Company;


select * from links;


quit;

                Company                   insideLinks  outsideLinks
                ___________________________________________________
                A                                   1             2
                B                                   1             2
                C                                   0             3
                D                                   0             3

PG

PG
Respected Advisor
Posts: 3,156

Re: How to count for links within the data

Posted in reply to Shayan2012

I concur with Tom and PG. You need to clarify your rules further to rationalize your output. Tom and PG’s interpretation is clear and consistent, therefore programmatic doable. I always believe that for any one of Proc SQL approaches, you can find its counterpart in data step Hash(), and this one seems no exception:

data have;

input NAME $ Company $ INDUSTRY $20.;

cards;

John A Finance

John B Finance

John C Information

John D Insurance

run;

data want;

  if _n_=1 then do;

    if 0 then set have(rename=(company=_com industry=_ind));

       declare hash h(dataset:'have(rename=(company=_com industry=_ind))', multidata:'y');

       h.definekey('name');

       h.definedata(all:'y');

       h.definedone();

  end;

  set have;

  inside=0;outside=0;

    do rc=h.find() by 0 while (rc=0);

        if industry=_ind and company ne _com then inside+1;

        if industry ne _ind and company ne _com then outside+1;

rc=h.find_next();

       end;

drop rc _:;

run;

Haikuo

Super User
Posts: 10,023

Re: How to count for links within the data

Posted in reply to Shayan2012

Your post is not clear , why A=2 while B=0 not 2 too ?

  data have;
length Name Company Industry $24;
informat Name Company Industry $upcase.;
input Name Company Industry;
datalines;
John A finance
John B finance
John C information
John D Insurance
Henry   A   finance
Henry   B   finance
Henry   C   Information
Henry   D   Banking
;

 

proc sql;
create table links as
select *,(select count(*) from have where have.name=a.name and have.industry ne a.industry ) as outside,
         (select count(*)-1 from have where have.name=a.name and have.industry eq a.industry ) as inside
 from have as a;
quit;
              

Ksharp

Message was edited by: xia keshan

Frequent Contributor
Posts: 75

Re: How to count for links within the data

You are absolutely correct. ksharp! . I corrected the table for future references.

Frequent Contributor
Posts: 75

Re: How to count for links within the data

Posted in reply to Shayan2012

Wow. Thanks a lot guys. Your codes were super helpful to me. Actually, I was writing the code in matlab with lots of for loops and it was horrible. Guess need to learn a lot of sql!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 186 views
  • 6 likes
  • 5 in conversation