code to create an indicator variable

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 96
Accepted Solution

code to create an indicator variable

Hi All,

I am trying to create a variable, SPECIALIST, that takes value 1 if the auditor has at least 40 percent or more clients in the two digit industry classification (SIC) code and 0 otherwise. The information I have is firm identifier (TIC), Year, industry classification (SIC), and auditor name (AU_NAME). The data looks as follows:

TIC     YEAR     SIC     AU_NAME

A         2004     12        KPMG

A         2005     12        KPMG

A         2006     12        KPMG

A         2007     12        KPMG

B         2004     12        KPMG

B         2005     12        KPMG

B         2006     12        KPMG

B         2007     12        PwC

C        2004     12        PwC

C        2005     12        PwC

C        2006     12        PwC

C        2007     12        PwC 

The output I want is

TIC     YEAR     SIC     AU_NAME     SPECIALIST

A         2004     12        KPMG          1

A         2005     12        KPMG          1

A         2006     12        KPMG          1

A         2007     12        KPMG          0

B         2004     12        KPMG          1

B         2005     12        KPMG          1

B         2006     12        KPMG          1

B         2007     12        PwC             0

C        2004     12        PwC               0

C        2005     12        PwC               0

C        2006     12        PwC               0

C        2007     12        PwC               1

To construct this variable, I first created a variable OBS which takes 1 for all the observations. I then used the following code:

proc sort data = have ; by AU_NAME SIC YEAR; run;

proc means data= have n;

var OBS;

by AU_NAME SIC YEAR;

output out = dataset1 n=nclient;

run; quit;

data dataset2; merge have dataset1; by AU_NAME SIC YEAR; run;

proc sort data = have ; by  SIC YEAR; run;

proc means data= have n;

var OBS;

by  SIC YEAR;

output out = dataset3 n=tclient;

run; quit;

data dataset4; merge dataset2 dataset3; by  SIC YEAR; run;

data final; set dataset4;

client_ratio = nclient/tclient;

SPECIALIST =0; if client_ratio ge 0.4 then SPECIALIST = 1;

run;

Can someone please tell me if I am doing it correctly? Also is there any easy code to develop the same?

Thank you for your time.


Accepted Solutions
Solution
‎05-17-2014 10:09 PM
Respected Advisor
Posts: 4,651

Re: code to create an indicator variable

I think that PwC should have SPECIALIST=1 for TIC=B in 2007. Here is how to do this in SQL :

data have;

input TIC $ YEAR SIC AU_NAME $;

datalines;

A         2004     12        KPMG

A         2005     12        KPMG

A         2006     12        KPMG

A         2007     12        KPMG

B         2004     12        KPMG

B         2005     12        KPMG

B         2006     12        KPMG

B         2007     12        PwC

C        2004     12        PwC

C        2005     12        PwC

C        2006     12        PwC

C        2007     12        PwC

;

proc sql;

create table want as

select have.*, pct >= 0.4 as SPECIALIST

from

    have natural join

    (select SIC, year, AU_NAME, n/sum(n) as pct

    from

        (select SIC, year, AU_NAME, count(*) as n

        from have

        group by SIC, year, AU_NAME)

    group by SIC, year)

order by TIC, year, SIC, AU_NAME;

select * from want;

quit;

PG

PG

View solution in original post


All Replies
Solution
‎05-17-2014 10:09 PM
Respected Advisor
Posts: 4,651

Re: code to create an indicator variable

I think that PwC should have SPECIALIST=1 for TIC=B in 2007. Here is how to do this in SQL :

data have;

input TIC $ YEAR SIC AU_NAME $;

datalines;

A         2004     12        KPMG

A         2005     12        KPMG

A         2006     12        KPMG

A         2007     12        KPMG

B         2004     12        KPMG

B         2005     12        KPMG

B         2006     12        KPMG

B         2007     12        PwC

C        2004     12        PwC

C        2005     12        PwC

C        2006     12        PwC

C        2007     12        PwC

;

proc sql;

create table want as

select have.*, pct >= 0.4 as SPECIALIST

from

    have natural join

    (select SIC, year, AU_NAME, n/sum(n) as pct

    from

        (select SIC, year, AU_NAME, count(*) as n

        from have

        group by SIC, year, AU_NAME)

    group by SIC, year)

order by TIC, year, SIC, AU_NAME;

select * from want;

quit;

PG

PG
Frequent Contributor
Posts: 96

Re: code to create an indicator variable

Thank you for sharing the code and for your prompt reply.

Shalmali

Super User
Posts: 9,682

Re: code to create an indicator variable

data have;
input TIC $ YEAR SIC AU_NAME $;
datalines;
A         2004     12        KPMG
A         2005     12        KPMG
A         2006     12        KPMG
A         2007     12        KPMG
B         2004     12        KPMG
B         2005     12        KPMG
B         2006     12        KPMG
B         2007     12        PwC
C        2004     12        PwC
C        2005     12        PwC
C        2006     12        PwC
C        2007     12        PwC
;
run;
proc sql;
create table want as
 select *,case 
  when (select count(distinct tic) from have where sic=a.sic and year=a.year and AU_NAME=a.AU_NAME)/(select count(distinct tic) from have where sic=a.sic and year=a.year) ge 0.4 then 1
     else 0 end as flag
  from have as a;
quit;

Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 324 views
  • 3 likes
  • 3 in conversation