BookmarkSubscribeRSS Feed
mspak
Quartz | Level 8

Good days to all,

I have a set of data (as per attached), with the following variables:

gvkey = US company identification code

fyear = financial year

Auditor = auditor identification code

sic2 = 2-digit industry classification code

CBSA = City

NAT_Mkshr = National market share, based on SIC2 and fyear. 

City_Mkshr = City market share, based on CBSA, SIC2 and fyear.

I would like to calculate Auditor Industry Specialist, based on the following definitions:

  • National auditor specialist: An auditor is defined as a national industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category, and if its annual market share is at least 10 percentage points greater than its closest competitor in a national  audit market.
  • City auditor specialist: An auditor is defined as a city industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category for a given city (ie. CBSA), and if its annual market share is at least 10 percentage points greater than its closest competitor in a city audit market.

I learned from my search that there are SAS function, namely LARGEST and COMPARE. I have no idea on whether these functions can be applied in my case.

Hope to get any simple code to my problem.

Thanks in advance for any help.

Regards,

mspak

8 REPLIES 8
esjackso
Quartz | Level 8

See if this can get you started. Proc rank will rank the observations by fyear, au, sic2. The rank variable is in mk_nat.

proc sort data=test out=test2; by fyear sic2 au; run;

proc rank data=test2 out=test3;

  by fyear sic2 au;

  var nat_mkshr;

  ranks rnk_nat;

run;

As far as the rest there are some other rules you will have to work through like market share ties. But if you keep everyone with a rank < 3 those should be all the people you need to look at and run through a data step to flag the specialists.

Hope this helps!

EJ

esjackso
Quartz | Level 8

By the way the functions you mentioned I dont think will help since they would work across variables and not across observations.

EJ

mspak
Quartz | Level 8

Hi EJ,

Thanks for your helps.

I try to run the following code:

proc sort data=final.test  out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2  descending out=test3;

  by sic2 fyear ;

  var nat_mkshr;

  ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr); run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6  descending out=test7;

  by sic2 fyear cbsa;

  var City_Mkshr;

  ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8;

by sic2 fyear cbsa descending rnk_nat;

cityshr_dif = dif(City_Mkshr); run;

I found everything is within my expectation up to the output of test8.

I found the following error whenever I try to output "test9". I found that the obs in test9 has drop significantly compared to as in test8.

ERROR: BY variables are not properly sorted on data set WORK.TEST8.

Thank you.

Regards,

mspak

Do you know the reason?

Jagadishkatam
Amethyst | Level 16

Hi mspak,

in proc sort for test8 you have used the by variables as sic2 fyear cbsa descending rnk_city;

However while creating the dataset test9, the by variable order is sic2 fyear cbsa descending rnk_nat;

Here, please replace rnk_nat with rnk_city and you will not get any error.

Thanks,

Jagadish

Thanks,
Jag
esjackso
Quartz | Level 8

Good catch. I noticed the same thing.

Also I always forget about the dif function so it was good to see that was what you are using.

EJ

mspak
Quartz | Level 8

Hi Jagadish and EJ,

There was a typo error on the variable (rnk_city instead of rnk_nat). I checked the code for a few times, but I could not realise the mistake Smiley Sad.

Thank you helping me Smiley Happy.

Regards,

mspak

esjackso
Quartz | Level 8

No problem ... glad it helped!

EJ

mspak
Quartz | Level 8

Hi all,

I modified the codes as follows to generate my desired output:

proc sort data=final.test  out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2  descending out=test3;

  by sic2 fyear ;

  var nat_mkshr;

  ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr);

IF FIRST.FYEAR THEN DO;

natshr_dif = .;

END;

run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6  descending out=test7;

  by sic2 fyear cbsa;

  var City_Mkshr;

  ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8 (drop=gvkey);

by sic2 fyear cbsa descending rnk_city;

cityshr_dif = dif(City_Mkshr);

IF FIRST.CBSA THEN DO;

cityshr_dif = .;

END;

run;

data test10; set test9;

if rnk_nat = 1 and natshr_dif>=0.1 then nat_AIP1 = 1; else nat_AIP1 = 0;

if rnk_city ^= 1 OR . <cityshr_dif <0.1  then city_AIP1 = 0; else city_AIP1 = 1;

run;

Thank you.

Regads,

mspak

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3411 views
  • 7 likes
  • 3 in conversation