Quartz | Level 8

## Comparison between largest and second largest value

Good days to all,

I have a set of data (as per attached), with the following variables:

gvkey = US company identification code

fyear = financial year

Auditor = auditor identification code

sic2 = 2-digit industry classification code

CBSA = City

NAT_Mkshr = National market share, based on SIC2 and fyear.

City_Mkshr = City market share, based on CBSA, SIC2 and fyear.

I would like to calculate Auditor Industry Specialist, based on the following definitions:

• National auditor specialist: An auditor is defined as a national industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category, and if its annual market share is at least 10 percentage points greater than its closest competitor in a national  audit market.
• City auditor specialist: An auditor is defined as a city industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category for a given city (ie. CBSA), and if its annual market share is at least 10 percentage points greater than its closest competitor in a city audit market.

I learned from my search that there are SAS function, namely LARGEST and COMPARE. I have no idea on whether these functions can be applied in my case.

Hope to get any simple code to my problem.

Thanks in advance for any help.

Regards,

mspak

8 REPLIES 8
Quartz | Level 8

## Re: Comparison between largest and second largest value

See if this can get you started. Proc rank will rank the observations by fyear, au, sic2. The rank variable is in mk_nat.

proc sort data=test out=test2; by fyear sic2 au; run;

proc rank data=test2 out=test3;

by fyear sic2 au;

var nat_mkshr;

ranks rnk_nat;

run;

As far as the rest there are some other rules you will have to work through like market share ties. But if you keep everyone with a rank < 3 those should be all the people you need to look at and run through a data step to flag the specialists.

Hope this helps!

EJ

Quartz | Level 8

## Re: Comparison between largest and second largest value

By the way the functions you mentioned I dont think will help since they would work across variables and not across observations.

EJ

Quartz | Level 8

## Re: Comparison between largest and second largest value

Hi EJ,

I try to run the following code:

proc sort data=final.test  out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2  descending out=test3;

by sic2 fyear ;

var nat_mkshr;

ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr); run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6  descending out=test7;

by sic2 fyear cbsa;

var City_Mkshr;

ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8;

by sic2 fyear cbsa descending rnk_nat;

cityshr_dif = dif(City_Mkshr); run;

I found everything is within my expectation up to the output of test8.

I found the following error whenever I try to output "test9". I found that the obs in test9 has drop significantly compared to as in test8.

ERROR: BY variables are not properly sorted on data set WORK.TEST8.

Thank you.

Regards,

mspak

Do you know the reason?

Amethyst | Level 16

## Re: Comparison between largest and second largest value

Hi mspak,

in proc sort for test8 you have used the by variables as sic2 fyear cbsa descending rnk_city;

However while creating the dataset test9, the by variable order is sic2 fyear cbsa descending rnk_nat;

Here, please replace rnk_nat with rnk_city and you will not get any error.

Thanks,

Thanks,
Jag
Quartz | Level 8

## Re: Comparison between largest and second largest value

Good catch. I noticed the same thing.

Also I always forget about the dif function so it was good to see that was what you are using.

EJ

Quartz | Level 8

## Re: Comparison between largest and second largest value

There was a typo error on the variable (rnk_city instead of rnk_nat). I checked the code for a few times, but I could not realise the mistake .

Thank you helping me .

Regards,

mspak

Quartz | Level 8

## Re: Comparison between largest and second largest value

No problem ... glad it helped!

EJ

Quartz | Level 8

## Re: Comparison between largest and second largest value

Hi all,

I modified the codes as follows to generate my desired output:

proc sort data=final.test  out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2  descending out=test3;

by sic2 fyear ;

var nat_mkshr;

ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr);

IF FIRST.FYEAR THEN DO;

natshr_dif = .;

END;

run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6  descending out=test7;

by sic2 fyear cbsa;

var City_Mkshr;

ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8 (drop=gvkey);

by sic2 fyear cbsa descending rnk_city;

cityshr_dif = dif(City_Mkshr);

IF FIRST.CBSA THEN DO;

cityshr_dif = .;

END;

run;

data test10; set test9;

if rnk_nat = 1 and natshr_dif>=0.1 then nat_AIP1 = 1; else nat_AIP1 = 0;

if rnk_city ^= 1 OR . <cityshr_dif <0.1  then city_AIP1 = 0; else city_AIP1 = 1;

run;

Thank you.