Re: Comparison between largest and second largest value

mspak · Posted 01-24-2013 07:52 AM

Good days to all,

I have a set of data (as per attached), with the following variables:

gvkey = US company identification code

fyear = financial year

Auditor = auditor identification code

sic2 = 2-digit industry classification code

CBSA = City

NAT_Mkshr = National market share, based on SIC2 and fyear.

City_Mkshr = City market share, based on CBSA, SIC2 and fyear.

I would like to calculate Auditor Industry Specialist, based on the following definitions:

National auditor specialist: An auditor is defined as a national industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category, and if its annual market share is at least 10 percentage points greater than its closest competitor in a national audit market.
City auditor specialist: An auditor is defined as a city industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category for a given city (ie. CBSA), and if its annual market share is at least 10 percentage points greater than its closest competitor in a city audit market.

I learned from my search that there are SAS function, namely LARGEST and COMPARE. I have no idea on whether these functions can be applied in my case.

Hope to get any simple code to my problem.

Thanks in advance for any help.

Regards,

mspak

esjackso · Posted 01-24-2013 08:20 AM

See if this can get you started. Proc rank will rank the observations by fyear, au, sic2. The rank variable is in mk_nat.

proc sort data=test out=test2; by fyear sic2 au; run;

proc rank data=test2 out=test3;

by fyear sic2 au;

var nat_mkshr;

ranks rnk_nat;

run;

As far as the rest there are some other rules you will have to work through like market share ties. But if you keep everyone with a rank < 3 those should be all the people you need to look at and run through a data step to flag the specialists.

Hope this helps!

EJ

esjackso · Posted 01-24-2013 08:22 AM

By the way the functions you mentioned I dont think will help since they would work across variables and not across observations.

EJ

mspak · Posted 01-24-2013 09:05 AM

Hi EJ,

Thanks for your helps.

I try to run the following code:

proc sort data=final.test out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2 descending out=test3;

by sic2 fyear ;

var nat_mkshr;

ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr); run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6 descending out=test7;

by sic2 fyear cbsa;

var City_Mkshr;

ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8;

by sic2 fyear cbsa descending rnk_nat;

cityshr_dif = dif(City_Mkshr); run;

I found everything is within my expectation up to the output of test8.

I found the following error whenever I try to output "test9". I found that the obs in test9 has drop significantly compared to as in test8.

ERROR: BY variables are not properly sorted on data set WORK.TEST8.

Thank you.

Regards,

mspak

Do you know the reason?

Jagadishkatam · Posted 01-24-2013 09:13 AM

Hi mspak,

in proc sort for test8 you have used the by variables as sic2 fyear cbsa descending rnk_city;

However while creating the dataset test9, the by variable order is sic2 fyear cbsa descending rnk_nat;

Here, please replace rnk_nat with rnk_city and you will not get any error.

Thanks,

Jagadish

Thanks,
Jag

esjackso · Posted 01-24-2013 09:15 AM

Good catch. I noticed the same thing.

Also I always forget about the dif function so it was good to see that was what you are using.

EJ

mspak · Posted 01-24-2013 09:23 AM

Hi Jagadish and EJ,

There was a typo error on the variable (rnk_city instead of rnk_nat). I checked the code for a few times, but I could not realise the mistake .

Thank you helping me .

Regards,

mspak

esjackso · Posted 01-24-2013 09:28 AM

No problem ... glad it helped!

EJ

mspak · Posted 01-24-2013 10:02 AM

Hi all,

I modified the codes as follows to generate my desired output:

proc sort data=final.test out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2 descending out=test3;

by sic2 fyear ;

var nat_mkshr;

ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr);

IF FIRST.FYEAR THEN DO;

natshr_dif = .;

END;

run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6 descending out=test7;

by sic2 fyear cbsa;

var City_Mkshr;

ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8 (drop=gvkey);

by sic2 fyear cbsa descending rnk_city;

cityshr_dif = dif(City_Mkshr);

IF FIRST.CBSA THEN DO;

cityshr_dif = .;

END;

run;

data test10; set test9;

if rnk_nat = 1 and natshr_dif>=0.1 then nat_AIP1 = 1; else nat_AIP1 = 0;

if rnk_city ^= 1 OR . <cityshr_dif <0.1 then city_AIP1 = 0; else city_AIP1 = 1;

run;

Thank you.

Regads,

mspak

SAS Training: Just a Click Away