Help using Base SAS procedures

Comparison between largest and second largest value

Reply
Regular Contributor
Posts: 162

Comparison between largest and second largest value

Good days to all,

I have a set of data (as per attached), with the following variables:

gvkey = US company identification code

fyear = financial year

Auditor = auditor identification code

sic2 = 2-digit industry classification code

CBSA = City

NAT_Mkshr = National market share, based on SIC2 and fyear. 

City_Mkshr = City market share, based on CBSA, SIC2 and fyear.

I would like to calculate Auditor Industry Specialist, based on the following definitions:

  • National auditor specialist: An auditor is defined as a national industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category, and if its annual market share is at least 10 percentage points greater than its closest competitor in a national  audit market.
  • City auditor specialist: An auditor is defined as a city industry specialist if it has the largest annual (ie fyear) market share in an industry, based on the two-digit SIC category for a given city (ie. CBSA), and if its annual market share is at least 10 percentage points greater than its closest competitor in a city audit market.

I learned from my search that there are SAS function, namely LARGEST and COMPARE. I have no idea on whether these functions can be applied in my case.

Hope to get any simple code to my problem.

Thanks in advance for any help.

Regards,

mspak

Attachment
Super Contributor
Posts: 334

Re: Comparison between largest and second largest value

See if this can get you started. Proc rank will rank the observations by fyear, au, sic2. The rank variable is in mk_nat.

proc sort data=test out=test2; by fyear sic2 au; run;

proc rank data=test2 out=test3;

  by fyear sic2 au;

  var nat_mkshr;

  ranks rnk_nat;

run;

As far as the rest there are some other rules you will have to work through like market share ties. But if you keep everyone with a rank < 3 those should be all the people you need to look at and run through a data step to flag the specialists.

Hope this helps!

EJ

Super Contributor
Posts: 334

Re: Comparison between largest and second largest value

By the way the functions you mentioned I dont think will help since they would work across variables and not across observations.

EJ

Regular Contributor
Posts: 162

Re: Comparison between largest and second largest value

Hi EJ,

Thanks for your helps.

I try to run the following code:

proc sort data=final.test  out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2  descending out=test3;

  by sic2 fyear ;

  var nat_mkshr;

  ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr); run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6  descending out=test7;

  by sic2 fyear cbsa;

  var City_Mkshr;

  ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8;

by sic2 fyear cbsa descending rnk_nat;

cityshr_dif = dif(City_Mkshr); run;

I found everything is within my expectation up to the output of test8.

I found the following error whenever I try to output "test9". I found that the obs in test9 has drop significantly compared to as in test8.

ERROR: BY variables are not properly sorted on data set WORK.TEST8.

Thank you.

Regards,

mspak

Do you know the reason?

Trusted Advisor
Posts: 1,137

Re: Comparison between largest and second largest value

Hi mspak,

in proc sort for test8 you have used the by variables as sic2 fyear cbsa descending rnk_city;

However while creating the dataset test9, the by variable order is sic2 fyear cbsa descending rnk_nat;

Here, please replace rnk_nat with rnk_city and you will not get any error.

Thanks,

Jagadish

Thanks,
Jag
Super Contributor
Posts: 334

Re: Comparison between largest and second largest value

Posted in reply to Jagadishkatam

Good catch. I noticed the same thing.

Also I always forget about the dif function so it was good to see that was what you are using.

EJ

Regular Contributor
Posts: 162

Re: Comparison between largest and second largest value

Posted in reply to Jagadishkatam

Hi Jagadish and EJ,

There was a typo error on the variable (rnk_city instead of rnk_nat). I checked the code for a few times, but I could not realise the mistake Smiley Sad.

Thank you helping me Smiley Happy.

Regards,

mspak

Super Contributor
Posts: 334

Re: Comparison between largest and second largest value

No problem ... glad it helped!

EJ

Regular Contributor
Posts: 162

Re: Comparison between largest and second largest value

Hi all,

I modified the codes as follows to generate my desired output:

proc sort data=final.test  out=test1 nodupkey; by fyear sic2 au; run;

proc sort data=test1 out=test2; by sic2 fyear ; run;

proc rank data=test2  descending out=test3;

  by sic2 fyear ;

  var nat_mkshr;

  ranks rnk_nat;

run;

proc sort data=test3 out=test4;

by sic2 fyear descending rnk_nat; run;

data test5; set test4;

by sic2 fyear descending rnk_nat;

natshr_dif = dif(nat_mkshr);

IF FIRST.FYEAR THEN DO;

natshr_dif = .;

END;

run;

proc sort data=test5 out=test6; by sic2 fyear cbsa; run;

proc rank data=test6  descending out=test7;

  by sic2 fyear cbsa;

  var City_Mkshr;

  ranks rnk_city;

run;

proc sort data=test7 out=test8;

by sic2 fyear cbsa descending rnk_city; run;

data test9; set test8 (drop=gvkey);

by sic2 fyear cbsa descending rnk_city;

cityshr_dif = dif(City_Mkshr);

IF FIRST.CBSA THEN DO;

cityshr_dif = .;

END;

run;

data test10; set test9;

if rnk_nat = 1 and natshr_dif>=0.1 then nat_AIP1 = 1; else nat_AIP1 = 0;

if rnk_city ^= 1 OR . <cityshr_dif <0.1  then city_AIP1 = 0; else city_AIP1 = 1;

run;

Thank you.

Regads,

mspak

Ask a Question
Discussion stats
  • 8 replies
  • 353 views
  • 7 likes
  • 3 in conversation