BookmarkSubscribeRSS Feed
lioradam
Obsidian | Level 7

Hello all,

My data refers to CEO tenures  (average tenure length is 8 years). This means that my sample includes firm-year observations regarding several measures of net profit (accounting return), and those observations are sorted by CEO  tenures.

I wrote a code to eliminate outliers of the variables "net profit" in the 1% and 99% (program 1). Then I wrote a second code that said that if there is an outlier during CEO tenure, all the other observations in the same tenure will be eliminated as well (program 2).

According to my understanding, those programs combined should reduce my sample by an average of 16%: 2% of the sample should be reduced in program 1, and as the average tenure length is 8 years, after program 2, the reduction of observations should be 16%.

 

However, the actual reduction is a third of the sample (twice what I expected).

Enclosed are the two codes (two programs).

Could you please advise if there are problems with the codes?

Thanks in advance,

Lior

 

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Many of us will not download attachments. Code should be posted as text in the window that appears when you click on the "running man" icon.

 

I do not follow your logic about how there is a 2% drop because of program one and an average tenure is 8 years, so this leads to an expected 16% drop in number of records. Nevertheless, this should be easy for you to trace in your data sets to see if the correct records are being deleted. Since we don't have your data, we can't do that tracing.

--
Paige Miller
lioradam
Obsidian | Level 7

Hi,

Since 2% of the observations (observations below 1% and above 99%) are reduced, if, in the worst-case scenario, all outliers belong to separate tenures (there are no two or more outliers in the same tenure) and since the average tenure length is 8 years, then the outliers should be around 16% (2% multiple by 8).

 

Enclose the code of the two programs:

Program 1:

 

options mprint;

 

data Final_without_outliers_YB;

  set ADJUSTED_VARIABLES;

  run;

 

/*title;*/

 

%macro change(var);

 

 

 proc means data= ADJUSTED_VARIABLES n nmiss mean std median q1 q3 p1 p99 min max T ProbT/* noprint*/ ;

  var &var;

  output out=temp_YB p1=p1 p99=p99;

 run;

 

 

data _NULL_;

  set temp_YB;

  call symputx("p1_YB",p1);

  call symputx("p99_YB",p99);

  run;

 

%put _user_;

 

 data Final_without_outliers_YB;

  set Final_without_outliers_YB;

 

   if &var < &p1_YB then &var = .;

   if &var > &p99_YB then &var = .;

 

run;

 

title 'Results after Omitting Outliers';

 

proc means data = Final_without_outliers_YB n nmiss mean std median q1 q3 p1 p99 min max T ProbT;

  var &var;

  run;

 

title;

 

%mend;

 

 

%change(Adj_m1_ebsi_to_assets_a)

%change(Adj_m1_ebwo_to_assets_a)

%change(ebsi_to_assets_a)

%change(ebwo_to_assets_a)

%change(ewem_a_conpustat)

%change(e_bxi_a)

%change(cash_operation_a)

%change(Adj_m1_ewem_a_compustat)

%change(Adj_m1_ewem_a_exe)

%change(Adj_m1_ewem_a_exe_5)

%change(Adj_m1_ewem_a_exe_10)

%change(Adj_m1_ewem_a_exe_15)

%change(Adj_m1_ewem_a_20_exe)

%change(Adj_m1_e_bxi_a)

%change(Adj_m1_cash_operation_a)

%change(tobin_q)

 

 

Program 2:

 

options mprint;

 

data FINAL_WITHOUT_OUTLIERS_YB01;

     set FINAL_WITHOUT_OUTLIERS_YB;

    

 

%macro change1(var);

 

     proc means data=FINAL_WITHOUT_OUTLIERS_YB01 noprint nway;

           var &var;

           class GVKEY01 Full_Name;

           output out=work.e n=n;

     run;

 

     data e_&var;

           set e;

 

           if _FREQ_ > n;

           diff_&var = _FREQ_ - n;

           drop _type_ _FREQ_ n;

     run;

 

     proc sort data = FINAL_WITHOUT_OUTLIERS_YB01 out = finel_WO_temp;

           by GVKEY01 Full_Name;

     run;

 

     proc sort data = e_&var out = e_temp_&var;

           by GVKEY01 Full_Name;

     run;

 

     data FINAL_WITHOUT_OUTLIERS_YB01;

           merge e_temp_&var finel_WO_temp;

           by GVKEY01 Full_Name;

 

           if diff_&var > 0 then

                &var = .;

           drop diff_&var; /* if chacking then omit line*/

     run;

 

     proc delete data = finel_WO_temp;

     run;

 

     proc delete data = e_temp_&var;

     run;

 

     proc delete data = e_&var;

     run;

 

     proc delete data = e;

     run;

 

%mend;

 

 

%change1(Adj_m1_ebsi_to_assets_a)

%change1(Adj_m1_ebwo_to_assets_a)

%change1(ebsi_to_assets_a)

%change1(ebwo_to_assets_a)

%change1(ewem_a_conpustat)

%change1(e_bxi_a)

%change1(cash_operation_a)

%change1(Adj_m1_ewem_a_compustat)

%change1(Adj_m1_ewem_a_exe)

%change1(Adj_m1_ewem_a_exe_5)

%change1(Adj_m1_ewem_a_exe_10)

%change1(Adj_m1_ewem_a_exe_15)

%change1(Adj_m1_ewem_a_20_exe)

%change1(Adj_m1_e_bxi_a)

%change1(Adj_m1_cash_operation_a)

%change1(tobin_q)

PaigeMiller
Diamond | Level 26

Repeating: Code should be posted as text in the window that appears when you click on the "running man" icon.

 

A tenure of 8 years average doesn't seem to me to relate to the 2% of the observations that are deemed to be outliers. I don't see how multiplying them together makes any sense here.

 

Did you try my suggestion to see what records have been deleted to determine if these are correct, or not?

--
Paige Miller
lioradam
Obsidian | Level 7

Hi,

Yes, I have checked what records have been deleted and the code is fine.

I analyzed the results in the wrong way.

 

Thank you,

Lior

Reeza
Super User

then the outliers should be around 16% (2% multiple by 8).

This is incorrect. You would expect somewhere between 2% and 8% as there would very likely be overlaps in the outliers.

lioradam
Obsidian | Level 7

Hi Reeza,

 You are right.

Anyway, I analyzed the data in the wrong way. Now I understand it and it makes sense.

 

Thank you,

Lior 

mkeintz
PROC Star

@lioradam wrote:

... stuff deleted ...

According to my understanding, those programs combined should reduce my sample by an average of 16%: 2% of the sample should be reduced in program 1, and as the average tenure length is 8 years, after program 2, the reduction of observations should be 16%.

 

However, the actual reduction is a third of the sample (twice what I expected).

Enclosed are the two codes (two programs).

I haven't looked at your code.  Please enter it in the popup box that will appear when you click on the "running man" glyph when posting a message to this forum.

 

But I wonder whether there is much justification for your expectation of 16% of observations to be removed by your outlier-year identification.

 

Did you generate accounting return cutoff points on a year-by-year basis, or globally (i.e. for all years)?  If they are global outliers, you will likely find more upper outliers in globally profitable years (i.e. years with high rates of return).  This means you are introducing a likely bias in your sample of outliers, since a non-outlier performer (i.e. an "ordinary" 95%ile) might exceed the  global 99%ile in that high profit year.  This could easily end up identifying more ceo-tenures above the 99%ile than you would it you used annually-adjusted distribution of returns to generate annual outlier cutoff values.

 

And the same argument might be applied to the lower cutoff point - the 1%iles.

 

And of course if your study covers a lot of years, then this bias mechanism could be exaggerated - the longer your study the more chance of an extremely good or bad year, i.e. more business cycles.

 

Also why do you assume that the average tenure length would be the same for ceo's at the extremes?  Wouldn't high performers likely stay longer?  Of course you could make the counter argument for low performers - they will have shorter-than-average tenures.  But I don't think you should assume that these tendencies are symmetrical - which your rationale suggests.

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
lioradam
Obsidian | Level 7

Hi,

I checked what records have been deleted and the code works fine. It was my mistake.

You raise good comments. I calculate the outliers for all years, however, the return is adjusted to industry return in each year.

I dill with the other comments you raise,  through different procedures in my research.

 

Thank you!

Lior

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1000 views
  • 0 likes
  • 4 in conversation