BookmarkSubscribeRSS Feed
sastuck
Pyrite | Level 9

Hello,

 

I am modeling the effect of firm performance on CEO salary. SAS read 10,741 observations from my merged ceo_firm data set, but only used 2,066 of them. 8,675 observations had missing values. I am surprised by this, however, due to the substantive data cleaning I have done in preparation for running the regression. Do you have any idea why I could still be missing so many observations? Here is my code:

 

libname paper "~/425/425_Final_Paper";

*import compustat dataset;
PROC IMPORT out=paper.CompuStat_Execucomp
datafile="~/425/425_Final_Paper/CompuStat_Execucomp.csv"
DBMS=CSV replace;
getnames= yes;
guessingrows=2000;
run;

*keep only CEOs;
data paper.Compustat_ExecuComp2;
set paper.Compustat_ExecuComp;
if CEOANN = 'CEO';
run;

*** extra DATA step for checking previous data step results by comparing number of records selected;
data check_CEOANN;
set paper.Compustat_ExecuComp;
if CEOANN =: 'CEO';
run;

proc sort data=paper.Compustat_ExecuComp2;
by ticker year;
run;

*import csrp dataset;
PROC IMPORT out=paper.CSRP_Monthly_Stock_char
datafile="~/425/425_Final_Paper/CSRP_MonthlyStock_char.csv"
DBMS=CSV replace;
getnames= yes;
guessingrows=max;
run;

*remove bad data; 
data paper.CSRP_Monthly_Stock_char2;
set paper.CSRP_Monthly_Stock_char (rename=(ret=character_ret));
drop character_ret;
if cusip = ' ' then delete;
ret = input(character_ret, ??8.);
if ret = . then delete;
date = input(put(date, z8.), yymmdd8.);
format date yymmdd10.;
year = year(date);
month = month(date);
if cusip =: '?' then cusip = substr(cusip, 2);
run;

proc contents 
data=paper.CSRP_Monthly_Stock_char2;
run;

proc sort data=paper.CSRP_Monthly_Stock_char2;
by ticker year;
run;

*Remove all bad years from both data sources;
data paper.Compustat_ExecuComp3;
merge paper.Compustat_ExecuComp2
paper.multiple_CEOs (keep=ticker year in=had_multiple_CEOs);
by ticker year;
if had_multiple_CEOs then delete;
run;

data paper.CSRP_Monthly_Stock_char3;
merge paper.CSRP_Monthly_Stock_char2
paper.multiple_CEOs (keep=ticker year in=had_multiple_CEOs);
by ticker year;
if had_multiple_CEOs then delete;
run;

*find additional bad data: multiple return records for the same month/year;
proc freq data=paper.CSRP_Monthly_Stock_char3;
tables ticker * year * month / noprint out=paper.multiple_returns (where=(count > 1));
run;

*Remove all matching year data for multiple returns;
proc sort data=paper.multiple_returns out=multiple_returns (keep=ticker year) NODUPKEY;
by ticker year;
run;

data paper.Compustat_ExecuComp4;
merge paper.Compustat_ExecuComp3
multiple_returns (in=had_multiple_returns);
by ticker year;
if had_multiple_returns then delete;
run;

data paper.CSRP_Monthly_Stock_char4;
merge paper.CSRP_Monthly_Stock_char3
multiple_returns (in=had_multiple_returns);
by ticker year;
if had_multiple_returns then delete;
run;

*create new variable annualized growth;
data paper.CSRP_annual_returns;
set paper.CSRP_Monthly_Stock_char4;
by ticker year;
retain annual_return 1;
annual_return = annual_return * (1 + RET);
if month = 12 or last.ticker;
annual_return = (annual_return - 1) * 100;
output;
annual_return = 1;
keep ticker year annual_return;
run;

*use proc contents to see if there is a type mismatch;
proc contents 
data=paper.CSRP_annual_returns;
run;

proc contents 
data=paper.Compustat_ExecuComp4;
run;

*MERGE statement has more than one data set with repeats of BY values;
proc sort data=paper.compustat_execucomp4;
  by ticker;
run;
proc sort data=paper.CSRP_annual_returns nodupkey;
  by ticker;
run;
data want;
  merge paper.compustat_execucomp4 (in=in1) paper.CSRP_annual_returns;
  by ticker;
  if in1;
run;

*Merge CEO data and firm data; 
DATA paper.ceo_firm ; 
length ticker $5;
MERGE paper.CSRP_annual_returns 
paper.compustat_execucomp4; 
BY ticker; 
RUN;

proc contents 
data=paper.ceo_firm;
run;

*remove missing return data; 
data paper.ceo_firm;
set paper.ceo_firm;
if annual_return = . then delete;
run;

proc means data=paper.CSRP_Monthly_Stock_char3   n nmiss;
var ret;
run;

*Use OLS to estimate model;
title "";
ods graphics on;
proc reg data=paper.ceo_firm plots(maxpoints=none);
model salary = annual_return /clb acov;
run;  
ods graphics off;

 

Thanks in advance for the help! 

17 REPLIES 17
art297
Opal | Level 21

Without seeing the data I don't think anyone can tell you why you are missing salary and/or annual return from so many records.

 

I think you have to manually review the files to see why there are so many missing values. Could be an error in how the data was imported.

 

Art, CEO, AnalystFinder.com

 

 

sastuck
Pyrite | Level 9

Which particular dataset do you think would be helpful in looking at this issue? 

Rick_SAS
SAS Super FREQ

Post the results of the PROC MEANS call

proc means data=paper.CSRP_Monthly_Stock_char3   n nmiss;
var salary annual_return;
run;

 

. It contains the count of missing and nonmissing values for the variables in your model.

sastuck
Pyrite | Level 9

From 

proc means data=paper.Compustat_ExecuComp4   n nmiss;
var salary;
run;

The MEANS Procedure

 Analysis Variable : SALARYN N Miss

133460

 

and from:

 

proc means data=paper.CSRP_annual_returns   n nmiss;
var annual_return;
run;

we get

 

The MEANS Procedure

 Analysis Variable : annual_returnN N Miss

531760
art297
Opal | Level 21

Didn't you really want to run those proc means on paper.ceo_firm since that is the file you're using in proc reg?

 

proc reg data=paper.ceo_firm plots(maxpoints=none);
model salary = annual_return /clb acov;
run;

And you haven't (I don't think) addressed the question I raised in my previous post.

 

Art, CEO, AnalystFinder.com

 

 

art297
Opal | Level 21

I agree, in principle, with Rick, but I don't think RET is the variable you're using. Your code includes:

data paper.CSRP_annual_returns;
  set paper.CSRP_Monthly_Stock_char4;
  by ticker year;
  retain annual_return 1;
  annual_return = annual_return * (1 + RET);
  if month = 12 or last.ticker;
  annual_return = (annual_return - 1) * 100;
  output;
  annual_return = 1;
  keep ticker year annual_return;
run;

In that code you're only calculating annual_return if month=12 or it's the last record for a particular ticker. Is that really what you want to be doing? And your proc means only analyzes RET, while your variable is annual_return.

 

I don't know which file contains SALARY

 

Art, CEO, AnalystFinder.com

 

 

Art, CEO,

sastuck
Pyrite | Level 9

Salary is in Compustat_ExecuComp4. 

 

And with regards to annual_return, I will quote a SAS expert from another post:

 

Do you always begin with January and end with December?

 

Is DATE a SAS date, or merely a numeric value with YMD?

 

Is your data in sorted order by TICKER DATE?

 

Assuming "yes", "just a number", and "yes", you could program it in this way:

 

 

 

I will look into your other points now. 

 

data temp;

set have;

by ticker date;

retain annual_return 1;

annual_return = annual_return * (1 + RET);

month = int( mod(date,10000) / 100);

if month = 12;

annual_return = (annual_return - 1) * 100;

output;

annual_return = 1;

run;

 

If DATE is actually a SAS date, the calculation of MONTH is easier:

 

month = month(date);

 

Note that the annualized return is calculated separately for each year.  If you get 10% for one year, and 20% for the next year, the total value after two years would be 1.1 * 1.2 * original value ... all assuming that I got the formulas correct.

art297
Opal | Level 21

I don't know who provided that code or whether or not it's doing what you want. Most of us on this forum are users .. just like you. We don't get paid, may sometimes suggest wrong answers, and often aren't provided with enough details to make a suggestion that actually does what the requester wanted to do (but didn't convey the correct request).

 

As for myself, I wouldn't necessarily call myself an expert, but I do have a PhD and have been using SAS for the past 45 years.

 

Make up an example for two years of one ticker and, for each record, show what you expect the value of annual_return to be.

 

As for Salary, it has to be in the file you are analyzing with proc reg. Earlier you had asked which file to look at to identify why you had missing values .. I think it may be the comparison of those two files and, if they're different, figuring out why they're different.

 

Art, CEO, AnalystFinder.com

 

P.S. I just looked up the original post and, if your data are in fact monthly values, then the code that @Astounding suggested does indeed do the calculation correctly. However, I also noticed in that thread that you were trying obtain the geometric mean, but didn't see @mkeintz's suggestion incorporated in your code.

 

However, I also noticed in that thread, that you were getting missing values back then. Was that problem ever corrected?

 

sastuck
Pyrite | Level 9

A) I am going to forgo the geometric mean for now. I am just looking for percentage growth over the course of a year. Here’s an example of what I would expect, based off numbers from yahoo finance:

 

Ticker    year      annual_return

AAOI     2014      61.35%                (my data actually says its -35.15776393)

ACFC    2013       115.42%             (my data actually says its 125.9565%)

 

B) I used:

 

proc contents

data=paper.ceo_firm;

run;

 

to confirm that salary is in the data set I am analyzing with proc reg

 

Is this what you were wondering?

 

 

C) I will find that thread and investigate my missing values further.

 

@art297 any other suggestions?

art297
Opal | Level 21

From what I've read in previous posts on this and related topics I'd be most interested in seeing what data was missing and, if it the results didn't seem reasonable, I'd investigate further.

 

You might want to run something like:

 

proc format;
value sal
.='Missing'
low-4999='<5000'
5000-39999='< 40000'
40000-high='40000 or more'
;
value return
.='missing'
low-.299='< .3'
.3-.499='<.5'
.5-high='.5 or more'
;
run;

proc freq data=paper.ceo_firm;
tables year*(annual_return salary)/ missing;
format salary sal.;
format annual_return return.;
run;

 

Of course, you might want to select better ranges for the formats.

 

Art, CEO, AnalystFinder.com

sastuck
Pyrite | Level 9

Here is what the output looks like:

 

Table of year by annual_returnyear annual_returnmissing < .3 <.5 .5 or more Total20102011201220132014201520162017Total
156
0.72
2.33
22.41
1843
8.45
27.53
29.12
26
0.12
0.39
32.91
4670
21.42
69.75
31.76
6695
30.70
 
 
137
0.63
5.03
19.68
892
4.09
32.78
14.09
8
0.04
0.29
10.13
1684
7.72
61.89
11.45
2721
12.48
 
 
114
0.52
4.84
16.38
594
2.72
25.24
9.38
8
0.04
0.34
10.13
1637
7.51
69.57
11.13
2353
10.79
 
 
97
0.44
4.00
13.94
608
2.79
25.06
9.61
10
0.05
0.41
12.66
1711
7.85
70.53
11.64
2426
11.12
 
 
86
0.39
3.44
12.36
781
3.58
31.27
12.34
8
0.04
0.32
10.13
1623
7.44
64.97
11.04
2498
11.46
 
 
64
0.29
2.67
9.20
803
3.68
33.44
12.69
10
0.05
0.42
12.66
1524
6.99
63.47
10.37
2401
11.01
 
 
42
0.19
1.97
6.03
564
2.59
26.42
8.91
9
0.04
0.42
11.39
1520
6.97
71.19
10.34
2135
9.79
 
 
0
0.00
0.00
0.00
245
1.12
42.39
3.87
0
0.00
0.00
0.00
333
1.53
57.61
2.26
578
2.65
 
 
696
3.19
6330
29.03
79
0.36
14702
67.42
21807
100.00

 

Frequency
Percent
Row Pct
Col Pct
Table of year by SALARYyear SALARYMissing <5000 < 40000 Total20102011201220132014201520162017Total
4665
21.39
69.68
55.14
2029
9.30
30.31
15.21
1
0.00
0.01
20.00
6695
30.70
 
 
726
3.33
26.68
8.58
1994
9.14
73.28
14.95
1
0.00
0.04
20.00
2721
12.48
 
 
425
1.95
18.06
5.02
1927
8.84
81.90
14.44
1
0.00
0.04
20.00
2353
10.79
 
 
502
2.30
20.69
5.93
1923
8.82
79.27
14.41
1
0.00
0.04
20.00
2426
11.12
 
 
622
2.85
24.90
7.35
1875
8.60
75.06
14.05
1
0.00
0.04
20.00
2498
11.46
 
 
596
2.73
24.82
7.04
1805
8.28
75.18
13.53
0
0.00
0.00
0.00
2401
11.01
 
 
421
1.93
19.72
4.98
1714
7.86
80.28
12.85
0
0.00
0.00
0.00
2135
9.79
 
 
504
2.31
87.20
5.96
74
0.34
12.80
0.55
0
0.00
0.00
0.00
578
2.65
 
 
8461
38.80
13341
61.18
5
0.02
21807
100.00

 

Do you see anything systematic going on here?

art297
Opal | Level 21

The tables posted in a garbled fashion where one can't tell what it represents. It would, in this case, be better to post a snapshot of the result screens and post them as pdf files.

 

Art, CEO, AnalystFinder.com

 

 

 

sastuck
Pyrite | Level 9

@art297, my bad. I will take note of that for next time. It's just that I have been told more than a few times that people do not prefer to have to open attached files on this forum is all. 

art297
Opal | Level 21

People on the forum don't like pictures of datasets .. they'd rather have data posted in the form of an analyzable data step.

 

Similarly, they don't like opening file type that might contain viruses, like Word or Excel.

 

However, when showing one's output, a picture (i.e., jpg, pdf or the like) I'd think would be welcomed by most.

 

Art, CEO, AnalystFinder.com

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 3108 views
  • 1 like
  • 4 in conversation