BookmarkSubscribeRSS Feed
sastuck
Pyrite | Level 9

My firm dataset: 

 

rowdate     ticker cusip                  annual_return
220171229AA1387210 91.844825974
320171229AABA2134610 28.212199886
420141231AAC30710 117.77973147

 

My CEO salary dataset:

 

rowexec_fullnameconame     ceoann    salary year    cusip
7Daniel P. AmosAFLAC INCCEO1441.12016105510
8Victor L. LundTERADATA CORPCEO526.776201688076W10
9Carol A. Bartz, Ph.D.ALTABA INCCEO100020102134610
10Carol A. Bartz, Ph.D.ALTABA INCCEO735.02520112134610
11Richard L. Carrin RexachPOPULAR INCCEO838.26201073317470
12Richard L. Carrin RexachPOPULAR INCCEO1316.282201173317470
13Richard L. Carrin RexachPOPULAR INCCEO1400201273317470
14Richard L. Carrin RexachPOPULAR INCCEO1400201373317470
15Richard L. Carrin RexachPOPULAR INCCEO1400201473317470

 

In the end I would like these variables in the same dataset:

 

Firm    CEO Salary     Annual Return     Year 

 

I am new to SAS and the last time I did a proc merge it was much simpler than this. What suggestions do you have for the best way to merge these datasets?

 

Thanks so much!

 
 
21 REPLIES 21
Astounding
PROC Star

The first decision to  make is how they match up.  Is matching on CUSIP enough?  Do you need to get the YEAR involved as well?  There's no programming at this point, just decisions as to what constitutes a match.

sastuck
Pyrite | Level 9
I am looking into the effect of firm performance on CEO compensation, so I feel like the year is essential because the return in that particular year needs to be matched to that year’s CEO’s salary so that I can study the relationship between these two variables
ChrisBrooks
Ammonite | Level 13

Just for the record as well there is no Proc Merge - you can use a MERGE statement or an SQL JOIN. In practice the decision about which to choose often seems to boil down to personal preference more than anything else.

Astounding
PROC Star

So would it be a match to compare the CUSIP and YEAR from the CEO salary data set, vs. the CUSIP and the year portion of the date from the FIRM data set?

sastuck
Pyrite | Level 9

I'm not sure. What do you suggest? The important thing is that I have the following in the same data set:

 

Firm    CEO Salary     Annual Return     Year 

Astounding
PROC Star

That's really something you have to decide before programming can begin.  You're the one looking at the data, and you're the one who is familiar with what the fields mean.  If you don't know which observations should match and which shouldn't, there's no way that SAS can help.

sastuck
Pyrite | Level 9

I want to merge by year. But my CSRP dataset doesn't have this variable. How can I use the ret variable (which includes the year) to make this new "year" variable for each observation? 

 

This is what the ret variable looks like:

 

20161230.Ret133341387210AAALCOA CORP AA331313NAR557180133341387210.    ..0  -3.0721
21634720171229.11133341387210AAALCOA CORP AA331313NAR557180133341387210.0 1 ..0  91.844825974
31675220171229.14373752134610AABAALTABA INC AABA541512QAR5596419523373752134610.0 1 ..0  28.212199886
41494520141231.111679930710AACA A C HOLDINGS AAC523910NAR5503801679930710.    ..0  117.77973147
51494520151231.111679930710AACA A C HOLDINGS AAC523910NAR5503801679930710.    ..0  -38.35712285
61494520161230.111679930710AACA A C HOLDINGS AAC523910NAR5503801679930710.    ..0  -62.01467799
71494520171229.111679930710AACA A C HOLDINGS AAC523910NAR5503801679930710.0 1 ..0  24.309233988
81558020101231.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.    ..0  58.593603499
91558020111230.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.    ..0  -2.061186337
101558020121231.11363204820910AAMEATLANTIC AMERIC AAME524113QAR5536320482091020121204    20121227201212140  60.498060166
111558020131231.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.    ..0  33.157366168
121558020141231.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.    ..0  -0.41107092
131558020151231.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.    ..0  24.446561041
141558020161230.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.    ..0  -17.44246098
151558020171229.11363204820910AAMEATLANTIC AMERIC AAME524113QAR55363204820910.0 1 ..0  -16.653317
161051720101231201205081117359253530AANAARONS INC AAN443111NAR5674701617359253530.    ..0  47.612254611
171051720111230.1117359253530AANAARONS INC AAN443111NAR5674701617359253530.    ..0  31.114473349
181051720121231.1117359253530AANAARONS INC AAN443141NAR5674701617359253530.    ..0  6.2307808656
191051720131231
Astounding
PROC Star

That one's fairly straightforward.  Since your RET values are numeric:

 

year = int( ret/10000 );

sastuck
Pyrite | Level 9

I have my new variable year--thanks for the help!

 

now I am trying to merge by year:

 

* 3. Merge CEO data and firm data by year in a data step; 
DATA ceo_firm ; 
  MERGE ceos_sorted firms_sorted; 
  BY year; 
RUN; 

but I am getting an error message: "variable CUSIP has been defined as both character and numeric. Isn't this odd given the fact that I was only trying to merge by year? Thanks!

Cynthia_sas
SAS Super FREQ
Hi:
SAS builds the new dataset from ALL the variables in ALL the datasets being combined. So when it finds a mismatch in the type for any of the variables you will have in the new dataset, it will complain.

Do a PROC CONTENTS on both your files -- ceos_sorted and firms_sorted and double check the CUSIP variable. My guess is that SAS is correct and you have a type mismatch that you need to fix.

cynthia
Astounding
PROC Star

Think of it this way.  There's only one CUSIP variable being saved.  So SAS needs to decide before bringing in any data whether CUSIP should be character or numeric.  It's getting mixed messages when it examines the incoming data sets.

 

Once again, your decision comes before any programming begins.  CUSIP needs to be defined consistently in the two data sets.  Would you like it to be character or numeric?  Either is possible (and easy) to accomplish.

sastuck
Pyrite | Level 9

I suppose its arbitrary really. How do I just make both numeric?

Astounding
PROC Star

As @Cynthia_sas mentioned, you need to run PROC CONTENTS so you know which data set contains the character version.  For illustration purposes, assume it's FIRMS_SORTED.  Then you could run:

 

data firms_sorted_new;

set firms_sorted (rename=(cusip=temp));

cusip = input(temp, 12.);

drop temp;

run;

 

That assumes that CUSIP never exceeds 12 characters originally.

sastuck
Pyrite | Level 9

@Astounding, Good Afternoon,

 

I have run into some issues again in merging my datasets. One of the issues is that I want only CEOs in the merged dataset, and this community helped me find code to do that, but, even with this code, there remain other executives in my dataset. Do you know why this isn't working? Here's what my dataset and code look like:

 


row exec_fullname               coname    cfoann     salaryyeardateret                      annual_return
2Timothy J. RomeneskoAAR CORP 468.182010201012310.02359447.612254611
3Richard J. PoultonAAR CORP 367.22010201012310.09980546.882111929
4Terry D. StinsonAAR CORP 338.132010201012310.0366753.066450461
5Robert J. ReganAAR CORP 367.22010201012310.060538350.47620897
6Robert E. SwitzADC TELECOMMUNICATIONS INCCEO767.8312010201012310.02327749.879140893
7Patrick D. O'BrienADC TELECOMMUNICATIONS INC 349.9362010201012310.05630416.220864139
8Jeffrey D. PflaumADC TELECOMMUNICATIONS INC 310.378201020101231-0.0055565.0879452755
9James G. MathewsADC TELECOMMUNICATIONS INC 344.8652010201012310.15960720.357696705
10Kimberly HartwellADC TELECOMMUNICATIONS INC 310.3782010201012310.10599732.401213598
11Gerard J. ArpeyAMERICAN AIRLINES GROUP INCCEO669.6462010201012310.36363690.475977378
12Daniel P. GartonAMERICAN AIRLINES GROUP INC 530.4782010201012310.17407960.277798513
13Gary F. Kennedy, Esq.AMERICAN AIRLINES GROUP INC 502.5432010201010290.489189-0.421830073
14Robert William RedingAMERICAN AIRLINES GROUP INC 530.4792010201012310.191871290.00037099
15Thomas W. HortonAMERICAN AIRLINES GROUP INC 618.1352010201012310.14198930.597781935
16Isabella D. GorenAMERICAN AIRLINES GROUP INC 423.993201020101231-0.16289210.080476093

 

libname paper "~/425/425_Final_Paper";

*import compustat dataset;
PROC IMPORT out=paper.CompuStat_Execucomp
	datafile="~/425/425_Final_Paper/CompuStat_Execucomp.csv"
	DBMS=CSV replace;
	getnames= yes;
	guessingrows=2000;
run;

*keep only CEOs;
data CEO;
	set paper.Compustat_ExecuComp;
 	if CEOANN = 'CEO' then output CEO;
run;

*import csrp dataset;
PROC IMPORT out=paper.CSRP_Monthly_Stock
	datafile="~/425/425_Final_Paper/CSRP_MonthlyStock.csv"
	DBMS=CSV replace;
	getnames= yes;
/*guessingrows=max;*/
run;

*delete rows with missing cusips;
data paper.CSRP_Monthly_Stock;
	set paper.CSRP_Monthly_Stock;
	if nmiss(cusip) > 0 then delete;
run;

*create new variable year;
data paper.CSRP_Monthly_Stock;
	set paper.CSRP_Monthly_Stock;
	year = int( date/10000 );
run;

*delete rows with missing return data;
data paper.CSRP_Monthly_Stock;
	set paper.CSRP_Monthly_Stock;
	if nmiss(ret) > 0 then delete;
run;

*sort by ticker;
proc sort 
	data=paper.CSRP_Monthly_Stock;
	by ticker date;
run;

data paper.CSRP_Monthly_Stock;
	set paper.CSRP_Monthly_Stock;
	by ticker date;
	retain annual_return 1;
	annual_return = annual_return * (1 + RET);
	month = int( mod(date,10000) / 100);
	if month = 12 or last.ticker;
	annual_return = (annual_return - 1) * 100;
	output;
	annual_return = 1;
run;

/*merging*/

/*keep only CEOs*/
data CEO;
	set paper.Compustat_ExecuComp;
 	if CEOANN = 'CEO' then output CEO;
run;

* 1. Sort CEO data by cusip and year & save sorted file as ceo_sorted ; 
PROC SORT DATA=paper.Compustat_ExecuComp OUT=ceos_sorted; 
  BY year; 
RUN; 

* 2. Sort firm data by cusip and year & save sorted file as firms_sorted ; 
PROC SORT DATA=paper.CSRP_Monthly_Stock OUT=firms_sorted; 
  BY year; 
RUN; 

libname paper "~/425/425_Final_Paper";

/*keep only CEOs*/
data CEO;
	set paper.Compustat_ExecuComp;
 	if CEOANN = 'CEO' then output CEO;
run;

* 3. Merge CEO data and firm data by year in a data step; 
DATA ceo_firm ; 
  MERGE ceos_sorted_new firms_sorted; 
  BY year; 
RUN; 

/*use proc contents to see if there is a type mismatch*/
proc contents 
 	data=paper.CSRP_Monthly_Stock;
run;

proc contents 
 	data=paper.Compustat_ExecuComp;
run;

* change cusip to numeric variable; 
data ceos_sorted_new;
set ceos_sorted (rename=(cusip=temp));
cusip = input(temp, 12.);
drop temp;
run;

Thanks for the help!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1762 views
  • 6 likes
  • 4 in conversation