My firm dataset:
row | date | ticker | cusip | annual_return |
2 | 20171229 | AA | 1387210 | 91.844825974 |
3 | 20171229 | AABA | 2134610 | 28.212199886 |
4 | 20141231 | AAC | 30710 | 117.77973147 |
My CEO salary dataset:
row | exec_fullname | coname | ceoann | salary | year cusip |
7 | Daniel P. Amos | AFLAC INC | CEO | 1441.1 | 2016 | 105510 |
8 | Victor L. Lund | TERADATA CORP | CEO | 526.776 | 2016 | 88076W10 |
9 | Carol A. Bartz, Ph.D. | ALTABA INC | CEO | 1000 | 2010 | 2134610 |
10 | Carol A. Bartz, Ph.D. | ALTABA INC | CEO | 735.025 | 2011 | 2134610 |
11 | Richard L. Carrin Rexach | POPULAR INC | CEO | 838.26 | 2010 | 73317470 |
12 | Richard L. Carrin Rexach | POPULAR INC | CEO | 1316.282 | 2011 | 73317470 |
13 | Richard L. Carrin Rexach | POPULAR INC | CEO | 1400 | 2012 | 73317470 |
14 | Richard L. Carrin Rexach | POPULAR INC | CEO | 1400 | 2013 | 73317470 |
15 | Richard L. Carrin Rexach | POPULAR INC | CEO | 1400 | 2014 | 73317470 |
In the end I would like these variables in the same dataset:
Firm CEO Salary Annual Return Year
I am new to SAS and the last time I did a proc merge it was much simpler than this. What suggestions do you have for the best way to merge these datasets?
Thanks so much!
The first decision to make is how they match up. Is matching on CUSIP enough? Do you need to get the YEAR involved as well? There's no programming at this point, just decisions as to what constitutes a match.
Just for the record as well there is no Proc Merge - you can use a MERGE statement or an SQL JOIN. In practice the decision about which to choose often seems to boil down to personal preference more than anything else.
So would it be a match to compare the CUSIP and YEAR from the CEO salary data set, vs. the CUSIP and the year portion of the date from the FIRM data set?
I'm not sure. What do you suggest? The important thing is that I have the following in the same data set:
Firm CEO Salary Annual Return Year
That's really something you have to decide before programming can begin. You're the one looking at the data, and you're the one who is familiar with what the fields mean. If you don't know which observations should match and which shouldn't, there's no way that SAS can help.
I want to merge by year. But my CSRP dataset doesn't have this variable. How can I use the ret variable (which includes the year) to make this new "year" variable for each observation?
This is what the ret variable looks like:
20161230 | . | Ret | 1 | 3334 | 1387210 | AA | ALCOA CORP | AA | 331313 | N | A | R | 55718 | 0 | 1 | 3334 | 1387210 | . | . | . | 0 | -3.0721 |
2 | 16347 | 20171229 | . | 11 | 1 | 3334 | 1387210 | AA | ALCOA CORP | AA | 331313 | N | A | R | 55718 | 0 | 1 | 3334 | 1387210 | . | 0 | 1 | . | . | 0 | 91.844825974 |
3 | 16752 | 20171229 | . | 14 | 3 | 7375 | 2134610 | AABA | ALTABA INC | AABA | 541512 | Q | A | R | 55964 | 19523 | 3 | 7375 | 2134610 | . | 0 | 1 | . | . | 0 | 28.212199886 |
4 | 14945 | 20141231 | . | 11 | 1 | 6799 | 30710 | AAC | A A C HOLDINGS | AAC | 523910 | N | A | R | 55038 | 0 | 1 | 6799 | 30710 | . | . | . | 0 | 117.77973147 |
5 | 14945 | 20151231 | . | 11 | 1 | 6799 | 30710 | AAC | A A C HOLDINGS | AAC | 523910 | N | A | R | 55038 | 0 | 1 | 6799 | 30710 | . | . | . | 0 | -38.35712285 |
6 | 14945 | 20161230 | . | 11 | 1 | 6799 | 30710 | AAC | A A C HOLDINGS | AAC | 523910 | N | A | R | 55038 | 0 | 1 | 6799 | 30710 | . | . | . | 0 | -62.01467799 |
7 | 14945 | 20171229 | . | 11 | 1 | 6799 | 30710 | AAC | A A C HOLDINGS | AAC | 523910 | N | A | R | 55038 | 0 | 1 | 6799 | 30710 | . | 0 | 1 | . | . | 0 | 24.309233988 |
8 | 15580 | 20101231 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | . | . | 0 | 58.593603499 |
9 | 15580 | 20111230 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | . | . | 0 | -2.061186337 |
10 | 15580 | 20121231 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | 20121204 | 20121227 | 20121214 | 0 | 60.498060166 |
11 | 15580 | 20131231 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | . | . | 0 | 33.157366168 |
12 | 15580 | 20141231 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | . | . | 0 | -0.41107092 |
13 | 15580 | 20151231 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | . | . | 0 | 24.446561041 |
14 | 15580 | 20161230 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | . | . | 0 | -17.44246098 |
15 | 15580 | 20171229 | . | 11 | 3 | 6320 | 4820910 | AAME | ATLANTIC AMERIC | AAME | 524113 | Q | A | R | 5 | 5 | 3 | 6320 | 4820910 | . | 0 | 1 | . | . | 0 | -16.653317 |
16 | 10517 | 20101231 | 20120508 | 11 | 1 | 7359 | 253530 | AAN | AARONS INC | AAN | 443111 | N | A | R | 5674 | 7016 | 1 | 7359 | 253530 | . | . | . | 0 | 47.612254611 |
17 | 10517 | 20111230 | . | 11 | 1 | 7359 | 253530 | AAN | AARONS INC | AAN | 443111 | N | A | R | 5674 | 7016 | 1 | 7359 | 253530 | . | . | . | 0 | 31.114473349 |
18 | 10517 | 20121231 | . | 11 | 1 | 7359 | 253530 | AAN | AARONS INC | AAN | 443141 | N | A | R | 5674 | 7016 | 1 | 7359 | 253530 | . | . | . | 0 | 6.2307808656 |
19 | 10517 | 20131231 |
That one's fairly straightforward. Since your RET values are numeric:
year = int( ret/10000 );
I have my new variable year--thanks for the help!
now I am trying to merge by year:
* 3. Merge CEO data and firm data by year in a data step; DATA ceo_firm ; MERGE ceos_sorted firms_sorted; BY year; RUN;
but I am getting an error message: "variable CUSIP has been defined as both character and numeric. Isn't this odd given the fact that I was only trying to merge by year? Thanks!
Think of it this way. There's only one CUSIP variable being saved. So SAS needs to decide before bringing in any data whether CUSIP should be character or numeric. It's getting mixed messages when it examines the incoming data sets.
Once again, your decision comes before any programming begins. CUSIP needs to be defined consistently in the two data sets. Would you like it to be character or numeric? Either is possible (and easy) to accomplish.
I suppose its arbitrary really. How do I just make both numeric?
As @Cynthia_sas mentioned, you need to run PROC CONTENTS so you know which data set contains the character version. For illustration purposes, assume it's FIRMS_SORTED. Then you could run:
data firms_sorted_new;
set firms_sorted (rename=(cusip=temp));
cusip = input(temp, 12.);
drop temp;
run;
That assumes that CUSIP never exceeds 12 characters originally.
@Astounding, Good Afternoon,
I have run into some issues again in merging my datasets. One of the issues is that I want only CEOs in the merged dataset, and this community helped me find code to do that, but, even with this code, there remain other executives in my dataset. Do you know why this isn't working? Here's what my dataset and code look like:
row | exec_fullname | coname | cfoann | salary | year | date | ret annual_return |
2 | Timothy J. Romenesko | AAR CORP | 468.18 | 2010 | 20101231 | 0.023594 | 47.612254611 |
3 | Richard J. Poulton | AAR CORP | 367.2 | 2010 | 20101231 | 0.099805 | 46.882111929 |
4 | Terry D. Stinson | AAR CORP | 338.13 | 2010 | 20101231 | 0.03667 | 53.066450461 |
5 | Robert J. Regan | AAR CORP | 367.2 | 2010 | 20101231 | 0.060538 | 350.47620897 |
6 | Robert E. Switz | ADC TELECOMMUNICATIONS INC | CEO | 767.831 | 2010 | 20101231 | 0.023277 | 49.879140893 |
7 | Patrick D. O'Brien | ADC TELECOMMUNICATIONS INC | 349.936 | 2010 | 20101231 | 0.056304 | 16.220864139 |
8 | Jeffrey D. Pflaum | ADC TELECOMMUNICATIONS INC | 310.378 | 2010 | 20101231 | -0.005556 | 5.0879452755 |
9 | James G. Mathews | ADC TELECOMMUNICATIONS INC | 344.865 | 2010 | 20101231 | 0.159607 | 20.357696705 |
10 | Kimberly Hartwell | ADC TELECOMMUNICATIONS INC | 310.378 | 2010 | 20101231 | 0.105997 | 32.401213598 |
11 | Gerard J. Arpey | AMERICAN AIRLINES GROUP INC | CEO | 669.646 | 2010 | 20101231 | 0.363636 | 90.475977378 |
12 | Daniel P. Garton | AMERICAN AIRLINES GROUP INC | 530.478 | 2010 | 20101231 | 0.174079 | 60.277798513 |
13 | Gary F. Kennedy, Esq. | AMERICAN AIRLINES GROUP INC | 502.543 | 2010 | 20101029 | 0.489189 | -0.421830073 |
14 | Robert William Reding | AMERICAN AIRLINES GROUP INC | 530.479 | 2010 | 20101231 | 0.191871 | 290.00037099 |
15 | Thomas W. Horton | AMERICAN AIRLINES GROUP INC | 618.135 | 2010 | 20101231 | 0.141989 | 30.597781935 |
16 | Isabella D. Goren | AMERICAN AIRLINES GROUP INC | 423.993 | 2010 | 20101231 | -0.162892 | 10.080476093 |
libname paper "~/425/425_Final_Paper"; *import compustat dataset; PROC IMPORT out=paper.CompuStat_Execucomp datafile="~/425/425_Final_Paper/CompuStat_Execucomp.csv" DBMS=CSV replace; getnames= yes; guessingrows=2000; run; *keep only CEOs; data CEO; set paper.Compustat_ExecuComp; if CEOANN = 'CEO' then output CEO; run; *import csrp dataset; PROC IMPORT out=paper.CSRP_Monthly_Stock datafile="~/425/425_Final_Paper/CSRP_MonthlyStock.csv" DBMS=CSV replace; getnames= yes; /*guessingrows=max;*/ run; *delete rows with missing cusips; data paper.CSRP_Monthly_Stock; set paper.CSRP_Monthly_Stock; if nmiss(cusip) > 0 then delete; run; *create new variable year; data paper.CSRP_Monthly_Stock; set paper.CSRP_Monthly_Stock; year = int( date/10000 ); run; *delete rows with missing return data; data paper.CSRP_Monthly_Stock; set paper.CSRP_Monthly_Stock; if nmiss(ret) > 0 then delete; run; *sort by ticker; proc sort data=paper.CSRP_Monthly_Stock; by ticker date; run; data paper.CSRP_Monthly_Stock; set paper.CSRP_Monthly_Stock; by ticker date; retain annual_return 1; annual_return = annual_return * (1 + RET); month = int( mod(date,10000) / 100); if month = 12 or last.ticker; annual_return = (annual_return - 1) * 100; output; annual_return = 1; run; /*merging*/ /*keep only CEOs*/ data CEO; set paper.Compustat_ExecuComp; if CEOANN = 'CEO' then output CEO; run; * 1. Sort CEO data by cusip and year & save sorted file as ceo_sorted ; PROC SORT DATA=paper.Compustat_ExecuComp OUT=ceos_sorted; BY year; RUN; * 2. Sort firm data by cusip and year & save sorted file as firms_sorted ; PROC SORT DATA=paper.CSRP_Monthly_Stock OUT=firms_sorted; BY year; RUN; libname paper "~/425/425_Final_Paper"; /*keep only CEOs*/ data CEO; set paper.Compustat_ExecuComp; if CEOANN = 'CEO' then output CEO; run; * 3. Merge CEO data and firm data by year in a data step; DATA ceo_firm ; MERGE ceos_sorted_new firms_sorted; BY year; RUN; /*use proc contents to see if there is a type mismatch*/ proc contents data=paper.CSRP_Monthly_Stock; run; proc contents data=paper.Compustat_ExecuComp; run; * change cusip to numeric variable; data ceos_sorted_new; set ceos_sorted (rename=(cusip=temp)); cusip = input(temp, 12.); drop temp; run;
Thanks for the help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.