Hello!
I am trying merge two datasets by ticker and year. When I run the code, however, SAS brings up an issue with another variable, cusip. I understand that the issue is that one dataset has this variable as a character variable while the other has it as a numeric one. That being said, shouldn't this be unrelated if I am trying to merge by ticker and year?
Here is the code:
libname paper "~/425/425_Final_Paper"; *import compustat dataset; PROC IMPORT out=paper.CompuStat_Execucomp datafile="~/425/425_Final_Paper/CompuStat_Execucomp.csv" DBMS=CSV replace; getnames= yes; guessingrows=2000; run; *keep only CEOs; data CEO; set paper.Compustat_ExecuComp; if CEOANN = 'CEO' then output CEO; run; *import csrp dataset; PROC IMPORT out=paper.CSRP_Monthly_Stock datafile="~/425/425_Final_Paper/CSRP_MonthlyStock.csv" DBMS=CSV replace; getnames= yes; /*guessingrows=max;*/ run; *make new variable annual_return; data paper.CSRP_Monthly_Stock; set paper.CSRP_Monthly_Stock; retain annual_return 1; annual_return = annual_return * (1 + RET); month = int( mod(date,10000) / 100); if month = 12 or last.ticker; annual_return = (annual_return - 1) * 100; output; annual_return = 1; run; * 1. Sort CEO data by ticker and year & save sorted file as ceo_sorted ; PROC SORT DATA=work.CEO OUT=ceos_sorted; BY ticker year; RUN; * 2. Sort firm data by ticker and year & save sorted file as firms_sorted ; PROC SORT DATA=paper.CSRP_Monthly_Stock OUT=firms_sorted; BY ticker year; RUN; * 3. Merge CEO data and firm data by ticker and year in a data step; DATA ceo_firm; MERGE ceos_sorted firms_sorted; BY ticker year; RUN;
And here is the log:
@sastuck wrote:
Hello!
I am trying merge two datasets by ticker and year. When I run the code, however, SAS brings up an issue with another variable, cusip. I understand that the issue is that one dataset has this variable as a character variable while the other has it as a numeric one. That being said, shouldn't this be unrelated if I am trying to merge by ticker and year?
Yes it is critical a data set variable may only have one type. Since you have two different types SAS does not know which data type you actually want.
I would guess the "cusip" is something like "customer ip" or similar. If the value is an identifier and is not going to have arithmetic performed with it then likely it should be character.
You can over come issues like this by: 1) Best is controlling you data at import/creation (in this case note that you used guessing rows in on proc import and not the other which is a likely cause in this case) such as actually reading data when practical using a documented description of the data file(s) in question.
2) lots of "fixit" code like; Assumes the firms_sorted is the set with the numeric version of the cuspid and a character version is desired.
DATA ceo_firm; MERGE ceos_sorted firms_sorted (rename=(cuspid=cuspnum) ) ; BY ticker year; /* to keep the version from the firms_sorted set the way MERGE
would work
the 12 in the next line should match the lenght of the character version of cuspid */ if not missing(cuspnum) then cuspid = put(cuspnum,best12. -L); drop cuspnum; RUN;
Or if you don't need the cuspid for any further analysis:
DATA ceo_firm; MERGE ceos_sorted (drop=cuspid ) firms_sorted (drop=cuspid ) ; BY ticker year; RUN;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.