Hello!
I am trying merge two datasets by ticker and year. When I run the code, however, SAS brings up an issue with another variable, cusip. I understand that the issue is that one dataset has this variable as a character variable while the other has it as a numeric one. That being said, shouldn't this be unrelated if I am trying to merge by ticker and year?
Here is the code:
libname paper "~/425/425_Final_Paper"; *import compustat dataset; PROC IMPORT out=paper.CompuStat_Execucomp datafile="~/425/425_Final_Paper/CompuStat_Execucomp.csv" DBMS=CSV replace; getnames= yes; guessingrows=2000; run; *keep only CEOs; data CEO; set paper.Compustat_ExecuComp; if CEOANN = 'CEO' then output CEO; run; *import csrp dataset; PROC IMPORT out=paper.CSRP_Monthly_Stock datafile="~/425/425_Final_Paper/CSRP_MonthlyStock.csv" DBMS=CSV replace; getnames= yes; /*guessingrows=max;*/ run; *make new variable annual_return; data paper.CSRP_Monthly_Stock; set paper.CSRP_Monthly_Stock; retain annual_return 1; annual_return = annual_return * (1 + RET); month = int( mod(date,10000) / 100); if month = 12 or last.ticker; annual_return = (annual_return - 1) * 100; output; annual_return = 1; run; * 1. Sort CEO data by ticker and year & save sorted file as ceo_sorted ; PROC SORT DATA=work.CEO OUT=ceos_sorted; BY ticker year; RUN; * 2. Sort firm data by ticker and year & save sorted file as firms_sorted ; PROC SORT DATA=paper.CSRP_Monthly_Stock OUT=firms_sorted; BY ticker year; RUN; * 3. Merge CEO data and firm data by ticker and year in a data step; DATA ceo_firm; MERGE ceos_sorted firms_sorted; BY ticker year; RUN;
And here is the log:
@sastuck wrote:
Hello!
I am trying merge two datasets by ticker and year. When I run the code, however, SAS brings up an issue with another variable, cusip. I understand that the issue is that one dataset has this variable as a character variable while the other has it as a numeric one. That being said, shouldn't this be unrelated if I am trying to merge by ticker and year?
Yes it is critical a data set variable may only have one type. Since you have two different types SAS does not know which data type you actually want.
I would guess the "cusip" is something like "customer ip" or similar. If the value is an identifier and is not going to have arithmetic performed with it then likely it should be character.
You can over come issues like this by: 1) Best is controlling you data at import/creation (in this case note that you used guessing rows in on proc import and not the other which is a likely cause in this case) such as actually reading data when practical using a documented description of the data file(s) in question.
2) lots of "fixit" code like; Assumes the firms_sorted is the set with the numeric version of the cuspid and a character version is desired.
DATA ceo_firm; MERGE ceos_sorted firms_sorted (rename=(cuspid=cuspnum) ) ; BY ticker year; /* to keep the version from the firms_sorted set the way MERGE
would work
the 12 in the next line should match the lenght of the character version of cuspid */ if not missing(cuspnum) then cuspid = put(cuspnum,best12. -L); drop cuspnum; RUN;
Or if you don't need the cuspid for any further analysis:
DATA ceo_firm; MERGE ceos_sorted (drop=cuspid ) firms_sorted (drop=cuspid ) ; BY ticker year; RUN;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.