12-15-2017 03:18 AM
I am trying to merge a dataset which includes CRSP + IBES data with another IPO data set. The first datasets includes e.g. permno, ncusip, date, stock returns. However it has 50million rows since it includes daily stock returns. The other dataset includes only 7000 IPOs including PERMNO, NCUSIP, dates etc). I would like to merge these two data sets so that in the end I have a set where there is only those original 7000 IPOs + correct stock return from the first day of the IPO. What kind of code should I use to merge these two sets by the date and some identifier (e.g. permno)?
12-15-2017 03:35 AM - edited 12-15-2017 03:35 AM
Please supply example data. Use the macro from https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to convert subsets of data to datasteps, and post them here according to https://communities.sas.com/t5/help/faqpage/faq-category-id/posting?nobounce. Also supply an example of the expected result. If you already have code that you tried, include that also.
12-15-2017 04:53 AM
The macro is included in the download as a .sas file. Store that anywhere on your PC, and either open it with EG or the enhanced editor of Base SAS, or with a text editor and copy/paste the text to SAS Studio. Once you run the code, the macro is ready for use. The macro code itself contains a description of it's use in a section that is also written to the log when the macro is used with !HELP as a parameter.
If you want to use the macro repeatedly in the future, you can include it in your autocall library.
12-15-2017 10:03 AM
Without data set names or variables it is very hard to be very specific. What it sounds like is you might be looking for some sort of Proc SQL left join
here is a skeleton
proc sql; create table want as select a.permno, a.ncusip, a.dateofselection, <repeat for other variables you want from the set aliased as a>, b.var1, b.var2, <repeat for other variables wanted> from smallerdataset as a left join otherdataset as b on a.permno=b.permno and a.ncusip=b.ncusip and a.dateofselection=b.dateofselection <repeat for other matching criteria> ; quit;
you would want enough variables in the ON matches to reduce your matches from the larger dataset to a single record. If the variables have different names but the same values then use the matching variables. Note that the above comparisons assume that the case and spelling is the same (IBM will not match I.B.M. or GENERAL ELECTRIC will not match General Electric).
12-15-2017 01:32 PM
Since you've merged CRSP and IBES data, I bet your dataset (call it CRSP_IBES) is already sorted by permno/date. If so, this is almost surely the fastest way to extract CRSP_IBES data for just the permno/dates of interest (from dataset we'll call IPODATA):
proc sort data=ipodata out=temp; by permno date; run; data want; merge crsp_ibes temp (in=int); by permno date; if int; run;
Question: Since you are using public trade-exchange data as reported by CRSP, presumably the earliest record in CRSP_IBES for each PERMNO is an IPO date (or it's a date for new permnos from companies merging, splitting, or issuing new share classes). So I would assume that the permno/dates in IPO_DATA is a proper subset of the collection of earliest records for each permno in CRSP_IBES.
If for any permno, you find that is not the case, (i.e. crsp_ibes has dates preceding the ipo date), you should investigate those permno's. This assumes, of course, that the date in IPO_DATA is the IPO date related to the company issuing the stock identified by PERMNO.