BookmarkSubscribeRSS Feed
Voutsu
Calcite | Level 5

Hi!

 

I am trying to merge a dataset which includes CRSP + IBES data with another IPO data set. The first datasets includes e.g. permno, ncusip, date, stock returns. However it has 50million rows since it includes daily stock returns. The other dataset includes only 7000 IPOs including PERMNO, NCUSIP, dates etc). I would like to merge these two data sets so that in the end I have a set where there is only those original 7000 IPOs + correct stock return from the first day of the IPO. What kind of code should I use to merge these two sets by the date and some identifier (e.g. permno)?

 

BR,

Voutsu

5 REPLIES 5
Voutsu
Calcite | Level 5

Alright, thanks!

 

I am not using University Edition so is there also instructions for 9.4 version?

Kurt_Bremser
Super User

The macro is included in the download as a .sas file. Store that anywhere on your PC, and either open it with EG or the enhanced editor of Base SAS, or with a text editor and copy/paste the text to SAS Studio. Once you run the code, the macro is ready for use. The macro code itself contains a description of it's use in a section that is also written to the log when the macro is used with !HELP as a parameter.

If you want to use the macro repeatedly in the future, you can include it in your autocall library.

ballardw
Super User

Without data set names or variables it is very hard to be very specific. What it sounds like is you might be looking for some sort of Proc SQL left join

here is a skeleton

proc sql;
   create table want as
   select a.permno, a.ncusip, a.dateofselection, <repeat for other variables you want 
      from the set aliased as a>, b.var1, b.var2, <repeat for other variables wanted>
   from smallerdataset as a
        left join
        otherdataset  as b
        on a.permno=b.permno
        and a.ncusip=b.ncusip
        and a.dateofselection=b.dateofselection
        <repeat for other matching criteria>
   ;
quit;

you would want enough variables in the ON matches to reduce your matches from the larger dataset to a single record. If the variables have different names but the same values then use the matching variables. Note that the above comparisons assume that the case and spelling is the same (IBM will not match I.B.M. or GENERAL ELECTRIC will not match General Electric).

 

mkeintz
PROC Star

Since you've merged CRSP and IBES data, I bet your dataset (call it CRSP_IBES) is already sorted by permno/date.  If so, this is almost surely the fastest way to extract CRSP_IBES data for just the permno/dates of interest (from dataset we'll call IPODATA):

 

proc sort data=ipodata out=temp;
  by permno date;
run;

data want;
  merge crsp_ibes  temp (in=int);
  by permno date;
  if int;
run;

 

Question:  Since you are using public trade-exchange data as reported by CRSP, presumably the earliest record in CRSP_IBES for each PERMNO is an IPO date (or it's a date for new permnos from companies merging, splitting, or issuing new share classes).  So I would assume that the permno/dates in IPO_DATA is a proper subset of the collection of earliest records for each permno in CRSP_IBES.

 

If for any permno, you find that is not the case, (i.e. crsp_ibes has dates preceding the ipo date), you should investigate those permno's.  This assumes, of course, that the date in IPO_DATA is the IPO date related to the company issuing the stock identified by PERMNO.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 647 views
  • 0 likes
  • 4 in conversation