My understanding is that
You want one output record for each M&A record (dataset M_and_A in the code below).
For each M&A record you want an earnings record posted on the same date, or the earliest following date (from dataset EARNINGS).
If both of your datasets are sorted by ID (e.g. ticker, cusip, permco), and DATE, then you can take advantage of conditional SET statements (if some_condition then SET some_dataset), to carry forward (or in this case, carry backward) data of interest.
You might think you have to sort data in reverse chronological order within each id, then use conditional set's to logically carry forward (but in real calendar terms, carry backward) the needed matching data. Having done that, you'd have to re-sort back to chronological order.
You don't have to do that.
Instead, just recognize that carrying back a given earnings record dated, say 02JUL2023, back to (but not including) the previous earnings record (say 31mar2023), is the same as carrying forward the 02JUL2023 data from 01APR2023 (day after preceding earnings) forward through 02JUL2023. An inexpensive (compared to sorting and re-sorting) dataset view can enable this approach. In other words, carrying data from date D{t} backwards is the same as carrying forward the same data from date D{t-1}+1, so just create a dataset view using this concept.
Something like this:
data earnings_next/ view=earnings_next;
set earnings;
by id;
_sort_date=ifn(first.id,'01jan1960'd,sum(lag(date),1));
run;
data want
(label="Every M&A record plus either the sameday earnings or earliest following Earnings"
drop=_:);
merge m_and_a (in=in_m_and_a rename=(date=_sort_date) keep=id _sort_date)
earnings_next (in=in_earnings keep=id _sort_date);
by id _sort_date;
if in_earnings then set earnings_next (rename=(date=earnings_date));
if inm_and_a then set m_and_a (rename=(date=M_and_A_date)) ;
if inm_and_a;
run;
The first MERGE is unconditional, which means all the variables it retrieves from the merged datasets are replaced (or set to missing) with every new distinct value of ID/_SORTDATE. I.E. all earnings data are set to missing every time an M&A record is read (without an exact earnings match), and vice-versa. But this unconditional MERGE reads only the sorting variables ID and _SORT_DATE, to avoid overwriting data from the subsequent conditional SETs.
The conditional set statements later in the code means that the relevant data is NOT replaced or set to missing until the condition is met. So earnings data records are logically carried forward (but actually carried backward using the modified _SORT_DATE in earnings_next) and held. It is still there when a subsequent M&A record is encountered, (inm_and_a=1), M&A data is read, and the merged data is output.
I have a 2017 paper on this: History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
A couple notes on the code:
In making the earnings_next view, I use 01jan1960 as the default _SORT_DATE value for the first record of each ID. You can use whatever date value you want (including a missing value), as long as it precedes the earliest actual DATE value in both datasets.
The resulting data will have one merged record for each M_and_A record, with two DATE variables M_and_A_date and earnings_date.
... View more