Explanation of revised code and datasets
Instead of the previous datasets DATE1 and DATE0, we have now more versatile datasets BLOCKS and BLOCKS0.
For each ID, dataset BLOCKS contains one observation per block of consecutive observations in dataset TWO with the same RATE (be it 0 or not). Variables are ID, RATE, DATE1 (=first TGTDT in the block) and DATELAST (=last TGTDT in the block). So, it's similar to the first version of the old dataset DATE0, but, unlike DATE0, it is not restricted to blocks with RATE=0. The reason is that we need information from blocks with RATE not equal to zero.
In contrast, dataset BLOCKS0 is restricted to blocks with RATE=0 (hence its name) and has again one obs. per block. It is created by the second data step, which reads the observations from dataset BLOCKS with RATE=0 and the previous and the following obs. of each of these observations. The look-back and look-ahead techniques used are the same: second SET statement with POINT= option.
As a result, BLOCKS0 contains the ID and DATE1 ("first date") values from each block (in dataset TWO) with RATE=0 and, in addition,
the values RATE and DATELAST (see description of BLOCKS above) from the preceding block (i.e. with RATE ne 0)
the value DATE1 from the following block (again, necessarily with RATE ne 0).
I think, the names of these three additional variables are self-explanatory: PREVRATE, PREVDATELAST and NEXTDATE1. Please note that the second data step creates even more variables: PREVDATE1, NEXTDATELAST and NEXTRATE. But these three are dropped at the end, as is DATELAST from the "zero rate observation", because they are not needed in your current requirements. Obviously, we could make these pieces of information (first date of previous block, last date and rate of next block, last date of current block) available immediately, if needed in future specifications.
Of course, I've made sure that "previous" and "following" blocks are only regarded within the same ID. This means, if the very first block of an ID had RATE=0, the PREVxxxx variables would have missing values. Similarly, the NEXTxxxx variables of a "zero rate" block would have missing values, if this was the last block of the respective ID.
Only minor changes were required for the final PROC SQL step: It now retrieves data from dataset BLOCKS0 rather than DATE0. The RENAME= option used provides flexibility with regard to the names of the new variables. Three variables from BLOCKS0 are renamed to match your specifications. The fourth new variable, PREVRATE, can keep its original name. The formatting of the latter with three decimals (using format 8.3) is, of course, optional and was just suggested by your sample data.
Again, please test the solution with more sample (or real) data.
... View more