I have a dataset that has a RATE variable and a DATE variable. I want to create a table that lists each unique RATE with the most recent DATE. For example, if I had 100 records that had various counts of 1%, 2%, 3%, 4%, 5% rates, the table that I would ultimately end up would be 5 records, 1 of each for each rate, with the corresponding date that was most recent. I've been struggling with an approach to code this, so any advice that will get my in the right direction would be appreciated.
I've thought about sorting the data by the RATE column and then DATE column (descending), getting the count of all the records with each rate, and then creating datasets for each rate using the OBS function (ie if the first rate occurred 62 times, I would create a dataset starting with OBS1 and then the next dataset with LASTOBS + 1), but there must be an easier way I haven't thought of.
I used the code Peter suggested and have run into a new problem. The first dataset I examined has 18 unique rate/date combos. All 18 of these combos have the same rate (0%) and obviously a different date. When I run the code Peter suggested, the 3,016 observations output 18 observations, representing each date with a rate of 0%. I want the result in this case to be in observation - 0% with the most recent date.
A little insight into what I need out of this analysis might help as well. The point of this analysis is to identify within each dataset if a business line is using multiple rates (which is prohibited). For reporting purposes, we want to identify all rates used and the most recent date at which they were used (business lines are allowed to change rates, but only at certain times). In most cases, we expect to see only one rate, so the resulting table for each business line should be only one observation.
sounds like you will have to sort your data (previously it was "Doc" who offered code).[pre]proc sort data= your.data out= lines ;
by business_line date ;
data exceptions( keep= business_line date rate ) ;
do until( last.business_line ) ;
set lines ;
by business_line rate notsorted ;
if last.rate and not last.business_line then do ;
problem_date= date ;
if problem_date then output ;
run ;[/pre]with some sample data I would be able to do appropriate testing.
Notice that I do not sort by RATE. If a business line uses rate1 then rate2 then rate1 again, the process I propose will have two rows in exceptions- first just before rate2 is introduced, and second with the last date when rate1 was used. I decided that model was appropriate because you seem very interested in the date of change, and not just "the last date used".
I would turn the rate table into a format based on returning the rate for dates in the range.
If the date range is not more than a few years fill an array indexed on date with rate for each date. 10 years would need no more than 10 x 366 x 8 bytes. That is small amount of memory.
The advantage of these approaches - no need to sort the data.