DATA Step, Macro, Functions and more

Reading off reference table dynamically

Reply
N/A
Posts: 0

Reading off reference table dynamically

HI I have two tables. wonder if someone can assist me doing the following:

Table #1
Date Rate
20100701 2.50
20090801 2.25
20090302 2.00

Table #2 (has 10,000 records)
CustomerID Lending_Date
1 20090626
2 20100708

Basically what i want to do is use the DB2 Table #1 as reference table to add new column "rate" to customer Table #2. For example
CustomerID #1 will get rate 2.00 because the lending_date is greater than 20090302 but less than 20090801

Resulting table:
CustomerID Lending_Date Rate
1 20100626 2.00
2 20100708 2.50

My challenge is that table #1 is a DB2 table that could be growing irregularly and I will not be the one update the table. How can i get it done using data step?
Thx
Valued Guide
Posts: 634

Re: Reading off reference table dynamically

Posted in reply to deleted_user
Essentially you are looking up the most recent loan rate. The DATA step below creates a UNION of the two data sets using a SET statement. Since the dates do not necessarily align we need to retain the loan rate.
[pre]
data rates;
input Date yymmdd8. Rate;
format date yymmdd10.;
datalines;
20100701 2.50
20090801 2.25
20090302 2.00
run;

data loans;
input CustomerID $ Date yymmdd8.;
datalines;
1 20090626
2 20100708
run;

proc sort data=rates;
by date;
run;

data loanrate(keep=date loanrate customerid);
set rates(in=inrates)
loans(in=inloan);
by date;
retain loanrate;
if inrates then loanrate=rate;
if inloan;
run;
[/pre]
For this solution (which requires a BY statement), you may need to do more or less sorting depending on how your incoming data sets are ordered.
Valued Guide
Posts: 2,177

Re: Reading off reference table dynamically

Posted in reply to deleted_user
If this process runs in SAS, I would apply the rate through a format look-up. That allows the data to be in an order more relevant to the processing.
Create the lookup format with : [pre]data cntlin ;
retain fmtname 'latest' start end label . hlo 'O'
sexcl 'N' eexcl 'Y' ;
* the "next date" is needed to define the rate date range but must be
excluded from the date range - so EEXCL =Y excludes end-of-range;
output ; * missing becomes missing ;
hlo = 'L' ;
label = .N ; * no rate prior to first date ;
* now output each rate for range up to following date ;
do while( not eof ) ;
set DB2.TABLE_1( rename=( date= end rate= label2 )) end= eof ;
* ensure db2_TABLE_1 is extracted as a single row for each date and in date order;
output ;
hlo = ' ' ;
* next start follows after this end ;
start = end ;
* the rate before the next date is the rate on this date ;
* so output this rate when next date becomes available ;
label = label2 ;
end ;
* now output final range which continues with no "end date" (hlo =H) ;
end = . ;
hlo= 'H' ;
output ;
stop ;
run ;
* now compile the format ;
proc format cntlin= cntlin ;
run ; * [/pre] now use that date-lookup in this kind of way ;[pre]data whatever_update ;
set table_2 ;
rate = input( put( lending_date, latest. ), best10. ) ;
run ;[/pre]
Of course, it might be better to use the newer technology available since SAS9 introduced associative arrays - otherwise knows as hash tables. In fact this would make a good example to be incorporated in the on-line doc
Regular Contributor
Posts: 241

Re: Reading off reference table dynamically

...
> Of course, it might be better to use the newer
> technology available since SAS9 introduced
> associative arrays - otherwise knows as hash
> tables. In fact this would make a good example to be
> incorporated in the on-line doc
...
I don't think hash helps since the difficulty here is to lookup a date in a table with only the starting dates of new rates. Your format approach works great because we can make a format with input *ranges*.

If we first "expand" the rate table so that the rate is given for each day (from the earliest date to today), then things become really simple. We can put this "expanded" rates table into hash and do lookups, but a simple merge will do nicely as well.

[pre]
/* test data */
data rates;
input date yymmdd8. rate;
cards;
20100701 2.50
20090801 2.25
20090302 2.00
;
run;
data loans;
input id date yymmdd8.;
cards;
1 20090626
2 20100708
;
run;

/* prep datasets. you don^t have to sort loans if you use hash to look up */
proc sort data=rates nodupkey;
by date;
run;
proc sort data=loans;
by date;
run;

/* expand rates table so that we can cover everyday
from min(date) to today. a single rate per day only. */
data daily;
format date yymmdd10.;
merge rates(rename=(date=start))
rates(firstobs=2 keep=date rename=(date=nextStart));
/* by omitted intentionally */
finish = ifn(missing(nextStart), today(), nextStart-1);
do date = start to finish;
output;
end;
keep date rate;
run;
/* attach rates to loans. simple merge. */
data loansr;
merge loans(in=inLoans) daily;
by date;
if inLoans;
run;

/* check */
proc print data=loansr noobs;
run;
/* on lst
id date rate
1 2009-06-26 2.0
2 2010-07-08 2.5
*/
[/pre]

If you have an SAS/ETS licensed, then you can take advantage of PROC EXPAND to create the dataset daily (you have to append today's date first, then use BEGINNING and METHOD=STEP options in your CONVERT statement.)

For the look-ups, simple merge is fine as above. If you insist using hash, then here you go:
[pre]
data hash;
if 0 then set daily; /* to prep pdv */
if _n_=1 then do;
dcl hash h(dataset:'dailyRates');
h.defineKey('date');
h.defineData('rate');
h.defineDone();
end;
set loans;
h.find();
run;

/* check */
proc compare base=loansr compare=hash;
run;
/* on lst -- in part
NOTE: No unequal values were found. All values compared are exactly equal.
*/
[/pre]
Ask a Question
Discussion stats
  • 3 replies
  • 162 views
  • 0 likes
  • 4 in conversation