BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi Guys,

I am presently working with a matrix table containing a lo-date a lo-amount and an interest rate.  There are 4,000,000 observations within this table.  I need to perform a look up on the matrix table based on the effective date of the transaction and the current balance in my transaction data.  The original code uses SQL to perform this look up and although it works it takes over 2 hours to complete.  Can someone offer some insight into a more effecient method obtaining the desired outcome?  I have added indexes and we are seeing a small amount of improvement, but no where near what is required.

The logic is if Lo_Date <= Effective_Date and Lo_Amount <= Balance then Int_Rate

Thank you very much for your help.

Matrix_Table
Lo_DateLo_AmountInt_Rate
1/01/201400%
1/01/201410005%
1/01/20141000010%
1/01/201410000015%
1/02/201405%
1/02/2014100010%
1/02/20141000015%
1/02/201410000020%

Transaction_Table
Effective_DateBalance
15/01/201411000
15/01/20141000
15/01/201415000
15/01/2014110000
15/01/2014900
15/01/2014100
15/01/20146000
15/01/2014100001
15/01/201499999
15/02/201411000
15/02/20141000
15/02/201415000
15/02/2014110000
15/02/2014900
15/02/2014100
15/02/20146000
15/02/2014100001
15/02/201499999

Result_Table
Effective_DateBalanceInt_Rate
15/01/20141100010%
15/01/201410005%
15/01/20141500010%
15/01/201411000015%
15/01/20149000%
15/01/20141000%
15/01/201460005%
15/01/201410000115%
15/01/20149999910%
15/02/20141100015%
15/02/2014100010%
15/02/20141500015%
15/02/201411000020%
15/02/20149005%
15/02/20141005%
15/02/2014600010%
15/02/201410000120%
15/02/20149999915%
1 ACCEPTED SOLUTION

Accepted Solutions
jakarman
Barite | Level 11

What I did is adding  error conditions.
- not going byond the current date in the inner loops 
- keeping the previous conditions  (reversed logic) as condition.

- only assiging int_rate when still all conditions are met
- recovery when something goes beyond scope an error (restart from scratch)

- error dataset for missed condtions.

As my session UE can behave badly going beyond limitations saved those in a permanent type. 

The whole idea is based on keeping searching just within that current date.
Knowing the power of factors and the ! function.

Your testdata is still running in the order off seconds on my machine. What is the result on your side?.

libname test "/folders/myfolders/test";

DATA test.MATRIX;
FORMAT LO_DATE DATE9.;
INFILE DATALINES;
INPUT LO_AMOUNT;
DO MATRIXTABLE = 81,85,100;
DO LO_DATE = "01JAN1980"D TO "05SEP2014"D;
  DO LO_FREQ = 1,12,40,60;
   INTRATE = RANUNI(0);
   OUTPUT;
  END;
END;
END;
DATALINES;
0
1000
5000
10000
100000
1000000
;
RUN;


DATA test.HISTORYTRANS (drop=i) ;
ARRAY VARS $20 VAR1-VAR40;
FORMAT EFFECTIVEDATE DATE9.;
DO TABLE = 81,85,100;
DO EFFECTIVEDATE = "01JAN1980"D TO "05SEP2014"D BY CEIL(RANUNI(0)*100);
  DO FREQ = 1 TO 60 BY CEIL(RANUNI(0)*60);
   DO AMOUNT = 100 TO 1000000 BY CEIL(RANUNI(0)*1000000);
    DO I = 1 TO 40;
     VARS{I} = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    END;
    IF RANUNI(0) < .20 THEN OUTPUT;
   END;
  END;
END;
END;
RUN;

/*  -- */
proc sort data=test.matrix        ;by descending matrixtable descending lo_Date descending lo_freq descending lo_amount  ; run;

proc sort data=test.HISTORYTRANS  ;by descending table descending EffectiveDate descending freq descending amount  ; run;

DATA test.Lookup test.lookup_err;
retain RC ;
SET test.HISTORYTRANS ;

IF _N_ = 1 THEN DO;
  DECLARE HASH H(DATASET:"test.MATRIX",ORDERED:"D", HASHEXP:16);
  DECLARE HITER HI("H");
  H.DEFINEKEY("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT");
  H.DEFINEDATA("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT","INTRATE");
  H.DEFINEDONE();
  Call missing(MATRIXTABLE,LO_DATE,LO_FREQ,LO_AMOUNT,INTRATE) ;
  /* add addtional stops for up/down without error getting there  */
  Matrixtable=-1  ;lo_date='01jan1800'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
  Matrixtable=9999;lo_date='01jan2200'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
END;
format LO_date date. ;

IF (_N_ = 1 | RC not = 0  ) THEN RC = HI.FIRST();
DO WHILE (RC = 0 AND INT_RATE = .);
   IF MATRIXTABLE = TABLE & LO_DATE = EFFECTIVEDATE    THEN DO;
     Do While (  RC=0 & LO_FREQ > FREQ  & LO_DATE = EFFECTIVEDATE ) ;
       RC = HI.NEXT();
     end ;
     Do While (  RC=0  & LO_AMOUNT > AMOUNT  & LO_DATE = EFFECTIVEDATE  & LO_FREQ <=  Freq) ;
       RC = HI.NEXT();
     end ;
     if ( RC = 0 & MATRIXTABLE = TABLE & LO_DATE = EFFECTIVEDATE & LO_FREQ <=  Freq & LO_AMOUNT <= AMOUNT ) then INT_RATE = INTRATE; 
  END;  
  Else Do;
    RC = HI.NEXT();
  END;  
END;
If (LO_Date = Effectivedate) then OUTPUT test.lookup ;
else output test.lookup_err;

IF ( RC not = 0 | LO_date < '01feb1800'd  ) THEN Do;
  put "reset search: " _N_ RC lo_date ; 
  RC = HI.FIRST();
end;
else DO WHILE (LO_DATE <= EFFECTIVEDATE & RC = 0 );
   RC = HI.PREV();  /* get back to starting point of date - previous date */
end;

run;

---->-- ja karman --<-----

View solution in original post

20 REPLIES 20
jakarman
Barite | Level 11

There are several techniques to do a table lookup: http://www2.sas.com/proceedings/forum2008/095-2008.pdf

The problem is: you have two values date/value that is giving one new value.
This limits to some techniques. All is depending how big the dataset 4M records is compared to your system When this is those 3 variables 2 numbers and a string of 4 (?) it is still below 100MB. Not shocking big should  fit into memory. A Hash technique would be the most efficient one. 
Something to work out

reviews question:
- the recoding in percentages it several classes doe only makes sense by a table when they are getting different in time
  When there are several recode list of the values that are different in time and that list is limted by number of 10 or 20 or automated bij let us say 1000.

  It is making more sense to define those list as a format and select each of those formats as a time dependent variable.     

With hashing it could go like:

data result ;

  if _n_=1 then  do;

    declare hash h(dataset:'matrix', multidata:'y');

    h.definekey('lo_date');

    h.definedata(ALL:'yes');

    h.definedone();

   end;

   set transaction;

   attrib int_rate format=percent. informat=percent. length=4 ;

   call missing(lo_amount,int_rate);

   lo_date=effective_date;

   do rc=h.find() by 0 while (rc=0 & lo_amount <= balance);

      rc=h.find_next();

   end;

   if (rc = 0 ) then rc=h.find_prev() ;

   /* put _all_ ; */

run;

As it are percentages (formatted) note that the precision is not decimal exact.

---->-- ja karman --<-----
Scott_Mitchell
Quartz | Level 8

Hi Jaap.

Unfortunately I am constrained using 9.1.3, hence no multi data option.

Is there a work around for this?

jakarman
Barite | Level 11

The multidata is new at 9.2.

9.1.3 is already very old (2008), grrr why is lcm of SAS that bad as there is no license cost constraint. Yes it is a personal frustration.

I think that drops the hash approach. We are still having:

- a format usage I am thinking on:

    The date delivering a format name,,, the delivered format apply on the balance value

    How many recodes (formats) do you have/expect   (asked this before)
    This is quite a unique but simple approach inputn usage. SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

The code for this goes like:

/* ----------------------- */
proc format;
value recode_dt (default=16 max=32)
'01jan2014'd - '01feb2014'd = "rcd_1v."
'01feb2014'd - '01mar2014'd = "rcd_2v."
  other= "dummy" ;
invalue dummy
  other=. ; /* undefined recode */  
invalue rcd_1v
  0-<1000 = 0.00  
  1000-<10000 = 0.05
  10000-<100000 = 0.10
  100000-High   = 0.15
  other=. ; /* undefined recode */
invalue rcd_2v
  0-<1000 = 0.05  
  1000-<10000 = 0.1
  10000-<100000 = 0.15
  100000-High   = 0.20
  other=. ; /* undefined recode */
run;

data result ;
   set transaction;
   attrib int_rate format=percent. length=4   int_recod length=$8 ;
   int_recod=put(effective_date,recode_dt.);
   int_rate=inputn(put(balance,12.), int_recod) ;
run;


- a balanced merge on the data using a datastep.  
  The assumption is sorted sas datasets

Need a sample ??  (there are more ways merge by, point usage)
SASfile usage to get the dataset into memory

SQL is probably slow as it does a full join and than selecting what is needed. your 100mb is blown up first. (Cartesian) 


---->-- ja karman --<-----
Ksharp
Super User

HOHO, you must be kidding me . and let me know the feedback.

data Matrix_Table ;
input Lo_Date : ddmmyy10.     Lo_Amount     Int_Rate : percent8.;
format Lo_Date ddmmyy10.;
cards;
1/01/2014     0     0%
1/01/2014     1000     5%
1/01/2014     10000     10%
1/01/2014     100000     15%
1/02/2014     0     5%
1/02/2014     1000     10%
1/02/2014     10000     15%
1/02/2014     100000     20%
;
run;
data Transaction_Table ;
input Effective_Date : ddmmyy10.     Balance     ;
format Effective_Date ddmmyy10.;
cards;
15/01/2014     11000
15/01/2014     1000
15/01/2014     15000
15/01/2014     110000
15/01/2014     900
15/01/2014     100
15/01/2014     6000
15/01/2014     100001
15/01/2014     99999
15/02/2014     11000
15/02/2014     1000
15/02/2014     15000
15/02/2014     110000
15/02/2014     900
15/02/2014     100
15/02/2014     6000
15/02/2014     100001
15/02/2014     99999
;
run;

data temp;
 set Matrix_table(rename=(Lo_Date=Effective_Date Lo_Amount=Balance) in=ina) Transaction_Table(in=inb);
 by Effective_Date  ;
 in2=inb;
 if ina=1 and ina ne lag(ina) then group+1;
run;
proc sort data=temp;by group balance; run;
data want(drop=in2 group Int_Rate);
 set temp;
 by group;
 retain rate;
 if first.group then call missing(rate);
 if not missing(Int_Rate) then rate=Int_Rate;
 if in2;
run;

Xia Keshan

Message was edited by: xia keshan  Sorry, Fixed a problem.

jakarman
Barite | Level 11

Xia, no not kidding you. By the way I assumed the first dataset is missing those 5s for the same dates having a different recoding classification of the balance variable.

The line balance method is an optimized solution in 3g languages. You did a similar one but needed an additional resort for getting the wanted record. Thx

---->-- ja karman --<-----
Ksharp
Super User

Hi Jaap,

I can't understand you . What is 3g language and why I need an additional resort for getting the wanted record ?

I think I get the output OP desired .Don't you think so ?

Xia Keshan

Scott_Mitchell
Quartz | Level 8

Thank you all for your help and insights.

In my initial question I simplified the issue considerably in an effort to engage more people.  In doing so I may have over simplified and caused greater confusion.

The bigger story is as follows:

Basically I have 4 variables by which I need to merge.  1 variable (table) must equal the value in the in the matrix table and the other 3 should be greater than or equal to the value in the matrix.  The below produces a simulated version of the datasets I am working with.

DATA MATRIX;
FORMAT LO_DATE DATE9.;
INFILE DATALINES;
INPUT LO_AMOUNT;
DO MATRIXTABLE = 81,85,100;
DO LO_DATE = "01JAN1980"D TO "05SEP2014"D;
  DO LO_FREQ = 1,12,40,60;
   INTRATE = RANUNI(0);
   OUTPUT;
  END;
END;
END;
DATALINES;
0
1000
5000
;
RUN;


DATA HISTORYTRANS;
ARRAY VARS $20 VAR1-VAR40;
FORMAT EFFECTIVEDATE DATE9.;
DO TABLE = 81,85,100;
DO EFFECTIVEDATE = "01JAN1980"D TO "05SEP2014"D BY CEIL(RANUNI(0)*100);
  DO FREQ = 1 TO 60 BY CEIL(RANUNI(0)*60);
   DO AMOUNT = 100 TO 1000000 BY CEIL(RANUNI(0)*1000000);
    DO I = 1 TO 40;
     VARS{I} = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    END;
    IF RANUNI(0) < .20 THEN OUTPUT;
   END;
  END;
END;
END;
RUN;

In the methods suggested thus far I am concerned about the number of times I will have to access each table to either group and sort, or create mountains of formats.  As a result I ran with a HASH ITERATOR.

DATA LOOKUP;
IF 0 THEN SET MATRIX;
IF _N_ = 1 THEN DO;
DECLARE HASH H(DATASET:"MATRIX",ORDERED:"D", HASHEXP:16);
DECLARE HITER HI("H");
H.DEFINEKEY("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT");
H.DEFINEDATA("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT","INTRATE");
H.DEFINEDONE();
END;

SET HISTORYTRANS;

RC = HI.FIRST();

DO WHILE (RC = 0 AND INT_RATE = .);
     IF MATRIXTABLE = TABLE THEN    
           IF LO_DATE <= EFFECTIVEDATE THEN
                IF LO_FREQ <= FREQ THEN
                     IF LO_AMOUNT <= AMOUNT THEN DO;
       INT_RATE = INTRATE;
       LEAVE;
      END;
     RC = HI.NEXT();
END;

RUN;

I have seen an improvement in my processing time, down from 1.5Hrs down to 1Hr (ensuring that the 2 datasets remained static at 456012 records - Matrix and 30000 records - HistoryTrans

), but I would like to improve this further.  Will the previous recommendations (@JAAP and @KSHARP ) work under the above conditions?

Could I please seek your further assistance?

Regards,

Scott

gergely_batho
SAS Employee

Hi,

below is a combination of ideas.

I am storing a FREQxAMOUNT matrix in a hash object, and updating it if new information is available. (Note: this technique works only, if your LO_FREQ and LO_AMOUNT values are always the same for every date, or if this info is missing for a specific date. In your input data you shoul alway have an INTRATE for a transaction on the same day or before! If this is not true, some additional coding is needed.)

I think it would also work without a hash table, but I wanted to keep the idea of the hash table, since in that case the transactions should be sorted only by TABLE and EFFECTIVEDATE (not by  LO_FREQ and LO_AMOUNT). Even the TABLE variable could be included in the hash - this would enable pure "stream processing" (no need for sortinig), provided your data originally is sorted by date. (Often transactional data comes already sorted.)

proc sort data=MATRIX;

by MATRIXTABLE LO_DATE;

run;

DATA LOOKUP();

IF _N_ = 1 THEN DO;

DECLARE HASH H(ORDERED:"D",HASHEXP:16);

H.DEFINEKEY("LO_FREQ","LO_AMOUNT");

H.DEFINEDATA("LO_FREQ","LO_AMOUNT","INTRATE");

H.DEFINEDONE();

DECLARE HITER HI("H");

END;

SET MATRIX(in=inM rename=(MATRIXTABLE=TABLE LO_DATE=EFFECTIVEDATE)) HISTORYTRANS();

by TABLE EFFECTIVEDATE /*FREQ AMOUNT*/;

if inM then do;/*record from MATRIX*/

  rc=h.add();

  if rc then h.replace();

end;

else do;/*record from HISTORYTRANS*/

  RC = HI.FIRST();

  DO WHILE (RC = 0);

  IF LO_FREQ <= FREQ and LO_AMOUNT <= AMOUNT THEN DO;

    output;

        LEAVE;

  END;

  RC = HI.NEXT();

  END;

end;

RUN;

jakarman
Barite | Level 11

Xia we are usually just seeing a small part of the OP's problem.
He has got a question to solve dis some steps he is knowing assuming they are the way to go. At the moment he gets blocked there is just some small part of the original question told. I try to get to that original question. In this case it looks to be there are several translations done by time. The first step The OP did was creating that matrix table. Of course I can be wrong, it is guess.  That is why I got on those format recoding way.
As example for his processing I am seeing he tried to used two dates as reference. Within the first one he missed the '5 to join those date's as exact match.
   

3G languages that are Cobol Fortran C Java  and a lot more. The difference with SAS is that auto-increment or record approach and much more like that.  Fourth-generation programming language - Wikipedia, the free encyclopedia. With a 3GL there is much more need to be aware on how to do IO processing. 
The Balance Line Algorithm is a standard known approach for fast efficient processing.  Asking for optimizing data processing problems that fit those conditions I would go for that kind.  The data set merge by (or set by) approach with SAS is one that is very alike balance line algorithm.

Did you help OP possible, we do not really know how is proceeding. When the turnaround time for him is acceptable and the coding is understandable maintainable, I guess that are the accept criteria, he should be happy.   

 

---->-- ja karman --<-----
Ksharp
Super User

WOW, Jaap.

You are impressing me . Learn a lot from you.About DIS , I truly have no idea about it since you never touch it before .As I told you before , now I am focusing on statistical theory .

Best

Xia Keshan

stat_sas
Ammonite | Level 13

Hi,

Try this, may be helpful in reducing execution time using sql.

proc sql;

select Effective_Date,balance,Int_Rate from Transaction_Table, Matrix_Table

where month(Effective_Date) = month(lo_date)

and lo_amount<=balance

group by Effective_Date,balance

having Int_Rate=max(Int_Rate);

quit;

jakarman
Barite | Level 11

Scott,

Your question is getting into the same direction as this one. https://communities.sas.com/thread/60762. It is more about performance/tuning and seeking the best algorithm for that. The hash object is working better as SQL but not a very dramatic one. The classic line-balance (sorted data) is far better. Xia already posted one based on a merge by approach.  This one with a result at JDmarino-s data from 5hr to 1 minute was sufficient to stop improvements.   

Will try with your sample data.
Just to be complete, any description of underlying hardware (no processors memory io-speed). SAS version?
Options as memsize bufsize aligniofiles?

(updated)

What I am seeing is:
- I was hoping on some standard tables in matrix. With this setting there are 100.000 of them. Could work (one catalog) not my first choice.

- creating that datasets wil last for 0.23 and (matrix 456012 obs) en 0.06 second(historytrans 15486)  IO is no issue as memory to get the matrix in is no issue.

  Sorting those datasets is rather quick. As you are checking on the highest amount vale I did a descending order for amount.
  The wanted checking for freq is not really clear for me, could be a wrong one.

- On my computer the hashing is not getting to work (resources UE). 
- Analyzing your testdata, matrix is not sorted on lo_amount.  

The balance line algorithm can run with a point=  usage that is working similar to the hash next.

Added is a reset to the starting date after each found record. It is not to the beginning of the dataset.

Id did not do something on missings. Could be needed.
This one runs in 3 minutes


proc sort data=matrix        ;by matrixtable lo_Date lo_freq descending lo_amount  ; run;
proc sort data=HISTORYTRANS  ;by table EffectiveDate freq descending amount  ; run;

/*  -- */
%let dsid = %sysfunc(open(matrix));
%let mat_nobs =%sysfunc(attrn(&dsid,NOBS));
%let rc = %sysfunc(close(&dsid));
%put matrix has &mat_nobs ;

data lookup ;
  set historytrans nobs=hst_nobs ;
  Retain eventlk 1  ;
  check=1 ;

  do while (eventlk < &mat_nobs & check ) ; 
    set matrix (rename=(intrate=int_rate)) point=eventlk  end=endprc ; 
    IF MATRIXTABLE = TABLE & LO_DATE = EFFECTIVEDATE & LO_FREQ > FREQ & LO_AMOUNT < AMOUNT THEN DO;
          check=0;
         END;
    Else  eventlk=eventlk+1;
  end;      
  output;
  do while ( LO_DATE >= EFFECTIVEDATE & eventlK > 1);
     eventlk=eventlk-1;
     set matrix (rename=(intrate=int_rate)) point=eventlk  end=endprc ; 
  end;
run; 

---->-- ja karman --<-----
jakarman
Barite | Level 11

I think this one is a far way.

Using the balance line in reversed sorted order (descending)

Going back to previous starting point as of a date marker
High/low values as markers to prevent overruns in values.

The proc sort for matrix is not relevant other than seeing it. The ordering of the hash itself will work

When the logic is all right the run time of it is amazing fast.

proc sort data=matrix        ;by descending matrixtable descending lo_Date descending lo_freq descending lo_amount  ; run;

proc sort data=HISTORYTRANS  ;by descending table descending EffectiveDate descending freq descending amount  ; run;

DATA LOOKUP2;
retain RC ;
SET HISTORYTRANS ;

IF _N_ = 1 THEN DO;
  DECLARE HASH H(DATASET:"MATRIX",ORDERED:"D", HASHEXP:16);
  DECLARE HITER HI("H");
  H.DEFINEKEY("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT");
  H.DEFINEDATA("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT","INTRATE");
  H.DEFINEDONE();
  Call missing(MATRIXTABLE,LO_DATE,LO_FREQ,LO_AMOUNT,INTRATE) ;
  /* add addtional stops for up/down without error getting there  */
  Matrixtable=-1  ;lo_date='01jan1800'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
  Matrixtable=9999;lo_date='01jan2200'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
END;
format LO_date date. ;
  
IF _N_ = 1 THEN RC = HI.FIRST();
DO WHILE (RC = 0 AND INT_RATE = .);
   IF MATRIXTABLE = TABLE & LO_DATE = EFFECTIVEDATE    THEN DO;
     Do While (  LO_FREQ > FREQ ) ;
       RC = HI.NEXT();
     end ;
     Do While (   LO_AMOUNT > AMOUNT  & LO_DATE = EFFECTIVEDATE  ) ;
       RC = HI.NEXT();
     end ;
     INT_RATE = INTRATE; 
  END;  
  Else Do;
    RC = HI.NEXT();
  END;  
END;
OUTPUT ;
DO WHILE (LO_DATE <= EFFECTIVEDATE & RC = 0 );
   RC = HI.PREV();  /* get back to starting point of date - previous date */
end;
run;

---->-- ja karman --<-----
Scott_Mitchell
Quartz | Level 8

Hi Jaap,

The HASH approach was outstanding and gets the results that I want in a majority of occasions and I have made a tweak or 2 where appropriate.

I have one problem in the original example I had a LO_AMOUNT which was equal to 0 when in reality the lowest LO_AMOUNT is actually 1 (I hadn't been able to see the actual data until now and had to make the assumption that the lowest value would be 0).  If I come across an AMOUNT of 0 then the result displayed is  Matrixtable=-1  ;lo_date='01jan1800'd; Lo_Freq=0; lo_amount=-1 as expected.  The problem then appears to stop the look up for subsequent observations.

I have made several changes to the code to counter this including reading the hash from the beginning again and creating an if condition which creates an INT_RATE of 0 without processing the DO WHILE loop, but I am unable to get the desired outcome.

My code appears as follows:

DATA MATRIX;
FORMAT LO_DATE DATE9.;
INFILE DATALINES;
INPUT LO_AMOUNT;
DO MATRIXTABLE = 81,85,100;
DO LO_DATE = "01JAN1980"D TO "05SEP2014"D;
  DO LO_FREQ = 1,12,40,60;
   INTRATE = RANUNI(0);
   OUTPUT;
  END;
END;
END;
DATALINES;
1
1000
5000
10000
100000
1000000
;
RUN;

PROC SORT DATA=MATRIX;
BY LO_DATE;
RUN;

/*Outputting only every second date grouping to synthesize missing LO_DATES */

DATA MATRIX;
SET MATRIX;
BY LO_DATE;
IF FIRST.LO_DATE THEN GROUP +1;
IF FLOOR(GROUP/2) = GROUP/2 THEN OUTPUT;;
RUN;

DATA HISTORYTRANS;
ARRAY VARS $20 VAR1-VAR40;
FORMAT EFFECTIVEDATE DATE9.;
DO TABLE = 81,85,100;
DO EFFECTIVEDATE = "01JAN1980"D TO "05SEP2014"D BY CEIL(RANUNI(0)*100);
  DO FREQ = 1 TO 60 BY CEIL(RANUNI(0)*60);
   DO AMOUNT = 0 TO 1000000 BY CEIL(RANUNI(0)*1000000);
    DO I = 1 TO 40;
     VARS{I} = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    END;
    IF RANUNI(0) < .20 THEN OUTPUT;
   END;
  END;
END;
END;
RUN;


proc sort data=matrix;
by descending matrixtable descending lo_Date descending lo_freq descending lo_amount;
run;


proc sort data=HISTORYTRANS  ;by descending table descending EffectiveDate descending freq descending amount  ; run;

DATA LOOKUP2;
retain RC ;
SET HISTORYTRANS (WHERE = (EFFECTIVEDATE > "01JAN1980"D));

IF _N_ = 1 THEN DO;
  DECLARE HASH H(DATASET:"MATRIX",ORDERED:"D", HASHEXP:16);
  DECLARE HITER HI("H");
  H.DEFINEKEY("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT");
  H.DEFINEDATA("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT","INTRATE");
  H.DEFINEDONE();
  Call missing(MATRIXTABLE,LO_DATE,LO_FREQ,LO_AMOUNT,INTRATE) ;
  /* add addtional stops for up/down without error getting there  */
  Matrixtable=-1  ;lo_date='01jan1800'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
  Matrixtable=9999;lo_date='01jan2200'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
END;
format LO_DATE date9. ;
  
DO WHILE (RC = 0 AND INT_RATE = .);
   IF MATRIXTABLE = TABLE   THEN DO;
     Do While (LO_DATE > EFFECTIVEDATE ) ;
       RC = HI.NEXT();
     end ;
     Do While (LO_FREQ > FREQ ) ;
       RC = HI.NEXT();
     end ;
     Do While (LO_AMOUNT > AMOUNT) ;
       RC = HI.NEXT();
     end ;
     INT_RATE = INTRATE; 
  END;  
  Else Do;
    RC = HI.NEXT();
  END;  
END;
OUTPUT ;
DO WHILE (LO_DATE <= EFFECTIVEDATE & RC = 0 );
   RC = HI.PREV();  /* get back to starting point of date - previous date */
END;
run;

Thank you again for all your help.

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 3541 views
  • 6 likes
  • 6 in conversation