BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jakarman
Barite | Level 11

What I did is adding  error conditions.
- not going byond the current date in the inner loops 
- keeping the previous conditions  (reversed logic) as condition.

- only assiging int_rate when still all conditions are met
- recovery when something goes beyond scope an error (restart from scratch)

- error dataset for missed condtions.

As my session UE can behave badly going beyond limitations saved those in a permanent type. 

The whole idea is based on keeping searching just within that current date.
Knowing the power of factors and the ! function.

Your testdata is still running in the order off seconds on my machine. What is the result on your side?.

libname test "/folders/myfolders/test";

DATA test.MATRIX;
FORMAT LO_DATE DATE9.;
INFILE DATALINES;
INPUT LO_AMOUNT;
DO MATRIXTABLE = 81,85,100;
DO LO_DATE = "01JAN1980"D TO "05SEP2014"D;
  DO LO_FREQ = 1,12,40,60;
   INTRATE = RANUNI(0);
   OUTPUT;
  END;
END;
END;
DATALINES;
0
1000
5000
10000
100000
1000000
;
RUN;


DATA test.HISTORYTRANS (drop=i) ;
ARRAY VARS $20 VAR1-VAR40;
FORMAT EFFECTIVEDATE DATE9.;
DO TABLE = 81,85,100;
DO EFFECTIVEDATE = "01JAN1980"D TO "05SEP2014"D BY CEIL(RANUNI(0)*100);
  DO FREQ = 1 TO 60 BY CEIL(RANUNI(0)*60);
   DO AMOUNT = 100 TO 1000000 BY CEIL(RANUNI(0)*1000000);
    DO I = 1 TO 40;
     VARS{I} = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    END;
    IF RANUNI(0) < .20 THEN OUTPUT;
   END;
  END;
END;
END;
RUN;

/*  -- */
proc sort data=test.matrix        ;by descending matrixtable descending lo_Date descending lo_freq descending lo_amount  ; run;

proc sort data=test.HISTORYTRANS  ;by descending table descending EffectiveDate descending freq descending amount  ; run;

DATA test.Lookup test.lookup_err;
retain RC ;
SET test.HISTORYTRANS ;

IF _N_ = 1 THEN DO;
  DECLARE HASH H(DATASET:"test.MATRIX",ORDERED:"D", HASHEXP:16);
  DECLARE HITER HI("H");
  H.DEFINEKEY("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT");
  H.DEFINEDATA("MATRIXTABLE","LO_DATE","LO_FREQ","LO_AMOUNT","INTRATE");
  H.DEFINEDONE();
  Call missing(MATRIXTABLE,LO_DATE,LO_FREQ,LO_AMOUNT,INTRATE) ;
  /* add addtional stops for up/down without error getting there  */
  Matrixtable=-1  ;lo_date='01jan1800'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
  Matrixtable=9999;lo_date='01jan2200'd; Lo_Freq=0; lo_amount=-1; RC = H.ADD();
END;
format LO_date date. ;

IF (_N_ = 1 | RC not = 0  ) THEN RC = HI.FIRST();
DO WHILE (RC = 0 AND INT_RATE = .);
   IF MATRIXTABLE = TABLE & LO_DATE = EFFECTIVEDATE    THEN DO;
     Do While (  RC=0 & LO_FREQ > FREQ  & LO_DATE = EFFECTIVEDATE ) ;
       RC = HI.NEXT();
     end ;
     Do While (  RC=0  & LO_AMOUNT > AMOUNT  & LO_DATE = EFFECTIVEDATE  & LO_FREQ <=  Freq) ;
       RC = HI.NEXT();
     end ;
     if ( RC = 0 & MATRIXTABLE = TABLE & LO_DATE = EFFECTIVEDATE & LO_FREQ <=  Freq & LO_AMOUNT <= AMOUNT ) then INT_RATE = INTRATE; 
  END;  
  Else Do;
    RC = HI.NEXT();
  END;  
END;
If (LO_Date = Effectivedate) then OUTPUT test.lookup ;
else output test.lookup_err;

IF ( RC not = 0 | LO_date < '01feb1800'd  ) THEN Do;
  put "reset search: " _N_ RC lo_date ; 
  RC = HI.FIRST();
end;
else DO WHILE (LO_DATE <= EFFECTIVEDATE & RC = 0 );
   RC = HI.PREV();  /* get back to starting point of date - previous date */
end;

run;

---->-- ja karman --<-----
KachiM
Rhodochrosite | Level 12

A newer hash approach is made. The Lo_date varies from 7305 to 19971.

Three temporary arrays are used to hold the values of matrixtable,

Lo_freq, and Lo_amount.

The variables from HISTORYTRANS are scanned with the values of arrays

to determine the first starting values for lookups. The index of the

each array for starting values is used in the subsequent do-loops and the index

is decremented if there is no hit. When there is a hit the output data set is

written and the do-loop is skipped to read the next record.

In case the number of values in the temporary arrays are very large, a binary

search can be used to reduce the search time.

This approach avoids the run-time for sorting the data sets.

DATA MATRIX;

FORMAT LO_DATE DATE9.;

INFILE DATALINES;

INPUT LO_AMOUNT;

DO MATRIXTABLE = 81,85,100;

DO LO_DATE = "01JAN1980"D TO "05SEP2014"D;

  DO LO_FREQ = 1,12,40,60;

   INTRATE = RANUNI(0);

   OUTPUT;

  END;

END;

END;

DATALINES;

1

1000

5000

10000

100000

1000000

;

RUN;

DATA HISTORYTRANS (drop=i) ;

*ARRAY VARS $20 VAR1-VAR40;

FORMAT EFFECTIVEDATE DATE9.;

DO TABLE = 81,85,100;

DO EFFECTIVEDATE = "01JAN1980"D TO "05SEP2014"D BY CEIL(RANUNI(0)*100);

  DO FREQ = 1 TO 60 BY CEIL(RANUNI(0)*60);

   DO AMOUNT = 100 TO 1000000 BY CEIL(RANUNI(0)*1000000);

/*    DO I = 1 TO 40;*/

/*     VARS{I} = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";*/

/*    END;*/

    IF RANUNI(0) < .20 THEN OUTPUT;

   END;

  END;

END;

END;

RUN;

data need;

array t[3] _temporary_ (81, 85, 100);

array f[4] _temporary_ (1,12,40,60);

array a[6] _temporary_ (1, 1000, 5000, 10000, 100000, 1000000);

format matrixtable table lo_date effectivedate lo_freq freq lo_amount amount ;

   if _n_ = 1 then do;

      retain max_date 19971 ;

      if 0 then set matrix;

      declare hash h(dataset:'matrix', ordered:'Y', hashexp:16);

      h.definekey('matrixtable','Lo_date','Lo_freq','Lo_amount');

      h.definedata('intrate');

      h.definedone();

   end;

   set historytrans;

   do i = dim(t) to 1 by -1;

      if t <= table then do; m_table = i; leave; end;

   end;

   Lo_date = ifN(effectivedate > max_date, max_date, effectivedate);

   do i = dim(f) to 1 by -1 ;

      if f <= freq then do; m_freq = i; leave; end;

   end;

   do i = dim(a) to 1 by -1;

      if a <= amount then do; m_amount = i; leave; end;

   end;

   rc = 0;

   do matrixtable = t[m_table] to t[1] by -1 while(rc = 0);

      do Lo_date = Lo_date to 7305 by -1 while(rc = 0);

         do Lo_freq = f[m_freq] to f[1] by -1 while(rc = 0);

            do Lo_amount = a[m_amount] to a[1] by -1 while(rc = 0);

               rc = h.find();

               if rc = 0 then do; rc = 1; output; end;

            end;

         end;

      end;

   end;

keep matrixtable table lo_date effectivedate lo_freq freq lo_amount amount ;

run;

KachiM
Rhodochrosite | Level 12

In my Post I missed to place the most important variable which is sought.

Add Variable, INTRATE, into 2 statements as:

format matrixtable table lo_date effectivedate lo_freq freq lo_amount amount intrate;

keep matrixtable table lo_date effectivedate lo_freq freq lo_amount amount intrate;

gergely_batho
SAS Employee

Hi All, maybe there’s a place here for a little (self)marketing. Smiley Happy

This problem involves looking up values in a table. The difficulty is, that there are no exact keys, instead we have intervals in the lookup table.

Most of the solutions use a hash table, and traverse it using many hiter.next(), hiter.prev() or hash.find() method calls.

If you think that one simple method call that supports a direct "find greater than" would be useful in a future SAS release, please vote here: https://communities.sas.com/ideas/1613

By the way, I like Muthia’s method of handling many “non-exact” keys.

Thanks.

KachiM
Rhodochrosite | Level 12

In my earlier reply, I used the following 4 do-loops to search for

matching records on the lower direction of each of the 4 variables.

For example, matrixtable variable, is decremented by unity. It

is not efficient for run-time.

   do matrixtable = t[m_table] to t[1] by -1 while(rc = 0);

      do Lo_date = Lo_date to 7305 by -1 while(rc = 0);

         do Lo_freq = f[m_freq] to f[1] by -1 while(rc = 0);

            do Lo_amount = a[m_amount] to a[1] by -1 while(rc = 0);

               rc = h.find();

               if rc = 0 then do; rc = 1; output; end;

            end;

         end;

      end;

   end;

A better way is to use:

          do m_table = m_table to 1 by -1 while(rc = 0);

    matrixtable = t[m_table];

and similarly for Lo_freq and Lo_amount.

The Lo_date variable takes continous values and so needs no change.

For the convenience of reading, I produce the revised program as:

The two input data sets used in the program are same as in my

previous reply.

/** ================================================================================ **/

/**  Loop Modification for efficiency                                                **/

/**================================================================================= **/

data need;

array t[3] _temporary_ (81, 85, 100);

array f[4] _temporary_ (1,12,40,60);

array a[6] _temporary_ (1, 1000, 5000, 10000, 100000, 1000000);

format matrixtable table lo_date effectivedate lo_freq freq lo_amount amount intrate;

   if _n_ = 1 then do;

      retain max_date 19971 ;

      if 0 then set matrix;

      declare hash h(dataset:'matrix', ordered:'Y', hashexp:16);

      h.definekey('matrixtable','Lo_date','Lo_freq','Lo_amount');

      h.definedata('intrate');

      h.definedone();

   end;

   set historytrans;

   do i = dim(t) to 1 by -1;

      if t <= table then do; m_table = i; leave; end;

   end;

   Lo_date = ifN(effectivedate > max_date, max_date, effectivedate);

   do i = dim(f) to 1 by -1 ;

      if f <= freq then do; m_freq = i; leave; end;

   end;

   do i = dim(a) to 1 by -1;

      if a <= amount then do; m_amount = i; leave; end;

   end;

   rc = 0;

   do m_table = m_table to 1 by -1 while(rc = 0);

      matrixtable = t[m_table];

      do Lo_date = Lo_date to 7305 by -1 while(rc = 0);

         do m_freq = m_freq to 1 by -1 while(rc = 0);

            Lo_freq = f[m_freq];

            do m_amount = m_amount to 1 by -1 while(rc = 0);

               Lo_amount = a[m_amount];

               rc = h.find();

               if rc = 0 then do; rc = 1; output; end;

            end;

         end;

      end;

   end;

keep matrixtable table lo_date effectivedate lo_freq freq lo_amount amount intrate;

run;

This may not take much time to experiment with the competing program

and decide. I will be too happy if OP benefits(in run-time).

jakarman
Barite | Level 11

Gergely I updated you idea.  Tabel lookups are an everlasting topic for coding issues.

a/ SQL (full join),

b/ Formats,

c/ point=,

d/ hash and of

e/ course arrays (retained)

All of them behave different. There are many trying to compare them when they seem to be equal. More interesting is when they are offering different

This is a nice case study.  To get it more theoretical leaving the example as an example 

- matrix fixed values ,  test=eq

- dates  interval        ,   test=le 

- freq    interval         ,   test=ge

- amount variable     ,   test=le

Yes I know Chris:

- The matrix is defined as some fixed values but not how many of them there are. IT is an equal condition.

- has defined those dates as being all, This is also the approach to solve it by avoiding it and go for eq condition.

- The freq interval can be variable not having the same number and the same values for all dates

The amount variable is one leading to a new values dependent on one fixed and two variable intervals.

ad a/ A SQL full join suffers from the Cartesian product

ad b/ Defining formats is possible with one variable interval (date) not with two (freq)

ad c/ Is workable needing a predefined order in the lookup dataset for manual search (no indexing possible in the combination)

         Ordering the base dataset also makes an algoritme possible that minimizes IO/Cpu. 

         The descending order can be used with sorting to optimize the positioning. 

ad d/ The hashing looks the same c/ but lacks either sorting order when working sequential or not having a find_ee.

         In this case we had luck that a descending order for all keys was a logical fit.
ad e/ Working with a arrays will need a full load of all lookups like the hash at the start or like the formats as namings.

         The variability can be problematic when not knowing the number of matrixtables or one of the other intervals as max sizings in advance

---->-- ja karman --<-----

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 2211 views
  • 6 likes
  • 6 in conversation