DATA Step, Macro, Functions and more

A quick method to determine if a value is between two other values

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

A quick method to determine if a value is between two other values

[ Edited ]

I'd like to find the location of a value with in a set of values in an array.   I can do this with a do loop, something like the following whereTempVal is a random value.

 

/*Array values.   The lowest value of 0 is omitted*/

array v{3}  v1-v3 (10 20 30)

/*Set initial lower bound*/

L=0;

/*Array Location marker*/

Location=0; 

do i=1 to 3;

  /*create increment between upper and lower bound*/

  increment=v(i)-L;

  /*Set upper value*/

  H=L+incement;

  /*if Tempval is between upper and lower values note location in array*/

  if L lt TempVal le H then Location=i;

  /*if location has been specified, exit the loop*/

  if Location gt 0 then leave;

  /*Incement the lower value*/

  L+increment;

end;

 

However, this method can me time consuming with a big array and millions of observations.   I'm wondering if there is a function that could do the same thing without needing the loop.

 


Accepted Solutions
Solution
‎03-08-2017 07:05 PM
Trusted Advisor
Posts: 1,018

Re: A quick method to determine if a value is between two other values

Question:  is your array of values constant throughout all the observations, and only TEMPVAL changes?  If so then hash is your huckleberry.  Let's say dataset HAVE has variable tempval ,  and you have a constant list of values to put into array VALS:

 

data have;
  input tempval @@;
datalines;
1 5 22 31 35 44 101 200 432
run;

data want;

  array vals {5} _temporary_ (10,  31,42,101,680);

  if _n_=1 then do;
     length val i 8;
     declare hash h(ordered:'a');
       h.definekey('val');
       h.definedata('val','I');
       h.definedone();
     declare hiter hi ('h');

     /* add value of zero with index of zero to hash h*/
     rc=h.add(key:0,data:0,data:0);  

     /* Add all the other values to H, indexing them as 1, 2, 3, ... */
     do I=1 to n(of vals{*});        
        rc=h.add(key:vals{I},data:vals{I},data:i);
     end;
   end;

  set have;
  rc=h.find(key:tempval);
  if rc^=0 then do;	   
    rc=h.add(key:tempval,data:.,data:.);  /* insert tempval into hash h */
    rc=hi.setcur(key:tempval);  /* point the iterator at the tempval item*/
    rc=hi.next();               /* retrieve the next (pre-existing) item*/
    rc=h.remove(key:tempval);   /* remove TEMPVAL item */
  end;
  drop i rc;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 4,173

Re: A quick method to determine if a value is between two other values

[ Edited ]

If you're dealing with a lot of array elements then the only way I can currently think of to speed things up is to code for a faster search algorithm. 

 

You could for example test if a value is within the lower and upper bound of the array, then within the lower bound and the middle of the array.... and so on. 

 

If the values for your array elements are constant within the whole data step and especially if there is some logic to the values (ie. 10, 20, 30, ...) then there could also be other ways to achieve what you want. Tell us a bit more what you have and what you want and ideally post a data step which creates representative sample data.

Occasional Contributor
Posts: 9

Re: A quick method to determine if a value is between two other values

Unfortunately, they aren't constant.  

 

Let's say you have array values 18.5, 37, 84.    I want to the location of a set of values within the array, with 0 being the lower bound before 18.5.    Let's say we are checing are as follows

 

Obs#    TempVal

1           15

2           63

 

The Do Loop I posted earlier would locate observation 1 between 0 and 18.5 (Location=1) and observation 2 between 37 and 84 (Location=3).

 

Does that help?

PROC Star
Posts: 1,759

Re: A quick method to determine if a value is between two other values

[ Edited ]

Like this?

A couple of iterations is enough to find the interval.

A small cost to create a few arrays when starting pays off in no time.

 


data VALUES;
  do I=1 to 1000;
    A=int(ranuni(0)*1000000); %* interval limits;
    B=int(ranuni(0)*1000000); %* values to slot in;
    output;                       
  end;
run;       

proc sql noprint;
  select A into :VALUES separated by ' ' from VALUES order by A;
quit;      %put &=values;

%macro floor(num,round);         %* Floor() has just one parameter booo !;  
  round(&num,&round)-ifn(round(&num,&round)>&num, &round, 0)
%mend;

data WANT;
  array V[1000] _temporary_ (&values.);  %* array of values;
  array V1_[1000] _temporary_ ;  %* array of values rounded to 1e1;
  array V2_[1000] _temporary_ ;  %* array of values rounded to 1e2;
  array V3_[1000] _temporary_ ;  %* array of values rounded to 1e3;
  array V4_[1000] _temporary_ ;  %* array of values rounded to 1e4;
  array V5_[1000] _temporary_ ;  %* array of values rounded to 1e5;
  if _N_=1 then do I=1 to 1000;  %* populate rounded-value arrays ;
     V1_[I]  =%floor(V[I],1e1);   
     V2_[I]  =%floor(V[I],1e2);
     V3_[I]  =%floor(V[I],1e3);
     V4_[I]  =%floor(V[I],1e4);
     V5_[I]  =%floor(V[I],1e5);
  end;
  set VALUES(keep=B);            %* read data;
  POS1e5               =whichn(%floor(B,1e5), of V5_[*]); %* find closest   ;
  if POS1e5 then POS1e4=whichn(%floor(B,1e4), of V4_[*]); %* but lower value;
  if POS1e4 then POS1e3=whichn(%floor(B,1e3), of V3_[*]);
  if POS1e3 then POS1e2=whichn(%floor(B,1e2), of V2_[*]);
  if POS1e2 then POS1e1=whichn(%floor(B,1e1), of V1_[*]);
  POS=max(1,of POS1e1-POS1e5);
  %* Iterate array from POS (which() doesnot accept partial arrays booo !);   
  %* A couple of iterations at most is typically enough to find the interval;
  do I=POS to 999;    
    if V[I] <= B <= V[I+1] then do;
      ITER=I-POS;
      output; 
      leave;
    end;
  end;
  keep B I;
  *keep POS1e: POS ITER V1-V3 ; %* debug;
 run;

 

Super User
Posts: 11,343

Re: A quick method to determine if a value is between two other values

I think you need to provide a bit more information about this. You show an example with 3 variables with uniform differences (10 20 30).  Do your values of interest always have such a uniform difference? Are the equivalents of the V array element always sorted?

 

I would think something like this may be slightly faster.:

do i = 1 to (dim(v) -1);

    if v[i] le temvar le v[i+1] then do:

        location =i;

        leave;

    end;

end;

 

If the intervals are always unform but change from record to record something similar to this may point a way.

data example;
   array v {3} v1-v3 (10 20 30);
   input tempvar;
   location = floor(tempvar/(v[2]-v[1])) * (v[1] le tempvar le v[3]);
datalines;
5
12
20
23
30
32
;
run;

If the intervals aren't the same from record to record or within any one array of values then there isn't going to much of anything to shorten a search unless your Tempvar is supposed to be one of the values in the array. If that is the case then you want the WHICHN function.

 

Super User
Posts: 5,497

Re: A quick method to determine if a value is between two other values

You could speed up the logic by checking for the upper bound only:

 

do i=1 to dim(v) until (tempval < v{i});   /* perhaps <= rather than <  */

end;

 

Then add a couple of statements to compute LOCATION.  Yes, there really are no statements inside the DO loop.

 

Just to confirm ... it sounds like V1, V2, and V3 can change from one observation to the next.  If they were constant, other possibilities would come into play.

PROC Star
Posts: 1,759

Re: A quick method to determine if a value is between two other values

1- Make your array temporary to make it heaps faster

2-Are the array values evenly spaced?

Solution
‎03-08-2017 07:05 PM
Trusted Advisor
Posts: 1,018

Re: A quick method to determine if a value is between two other values

Question:  is your array of values constant throughout all the observations, and only TEMPVAL changes?  If so then hash is your huckleberry.  Let's say dataset HAVE has variable tempval ,  and you have a constant list of values to put into array VALS:

 

data have;
  input tempval @@;
datalines;
1 5 22 31 35 44 101 200 432
run;

data want;

  array vals {5} _temporary_ (10,  31,42,101,680);

  if _n_=1 then do;
     length val i 8;
     declare hash h(ordered:'a');
       h.definekey('val');
       h.definedata('val','I');
       h.definedone();
     declare hiter hi ('h');

     /* add value of zero with index of zero to hash h*/
     rc=h.add(key:0,data:0,data:0);  

     /* Add all the other values to H, indexing them as 1, 2, 3, ... */
     do I=1 to n(of vals{*});        
        rc=h.add(key:vals{I},data:vals{I},data:i);
     end;
   end;

  set have;
  rc=h.find(key:tempval);
  if rc^=0 then do;	   
    rc=h.add(key:tempval,data:.,data:.);  /* insert tempval into hash h */
    rc=hi.setcur(key:tempval);  /* point the iterator at the tempval item*/
    rc=hi.next();               /* retrieve the next (pre-existing) item*/
    rc=h.remove(key:tempval);   /* remove TEMPVAL item */
  end;
  drop i rc;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 391 views
  • 0 likes
  • 6 in conversation