BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Batman
Quartz | Level 8

I'd like to find the location of a value with in a set of values in an array.   I can do this with a do loop, something like the following whereTempVal is a random value.

 

/*Array values.   The lowest value of 0 is omitted*/

array v{3}  v1-v3 (10 20 30)

/*Set initial lower bound*/

L=0;

/*Array Location marker*/

Location=0; 

do i=1 to 3;

  /*create increment between upper and lower bound*/

  increment=v(i)-L;

  /*Set upper value*/

  H=L+incement;

  /*if Tempval is between upper and lower values note location in array*/

  if L lt TempVal le H then Location=i;

  /*if location has been specified, exit the loop*/

  if Location gt 0 then leave;

  /*Incement the lower value*/

  L+increment;

end;

 

However, this method can me time consuming with a big array and millions of observations.   I'm wondering if there is a function that could do the same thing without needing the loop.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Question:  is your array of values constant throughout all the observations, and only TEMPVAL changes?  If so then hash is your huckleberry.  Let's say dataset HAVE has variable tempval ,  and you have a constant list of values to put into array VALS:

 

data have;
  input tempval @@;
datalines;
1 5 22 31 35 44 101 200 432
run;

data want;

  array vals {5} _temporary_ (10,  31,42,101,680);

  if _n_=1 then do;
     length val i 8;
     declare hash h(ordered:'a');
       h.definekey('val');
       h.definedata('val','I');
       h.definedone();
     declare hiter hi ('h');

     /* add value of zero with index of zero to hash h*/
     rc=h.add(key:0,data:0,data:0);  

     /* Add all the other values to H, indexing them as 1, 2, 3, ... */
     do I=1 to n(of vals{*});        
        rc=h.add(key:vals{I},data:vals{I},data:i);
     end;
   end;

  set have;
  rc=h.find(key:tempval);
  if rc^=0 then do;	   
    rc=h.add(key:tempval,data:.,data:.);  /* insert tempval into hash h */
    rc=hi.setcur(key:tempval);  /* point the iterator at the tempval item*/
    rc=hi.next();               /* retrieve the next (pre-existing) item*/
    rc=h.remove(key:tempval);   /* remove TEMPVAL item */
  end;
  drop i rc;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
Patrick
Opal | Level 21

If you're dealing with a lot of array elements then the only way I can currently think of to speed things up is to code for a faster search algorithm. 

 

You could for example test if a value is within the lower and upper bound of the array, then within the lower bound and the middle of the array.... and so on. 

 

If the values for your array elements are constant within the whole data step and especially if there is some logic to the values (ie. 10, 20, 30, ...) then there could also be other ways to achieve what you want. Tell us a bit more what you have and what you want and ideally post a data step which creates representative sample data.

Batman
Quartz | Level 8

Unfortunately, they aren't constant.  

 

Let's say you have array values 18.5, 37, 84.    I want to the location of a set of values within the array, with 0 being the lower bound before 18.5.    Let's say we are checing are as follows

 

Obs#    TempVal

1           15

2           63

 

The Do Loop I posted earlier would locate observation 1 between 0 and 18.5 (Location=1) and observation 2 between 37 and 84 (Location=3).

 

Does that help?

ChrisNZ
Tourmaline | Level 20

Like this?

A couple of iterations is enough to find the interval.

A small cost to create a few arrays when starting pays off in no time.

 


data VALUES;
  do I=1 to 1000;
    A=int(ranuni(0)*1000000); %* interval limits;
    B=int(ranuni(0)*1000000); %* values to slot in;
    output;                       
  end;
run;       

proc sql noprint;
  select A into :VALUES separated by ' ' from VALUES order by A;
quit;      %put &=values;

%macro floor(num,round);         %* Floor() has just one parameter booo !;  
  round(&num,&round)-ifn(round(&num,&round)>&num, &round, 0)
%mend;

data WANT;
  array V[1000] _temporary_ (&values.);  %* array of values;
  array V1_[1000] _temporary_ ;  %* array of values rounded to 1e1;
  array V2_[1000] _temporary_ ;  %* array of values rounded to 1e2;
  array V3_[1000] _temporary_ ;  %* array of values rounded to 1e3;
  array V4_[1000] _temporary_ ;  %* array of values rounded to 1e4;
  array V5_[1000] _temporary_ ;  %* array of values rounded to 1e5;
  if _N_=1 then do I=1 to 1000;  %* populate rounded-value arrays ;
     V1_[I]  =%floor(V[I],1e1);   
     V2_[I]  =%floor(V[I],1e2);
     V3_[I]  =%floor(V[I],1e3);
     V4_[I]  =%floor(V[I],1e4);
     V5_[I]  =%floor(V[I],1e5);
  end;
  set VALUES(keep=B);            %* read data;
  POS1e5               =whichn(%floor(B,1e5), of V5_[*]); %* find closest   ;
  if POS1e5 then POS1e4=whichn(%floor(B,1e4), of V4_[*]); %* but lower value;
  if POS1e4 then POS1e3=whichn(%floor(B,1e3), of V3_[*]);
  if POS1e3 then POS1e2=whichn(%floor(B,1e2), of V2_[*]);
  if POS1e2 then POS1e1=whichn(%floor(B,1e1), of V1_[*]);
  POS=max(1,of POS1e1-POS1e5);
  %* Iterate array from POS (which() doesnot accept partial arrays booo !);   
  %* A couple of iterations at most is typically enough to find the interval;
  do I=POS to 999;    
    if V[I] <= B <= V[I+1] then do;
      ITER=I-POS;
      output; 
      leave;
    end;
  end;
  keep B I;
  *keep POS1e: POS ITER V1-V3 ; %* debug;
 run;

 

ballardw
Super User

I think you need to provide a bit more information about this. You show an example with 3 variables with uniform differences (10 20 30).  Do your values of interest always have such a uniform difference? Are the equivalents of the V array element always sorted?

 

I would think something like this may be slightly faster.:

do i = 1 to (dim(v) -1);

    if v[i] le temvar le v[i+1] then do:

        location =i;

        leave;

    end;

end;

 

If the intervals are always unform but change from record to record something similar to this may point a way.

data example;
   array v {3} v1-v3 (10 20 30);
   input tempvar;
   location = floor(tempvar/(v[2]-v[1])) * (v[1] le tempvar le v[3]);
datalines;
5
12
20
23
30
32
;
run;

If the intervals aren't the same from record to record or within any one array of values then there isn't going to much of anything to shorten a search unless your Tempvar is supposed to be one of the values in the array. If that is the case then you want the WHICHN function.

 

Astounding
PROC Star

You could speed up the logic by checking for the upper bound only:

 

do i=1 to dim(v) until (tempval < v{i});   /* perhaps <= rather than <  */

end;

 

Then add a couple of statements to compute LOCATION.  Yes, there really are no statements inside the DO loop.

 

Just to confirm ... it sounds like V1, V2, and V3 can change from one observation to the next.  If they were constant, other possibilities would come into play.

ChrisNZ
Tourmaline | Level 20

1- Make your array temporary to make it heaps faster

2-Are the array values evenly spaced?

mkeintz
PROC Star

Question:  is your array of values constant throughout all the observations, and only TEMPVAL changes?  If so then hash is your huckleberry.  Let's say dataset HAVE has variable tempval ,  and you have a constant list of values to put into array VALS:

 

data have;
  input tempval @@;
datalines;
1 5 22 31 35 44 101 200 432
run;

data want;

  array vals {5} _temporary_ (10,  31,42,101,680);

  if _n_=1 then do;
     length val i 8;
     declare hash h(ordered:'a');
       h.definekey('val');
       h.definedata('val','I');
       h.definedone();
     declare hiter hi ('h');

     /* add value of zero with index of zero to hash h*/
     rc=h.add(key:0,data:0,data:0);  

     /* Add all the other values to H, indexing them as 1, 2, 3, ... */
     do I=1 to n(of vals{*});        
        rc=h.add(key:vals{I},data:vals{I},data:i);
     end;
   end;

  set have;
  rc=h.find(key:tempval);
  if rc^=0 then do;	   
    rc=h.add(key:tempval,data:.,data:.);  /* insert tempval into hash h */
    rc=hi.setcur(key:tempval);  /* point the iterator at the tempval item*/
    rc=hi.next();               /* retrieve the next (pre-existing) item*/
    rc=h.remove(key:tempval);   /* remove TEMPVAL item */
  end;
  drop i rc;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2997 views
  • 0 likes
  • 6 in conversation