turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- A quick method to determine if a value is between ...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 04:08 PM - edited 03-08-2017 04:10 PM

I'd like to find the location of a value with in a set of values in an array. I can do this with a do loop, something like the following whereTempVal is a random value.

/*Array values. The lowest value of 0 is omitted*/

array v{3} v1-v3 (10 20 30)

/*Set initial lower bound*/

L=0;

/*Array Location marker*/

Location=0;

do i=1 to 3;

/*create increment between upper and lower bound*/

increment=v(i)-L;

/*Set upper value*/

H=L+incement;

/*if Tempval is between upper and lower values note location in array*/

if L lt TempVal le H then Location=i;

/*if location has been specified, exit the loop*/

if Location gt 0 then leave;

/*Incement the lower value*/

L+increment;

end;

However, this method can me time consuming with a big array and millions of observations. I'm wondering if there is a function that could do the same thing without needing the loop.

Accepted Solutions

Solution

03-08-2017
07:05 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 05:42 PM

Question: is your array of values constant throughout all the observations, and only TEMPVAL changes? If so then hash is your huckleberry. Let's say dataset HAVE has variable tempval , and you have a constant list of values to put into array VALS:

```
data have;
input tempval @@;
datalines;
1 5 22 31 35 44 101 200 432
run;
data want;
array vals {5} _temporary_ (10, 31,42,101,680);
if _n_=1 then do;
length val i 8;
declare hash h(ordered:'a');
h.definekey('val');
h.definedata('val','I');
h.definedone();
declare hiter hi ('h');
/* add value of zero with index of zero to hash h*/
rc=h.add(key:0,data:0,data:0);
/* Add all the other values to H, indexing them as 1, 2, 3, ... */
do I=1 to n(of vals{*});
rc=h.add(key:vals{I},data:vals{I},data:i);
end;
end;
set have;
rc=h.find(key:tempval);
if rc^=0 then do;
rc=h.add(key:tempval,data:.,data:.); /* insert tempval into hash h */
rc=hi.setcur(key:tempval); /* point the iterator at the tempval item*/
rc=hi.next(); /* retrieve the next (pre-existing) item*/
rc=h.remove(key:tempval); /* remove TEMPVAL item */
end;
drop i rc;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 04:19 PM - edited 03-08-2017 04:23 PM

If you're dealing with a lot of array elements then the only way I can currently think of to speed things up is to code for a faster search algorithm.

You could for example test if a value is within the lower and upper bound of the array, then within the lower bound and the middle of the array.... and so on.

If the values for your array elements are constant within the whole data step and especially if there is some logic to the values (ie. 10, 20, 30, ...) then there could also be other ways to achieve what you want. Tell us a bit more what you have and what you want and ideally post a data step which creates representative sample data.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 04:33 PM

Unfortunately, they aren't constant.

Let's say you have array values 18.5, 37, 84. I want to the location of a set of values within the array, with 0 being the lower bound before 18.5. Let's say we are checing are as follows

Obs# TempVal

1 15

2 63

The Do Loop I posted earlier would locate observation 1 between 0 and 18.5 (Location=1) and observation 2 between 37 and 84 (Location=3).

Does that help?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 06:31 PM - edited 03-08-2017 06:43 PM

Like this?

A couple of iterations is enough to find the interval.

A small cost to create a few arrays when starting pays off in no time.

```
data VALUES;
do I=1 to 1000;
A=int(ranuni(0)*1000000); %* interval limits;
B=int(ranuni(0)*1000000); %* values to slot in;
output;
end;
run;
proc sql noprint;
select A into :VALUES separated by ' ' from VALUES order by A;
quit; %put &=values;
%macro floor(num,round); %* Floor() has just one parameter booo !;
round(&num,&round)-ifn(round(&num,&round)>&num, &round, 0)
%mend;
data WANT;
array V[1000] _temporary_ (&values.); %* array of values;
array V1_[1000] _temporary_ ; %* array of values rounded to 1e1;
array V2_[1000] _temporary_ ; %* array of values rounded to 1e2;
array V3_[1000] _temporary_ ; %* array of values rounded to 1e3;
array V4_[1000] _temporary_ ; %* array of values rounded to 1e4;
array V5_[1000] _temporary_ ; %* array of values rounded to 1e5;
if _N_=1 then do I=1 to 1000; %* populate rounded-value arrays ;
V1_[I] =%floor(V[I],1e1);
V2_[I] =%floor(V[I],1e2);
V3_[I] =%floor(V[I],1e3);
V4_[I] =%floor(V[I],1e4);
V5_[I] =%floor(V[I],1e5);
end;
set VALUES(keep=B); %* read data;
POS1e5 =whichn(%floor(B,1e5), of V5_[*]); %* find closest ;
if POS1e5 then POS1e4=whichn(%floor(B,1e4), of V4_[*]); %* but lower value;
if POS1e4 then POS1e3=whichn(%floor(B,1e3), of V3_[*]);
if POS1e3 then POS1e2=whichn(%floor(B,1e2), of V2_[*]);
if POS1e2 then POS1e1=whichn(%floor(B,1e1), of V1_[*]);
POS=max(1,of POS1e1-POS1e5);
%* Iterate array from POS (which() doesnot accept partial arrays booo !);
%* A couple of iterations at most is typically enough to find the interval;
do I=POS to 999;
if V[I] <= B <= V[I+1] then do;
ITER=I-POS;
output;
leave;
end;
end;
keep B I;
*keep POS1e: POS ITER V1-V3 ; %* debug;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 04:34 PM

I think you need to provide a bit more information about this. You show an example with 3 variables with uniform differences (10 20 30). Do your values of interest always have such a uniform difference? Are the equivalents of the V array element always sorted?

I would think something like this may be slightly faster.:

do i = 1 to (dim(v) -1);

if v[i] le temvar le v[i+1] then do:

location =i;

leave;

end;

end;

If the intervals are always unform but change from record to record something similar to this may point a way.

data example; array v {3} v1-v3 (10 20 30); input tempvar; location = floor(tempvar/(v[2]-v[1])) * (v[1] le tempvar le v[3]); datalines; 5 12 20 23 30 32 ; run;

If the intervals aren't the same from record to record or within any one array of values then there isn't going to much of anything to shorten a search unless your Tempvar is supposed to be one of the values in the array. If that is the case then you want the WHICHN function.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 04:51 PM

You could speed up the logic by checking for the upper bound only:

do i=1 to dim(v) until (tempval < v{i}); /* perhaps <= rather than < */

end;

Then add a couple of statements to compute LOCATION. Yes, there really are no statements inside the DO loop.

Just to confirm ... it sounds like V1, V2, and V3 can change from one observation to the next. If they were constant, other possibilities would come into play.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 04:52 PM

1- Make your array temporary to make it heaps faster

2-Are the array values evenly spaced?

Solution

03-08-2017
07:05 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-08-2017 05:42 PM

Question: is your array of values constant throughout all the observations, and only TEMPVAL changes? If so then hash is your huckleberry. Let's say dataset HAVE has variable tempval , and you have a constant list of values to put into array VALS:

```
data have;
input tempval @@;
datalines;
1 5 22 31 35 44 101 200 432
run;
data want;
array vals {5} _temporary_ (10, 31,42,101,680);
if _n_=1 then do;
length val i 8;
declare hash h(ordered:'a');
h.definekey('val');
h.definedata('val','I');
h.definedone();
declare hiter hi ('h');
/* add value of zero with index of zero to hash h*/
rc=h.add(key:0,data:0,data:0);
/* Add all the other values to H, indexing them as 1, 2, 3, ... */
do I=1 to n(of vals{*});
rc=h.add(key:vals{I},data:vals{I},data:i);
end;
end;
set have;
rc=h.find(key:tempval);
if rc^=0 then do;
rc=h.add(key:tempval,data:.,data:.); /* insert tempval into hash h */
rc=hi.setcur(key:tempval); /* point the iterator at the tempval item*/
rc=hi.next(); /* retrieve the next (pre-existing) item*/
rc=h.remove(key:tempval); /* remove TEMPVAL item */
end;
drop i rc;
run;
```