Solved: Re: Similar to LOC, but want index of nearest lesser value

nicholasbromley · Posted 03-29-2022 11:23 AM

I am comparing arrays and need to take values from array A and find the index of the matching value or next lower value in array B. B is ordered if that helps the solution.

So using LOC would give the the index of the exact match, but if there isn't an exact match, I want the next lower value.

A = {10, 3, 19}

B= {1, 3, 7, 42}

Wanted Result = {3, 2, 3}

Rick_SAS · Posted 03-29-2022 11:39 AM

OK, then you want the largest index of B for which B <= A[i] for each element of A.

This might be empty, so you also need to handle that case:

proc iml;
A = {10, 3, 19, 0};
B= {1, 3, 7, 42};  /* assume B is sorted in increasing order */

Result = j(nrow(A), 1, .);
do i = 1 to nrow(A);
   idx = loc(B <= A[i]);
   if ncol(idx) > 0 then            /* did we find any? */
      Result[i] = idx[ ncol(idx) ]; /* keep the largest */
end;

print A Result;

View solution in original post

Rick_SAS · Posted 03-29-2022 11:39 AM

OK, then you want the largest index of B for which B <= A[i] for each element of A.

This might be empty, so you also need to handle that case:

proc iml;
A = {10, 3, 19, 0};
B= {1, 3, 7, 42};  /* assume B is sorted in increasing order */

Result = j(nrow(A), 1, .);
do i = 1 to nrow(A);
   idx = loc(B <= A[i]);
   if ncol(idx) > 0 then            /* did we find any? */
      Result[i] = idx[ ncol(idx) ]; /* keep the largest */
end;

print A Result;

IanWakeling · Posted 03-29-2022 12:26 PM

I think you can avoid the need to check for an empty result if you vectorize it as follows:

A = {10, 3, 19} ;
B= {1, 3, 7, 42};
result = (t(B) <= repeat(A,1,nrow(B)))[,+];
print result;

Rick_SAS · Posted 03-29-2022 12:40 PM

Yes and no. Since 0 is not a valid index, you still have to check if you intend to use these values to index into the B vector.

I thought about vectorization. It requires more memory instead of the loop, so it really depends on the size of these vectors. Here's the code that I wrote but did not post:

/* replace with loops with matrix computations (requires more memory */
MA = repeat(A`, nrow(B), 1);  /* j_th column is A[j] */
MB = repeat(B, 1, nrow(A));   /* i_th row is B[i] */
L = (MB <= MA);               /* 1 iff B[i] <= A[j] */
print L;
Result = L[+,];               /* sum the elements for which B[i] <= A[j] */
print Result;

IanWakeling · Posted 03-30-2022 04:06 AM

Indeed size matters, so you need to be aware that nrow(A)*nrow(B) is not too large if vectorization is to be used. I compared

result = (t(B) <= repeat(A,1,nrow(B)))[,+];

to

result = (repeat(t(B),nrow(A),1) <= repeat(A,1,nrow(B)))[,+];

using FULLSTIMER and I am seeing that the latter uses about 50% more memory for a moderately large problem (10000 x 32), so there is some saving comparing vector to matrix, over matrix to matrix.

nicholasbromley · Posted 03-29-2022 03:53 PM

Thank you! Hours spent on this, but alot of learning...

nicholasbromley · Posted 04-12-2022 02:50 PM

@Rick_SAS so the original requirement shifted slightly, wondering if it can be a slight mod to this. I am taking these index values and using them to find the value in another vector at those indecies. Now, the requirement is if there isn't an exact match, use the higher and lower value in the other vector.

So for example, since 10 and 19 are not exact matches, I would average the 3rd and 4th values in the other vector. Can you please help with that?

Rick_SAS · Posted 04-12-2022 03:26 PM

I suspect you can solve the new problem if you give it some effort.

There are two "edge cases" that you need to worry about because there are no "adjacent values" to average:

What happens if an element of A is less than all elements of B? Is the result missing? Or maybe the 1st element of the Other vector?
What happens if an element of A is greater all elements of B? Is the result missing? Or maybe the last element of the Other vector?

In the program below, I've coded one way to handle these edge cases, but you might want to ask your client what they prefer.

proc iml;
A = {10, 3, 19, 0, 45};
B= {1, 3, 7, 42};  /* assume B is sorted in increasing order */
Other = {100, 200, 300, 400, 500};  /* assume Other is same size as B */
nB = nrow(B);
nA = nrow(A);

Result = j(nA, 1, .);
do i = 1 to nA;
   idx = loc(B = A[i]);
   if ncol(idx) > 0 then do;     /* did we find an exact match? */
      Result[i] = Other[idx];    /* if so, index into the Other vec */
   end;
   else do;                      /* not an exact match; average adjacent vals */
      idx = loc(B <= A[i]);
      /* 3 cases: EMPTY, the last index, or valid index that is not last */
      if ncol(idx)=0 then  /* A[i] < all elements of B */
         Result[i] = .;    /* or you could use Other[1] instead */
      else do;
         if all(idx < nB) then do; /* average the Other elements */
            j = idx[ ncol(idx) ];
            Result[i] = (Other[j] + Other[j+1]) / 2;
         end;
         else 
            Result[i] = Other[nB]; /* or missing? */
      end;
   end;
end;

print A Result;

Registration is open