nicholasbromley
Calcite | Level 5

I am comparing arrays and need to take values from array A and find the index of the matching value or next lower value in array B.  B is ordered if that helps the solution.  

 

So using LOC would give the index of an exact match, but if there isn't an exact match, I want the index of the next lower value.

 

A = {10, 3, 19} 

B= {1, 3, 7, 42}

 

Wanted Result = {3, 2, 3}

Accepted Solutions
Rick_SAS
SAS Super FREQ

OK, then you want the largest index of B for which B <= A[i] for each element of A.

The set of such indices might be empty (when A[i] is smaller than every element of B), so you also need to handle that case:

proc iml;
A = {10, 3, 19, 0};
B= {1, 3, 7, 42};  /* assume B is sorted in increasing order */

Result = j(nrow(A), 1, .);
do i = 1 to nrow(A);
   idx = loc(B <= A[i]);
   if ncol(idx) > 0 then            /* did we find any? */
      Result[i] = idx[ ncol(idx) ]; /* keep the largest */
end;

print A Result;
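
If the lookup is needed in more than one place, the loop above could be wrapped in a user-defined module. This is only a minimal sketch: the module name LowerIdx is a hypothetical choice, not a built-in SAS/IML function, and it assumes that A and B are column vectors and that B is sorted in increasing order.

```sas
proc iml;
/* LowerIdx is a hypothetical helper name, not a built-in IML function. */
/* Assumes A and B are column vectors and B is sorted in increasing order. */
start LowerIdx(A, B);
   Result = j(nrow(A), 1, .);          /* missing means "no element of B <= A[i]" */
   do i = 1 to nrow(A);
      idx = loc(B <= A[i]);            /* all indices where B <= A[i] */
      if ncol(idx) > 0 then            /* did we find any? */
         Result[i] = idx[ ncol(idx) ]; /* keep the largest */
   end;
   return( Result );
finish;

A = {10, 3, 19, 0};
B = {1, 3, 7, 42};
Result = LowerIdx(A, B);
print A Result;   /* indices 3, 2, 3, and a missing value for A=0 */
```

Returning a missing value rather than 0 for the "no match" case makes it harder to use the result as an index by accident.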

IanWakeling
Barite | Level 11

I think you can avoid the need to check for an empty result if you vectorize it as follows:

 

A = {10, 3, 19} ;
B= {1, 3, 7, 42};
result = (t(B) <= repeat(A,1,nrow(B)))[,+];
print result;
Rick_SAS
SAS Super FREQ

Yes and no. Since 0 is not a valid index, you still have to check if you intend to use these values to index into the B vector.

 

I thought about vectorization. It requires more memory than the loop, so it really depends on the size of these vectors. Here's the code that I wrote but did not post:

 

/* replace the loops with matrix computations (requires more memory) */
MA = repeat(A`, nrow(B), 1);  /* j_th column is A[j] */
MB = repeat(B, 1, nrow(A));   /* i_th row is B[i] */
L = (MB <= MA);               /* 1 iff B[i] <= A[j] */
print L;
Result = L[+,];               /* sum the elements for which B[i] <= A[j] */
print Result;
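
The point about 0 not being a valid index can be sketched as follows. This is one possible way to guard the lookup, not the only one; the variable names cnt, valid, and Value are chosen here for illustration, and A and B are assumed to be column vectors with B sorted in increasing order.

```sas
proc iml;
A = {10, 3, 19, 0};
B = {1, 3, 7, 42};   /* assume B is sorted in increasing order */

/* cnt[i] = number of elements of B that are <= A[i]; 0 means none */
cnt = (t(B) <= repeat(A, 1, nrow(B)))[,+];

Value = j(nrow(A), 1, .);         /* stays missing when there is no valid index */
valid = loc(cnt > 0);             /* rows of A that have at least one B <= A[i] */
if ncol(valid) > 0 then
   Value[valid] = B[ cnt[valid] ]; /* index into B only where cnt is a valid index */

print A cnt Value;   /* cnt = 3, 2, 3, 0; Value = 7, 3, 7, missing */
```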
IanWakeling
Barite | Level 11

Indeed size matters, so you need to make sure that nrow(A)*nrow(B) is not too large if vectorization is to be used.  I compared

result = (t(B) <= repeat(A,1,nrow(B)))[,+];

to

result = (repeat(t(B),nrow(A),1) <= repeat(A,1,nrow(B)))[,+];

using FULLSTIMER and I am seeing that the latter uses about 50% more memory for a moderately large problem (10000 x 32), so there is some saving in comparing a vector to a matrix rather than a matrix to a matrix.
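
A comparison along these lines could be set up as in the sketch below. The test data are made up for illustration (the actual vectors used above are not shown in the thread), and each variant runs in its own PROC IML step so that the FULLSTIMER statistics in the log can be read per step. The numbers reported will depend on the system and SAS release.

```sas
options fullstimer;   /* ask SAS to write memory and timing details to the log */

/* Step 1: row vector compared with a matrix */
proc iml;
A = t(1:10000);              /* made-up data: 10000 x 1 column vector */
B = t(do(1, 9301, 300));     /* made-up data: 32 increasing cutpoints */
result = (t(B) <= repeat(A, 1, nrow(B)))[,+];
quit;

/* Step 2: matrix compared with a matrix */
proc iml;
A = t(1:10000);
B = t(do(1, 9301, 300));
result = (repeat(t(B), nrow(A), 1) <= repeat(A, 1, nrow(B)))[,+];
quit;
```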

 

 

nicholasbromley
Calcite | Level 5

Thank you! Hours spent on this, but a lot of learning...

nicholasbromley
Calcite | Level 5

@Rick_SAS The original requirement has shifted slightly, and I'm wondering whether a small modification to this code can handle it.  I am taking these index values and using them to look up the values at those indices in another vector.  Now the requirement is that, if there isn't an exact match, I should use both the next higher and the next lower values in the other vector.

 

So for example, since 10 and 19 are not exact matches, I would average the 3rd and 4th values in the other vector.   Can you please help with that? 

Rick_SAS
SAS Super FREQ

I suspect you can solve the new problem if you give it some effort. 

There are two "edge cases" that you need to worry about because there are no "adjacent values" to average:

  1. What happens if an element of A is less than all elements of B? Is the result missing? Or maybe the 1st element of the Other vector?
  2. What happens if an element of A is greater than all elements of B? Is the result missing? Or maybe the last element of the Other vector?

In the program below, I've coded one way to handle these edge cases, but you might want to ask your client what they prefer.

 

proc iml;
A = {10, 3, 19, 0, 45};
B= {1, 3, 7, 42};  /* assume B is sorted in increasing order */
Other = {100, 200, 300, 400};  /* assume Other is same size as B */
nB = nrow(B);
nA = nrow(A);

Result = j(nA, 1, .);
do i = 1 to nA;
   idx = loc(B = A[i]);
   if ncol(idx) > 0 then do;     /* did we find an exact match? */
      Result[i] = Other[idx];    /* if so, index into the Other vec */
   end;
   else do;                      /* not an exact match; average adjacent vals */
      idx = loc(B <= A[i]);
      /* 3 cases: EMPTY, the last index, or valid index that is not last */
      if ncol(idx)=0 then  /* A[i] < all elements of B */
         Result[i] = .;    /* or you could use Other[1] instead */
      else do;
         if all(idx < nB) then do; /* average the Other elements */
            j = idx[ ncol(idx) ];
            Result[i] = (Other[j] + Other[j+1]) / 2;
         end;
         else 
            Result[i] = Other[nB]; /* or missing? */
      end;
   end;
end;

print A Result;
