I am comparing arrays and need to take values from array A and find the index of the matching value or next lower value in array B. B is ordered if that helps the solution.
So using LOC would give the the index of the exact match, but if there isn't an exact match, I want the next lower value.
A = {10, 3, 19}
B= {1, 3, 7, 42}
Wanted Result = {3, 2, 3}
OK, then you want the largest index of B for which B <= A[i] for each element of A.
This might be empty, so you also need to handle that case:
proc iml;
A = {10, 3, 19, 0};
B= {1, 3, 7, 42}; /* assume B is sorted in increasing order */
Result = j(nrow(A), 1, .);
do i = 1 to nrow(A);
idx = loc(B <= A[i]);
if ncol(idx) > 0 then /* did we find any? */
Result[i] = idx[ ncol(idx) ]; /* keep the largest */
end;
print A Result;
OK, then you want the largest index of B for which B <= A[i] for each element of A.
This might be empty, so you also need to handle that case:
proc iml;
A = {10, 3, 19, 0};
B= {1, 3, 7, 42}; /* assume B is sorted in increasing order */
Result = j(nrow(A), 1, .);
do i = 1 to nrow(A);
idx = loc(B <= A[i]);
if ncol(idx) > 0 then /* did we find any? */
Result[i] = idx[ ncol(idx) ]; /* keep the largest */
end;
print A Result;
I think you can avoid the need to check for an empty result if you vectorize it as follows:
A = {10, 3, 19} ;
B= {1, 3, 7, 42};
result = (t(B) <= repeat(A,1,nrow(B)))[,+];
print result;
Yes and no. Since 0 is not a valid index, you still have to check if you intend to use these values to index into the B vector.
I thought about vectorization. It requires more memory instead of the loop, so it really depends on the size of these vectors. Here's the code that I wrote but did not post:
/* replace with loops with matrix computations (requires more memory */
MA = repeat(A`, nrow(B), 1); /* j_th column is A[j] */
MB = repeat(B, 1, nrow(A)); /* i_th row is B[i] */
L = (MB <= MA); /* 1 iff B[i] <= A[j] */
print L;
Result = L[+,]; /* sum the elements for which B[i] <= A[j] */
print Result;
Indeed size matters, so you need to be aware that nrow(A)*nrow(B) is not too large if vectorization is to be used. I compared
result = (t(B) <= repeat(A,1,nrow(B)))[,+];
to
result = (repeat(t(B),nrow(A),1) <= repeat(A,1,nrow(B)))[,+];
using FULLSTIMER and I am seeing that the latter uses about 50% more memory for a moderately large problem (10000 x 32), so there is some saving comparing vector to matrix, over matrix to matrix.
Thank you! Hours spent on this, but alot of learning...
@Rick_SAS so the original requirement shifted slightly, wondering if it can be a slight mod to this. I am taking these index values and using them to find the value in another vector at those indecies. Now, the requirement is if there isn't an exact match, use the higher and lower value in the other vector.
So for example, since 10 and 19 are not exact matches, I would average the 3rd and 4th values in the other vector. Can you please help with that?
I suspect you can solve the new problem if you give it some effort.
There are two "edge cases" that you need to worry about because there are no "adjacent values" to average:
In the program below, I've coded one way to handle these edge cases, but you might want to ask your client what they prefer.
proc iml;
A = {10, 3, 19, 0, 45};
B= {1, 3, 7, 42}; /* assume B is sorted in increasing order */
Other = {100, 200, 300, 400, 500}; /* assume Other is same size as B */
nB = nrow(B);
nA = nrow(A);
Result = j(nA, 1, .);
do i = 1 to nA;
idx = loc(B = A[i]);
if ncol(idx) > 0 then do; /* did we find an exact match? */
Result[i] = Other[idx]; /* if so, index into the Other vec */
end;
else do; /* not an exact match; average adjacent vals */
idx = loc(B <= A[i]);
/* 3 cases: EMPTY, the last index, or valid index that is not last */
if ncol(idx)=0 then /* A[i] < all elements of B */
Result[i] = .; /* or you could use Other[1] instead */
else do;
if all(idx < nB) then do; /* average the Other elements */
j = idx[ ncol(idx) ];
Result[i] = (Other[j] + Other[j+1]) / 2;
end;
else
Result[i] = Other[nB]; /* or missing? */
end;
end;
end;
print A Result;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.