Greetings!
I have an n x p matrix of probabilities called Z.
For each row of Z, I'd like to identify the first column where the element is < 0.1.
These column numbers should be written to an n x 1 vector called R, with one entry per row of Z.
If no element in a row meets the criterion, R should contain 0 for that row.
Example:
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5}
R = {3, 1, 0}
I can think of DO-loop approaches, but Z is potentially very large (millions of rows), so I would like a vectorized solution. The problem comes from Bayesian sample size analysis for sequential updating, as in http://www.fharrell.com/post/bayes-seq/
Thanks in advance!
Garnett
Interesting question. You can form the binary indicator matrix for the condition you want to detect, then use the "index of maximum" subscript reduction operator (<:>) to return the column number of the first 1 in each row.
proc iml;
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5};
G = (Z < 0.1);   /* binary indicator matrix: 1 where Z < 0.1 */
R = G[ , <:>];   /* column index of the (first) max value in each row */
print R;
quit;
This should easily handle millions of rows.
Thanks, Rick!
It looks like the default behavior of <:> when there are ties across columns is to return the index of the first column in which the maximum appears.
Is that correct? If so, it's just what I need.
Also, the approach you give returns 1 for a row where none of the columns meet the criterion. I can work with this, but it will be difficult to distinguish rows where the first column truly is the first to meet the criterion from rows where no column meets it.
Thanks again!
Sorry about the emoji, I meant to write < : >
There's probably a more efficient solution, but this resets R to 0 for the rows where no element is < 0.1.
proc iml;
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5,
     0.2 0.3 0.5};            /* added extra row with no elements < 0.1 */
G = (Z < 0.1);                /* binary indicator matrix */
zrows = loc(G[ , +] = 0);     /* rows of G with no elements < 0.1 */
R = G[ , <:>];                /* column index of the first max value in each row */
/* LOC returns an empty matrix if every row has a hit, so guard the assignment */
if ncol(zrows) > 0 then R[zrows, ] = 0;   /* reset R to 0 where no elements < 0.1 */
print R;
quit;
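For what it's worth, the same reset can probably be done without LOC by multiplying the index vector by a 0/1 "any hit" flag. This is only a sketch on the three-row example from the original question; the name anyHit is made up for illustration, and # is the elementwise multiplication operator.

```sas
proc iml;
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5};
G = (Z < 0.1);            /* binary indicator matrix */
anyHit = (G[ , +] > 0);   /* hypothetical name: 1 if the row has any element < 0.1, else 0 */
R = G[ , <:>] # anyHit;   /* elementwise product sets rows with no hit to 0 */
print R;
quit;
```

Because there is no index vector, no special handling is needed when every row (or no row) meets the criterion.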
That's it!
I'm more or less a novice at IML, and really need a better understanding of subscript reduction operators.
Thanks!
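For anyone else getting started with subscript reduction operators, here is a small sketch of the most common ones. The matrices x and t are made up for illustration, and the values in the comments were worked out by hand rather than copied from a log. The last line illustrates the tie behavior discussed above: <:> returning the first column that holds the maximum, which is what the solution in this thread relies on.

```sas
proc iml;
x = {1 2 3,
     4 5 6};           /* made-up example matrix */
rowSum  = x[ , +];     /* sum across the columns of each row: {6, 15} */
colSum  = x[+ , ];     /* sum down the rows of each column: {5 7 9} */
rowMax  = x[ , <>];    /* maximum of each row: {3, 6} */
rowMin  = x[ , ><];    /* minimum of each row: {1, 4} */
idxMax  = x[ , <:>];   /* column index of each row maximum: {3, 3} */
idxMin  = x[ , >:<];   /* column index of each row minimum: {1, 1} */
rowMean = x[ , :];     /* mean of each row: {2, 5} */
print rowSum rowMax rowMin idxMax idxMin rowMean, colSum;

t = {0 1 1};           /* made-up row with a tie for the maximum */
firstMax = t[ , <:>];  /* index of the first column holding the max, here 2 */
print firstMax;
quit;
```

Putting the operator in the column position (after the comma) reduces across columns and gives one value per row; putting it in the row position reduces down rows and gives one value per column.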