Garnett
Obsidian | Level 7

Greetings!

I have an n x p matrix of probabilities called Z.

I'd like to identify, within each row of Z, the first column where the element is < 0.1.

These column subscripts will be output to an n x 1 vector called R, which holds, for each row, the number of the first column of Z that meets the criterion.

If the criterion is not met within a row, then R should contain a 0 for that row.

Example:

Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5}

R = {3, 1, 0}

I can think of do-loop approaches, but Z is potentially very large (millions) and I would like a vectorized solution. The origin of this problem is Bayesian sample size analysis for sequential updating, as in http://www.fharrell.com/post/bayes-seq/

Thanks in advance!

Garnett

Rick_SAS
SAS Super FREQ

Interesting question. You can form the binary indicator matrix for the condition you want to detect, then use the index-of-maximum subscript reduction operator (<:>) to return the column of the first '1' in each row.

proc iml;
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5};
G = (Z < 0.1); /* binary indicator matrix */
R = G[ , <:>]; /* index of max value in each row */
print R;
quit;

This should easily handle millions of rows.

Garnett
Obsidian | Level 7

Thanks, Rick!

It looks like, when there are 'ties' across columns, the default behavior of <:> is to return the index of the first column in which the maximum value appears.

Is that correct? If so, it's just what I need.

Also, the approach you give returns a value of 1 for any row where none of the columns meets the criterion. I can work with this, but it will be difficult to distinguish rows where the first column truly is the first to meet the criterion from rows where no column meets it.

 

Thanks again!

Garnett
Obsidian | Level 7

Sorry about the emoji; I meant to write < : >

jnvickery
Obsidian | Level 7

There's probably a more efficient solution, but this resets the rows of R to 0 where no element is < 0.1.

 

proc iml;
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5,
     0.2 0.3 0.5};         /* added extra row with no elements < 0.1 */

G = (Z < 0.1);             /* binary indicator matrix */

zrows = loc(G[ , +] = 0);  /* vector of rows in G with no elements < 0.1 */

R = G[ , <:>];             /* index of max value in each row */

if ncol(zrows) > 0 then R[zrows,] = 0;  /* reset R to 0 where no elements < 0.1; skipped when LOC finds no such rows */
print R;
quit;
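
A possible variation, offered as a sketch rather than a benchmarked solution: multiplying the index-of-maximum result by the row maximum of G zeroes out the rows with no match, so no LOC call or subscript assignment is needed. The name anyHit is only an illustrative choice, and the example relies on <:> returning the first column when values are tied, as observed earlier in this thread.

```sas
proc iml;
Z = {0.2 0.3 0.05,
     0.01 0.01 0.01,
     0.2 0.3 0.5};
G = (Z < 0.1);            /* binary indicator matrix */
anyHit = G[ , <>];        /* row maximum: 1 if any element in the row is < 0.1, else 0 */
R = G[ , <:>] # anyHit;   /* first matching column, or 0 when the row has no match */
print R;                  /* for this Z the rows give 3, 1, 0 */
quit;
```

Because the elementwise product handles the no-match rows directly, this version also works when every row has a match.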
Garnett
Obsidian | Level 7

That's it! 

I'm more or less a novice at IML, and really need a better understanding of subscript reduction operators.

 

Thanks!
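
For anyone else getting comfortable with subscript reduction operators, here is a small sketch of the ones used in this thread, applied to the indicator matrix from the example. Putting the operator in the column position reduces across the columns (one value per row), and putting it in the row position reduces down the rows (one value per column). The variable names are illustrative only.

```sas
proc iml;
G = {0 0 1,
     1 1 1,
     0 0 0};
rowSum = G[ , +];      /* sum across columns, one value per row: 1, 3, 0 */
rowMax = G[ , <>];     /* maximum across columns, one value per row: 1, 1, 0 */
rowIdx = G[ , <:>];    /* column index of the maximum in each row */
colSum = G[+, ];       /* operator in the row position: column sums 1 1 2 */
print rowSum rowMax rowIdx, colSum;
quit;
```

Note that rowIdx is 1 for the all-zero third row, which is why the no-match rows need separate handling.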
