Statistical programming, matrix languages, and more

Find 1st non-missing element of each column without looping

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Find 1st non-missing element of each column without looping

/*

I have an array of stock price data in columns, 1 column per stock.

Some of the stocks have prices starting on different rows creating missing values for some of the beginning values of a column. Once pricing "starts", no subsequent prices are missing.

For example, in array b, below, data for column 2 starts in row 2 and for column 3 it starts in row 5.

I'd like to find the indices of the first non-missing element of each column.

In the simple code below, I do that looping one column at a time to get the answer {1 2 5}

Can it be done without looping over ncol?

*/

proc iml;

b = j(10,3,.);

b[    ,1] = colvec(1:10);

b[2:10,2] = colvec(2:10) + 0.1;

b[5:10,3] = colvec(5:10) + 0.5;

idx = j(1,3,.);

do i = 1 to ncol(b);

  idx = (loc(b[,i]^=.))[1];

end;

print b idx;

quit;run;

endsas;

        B                           IDX

        1         .         .         1         2         5

        2       2.1         .                             

        3       3.1         .                             

        4       4.1         .                             

        5       5.1       5.5                             

        6       6.1       6.5                             

        7       7.1       7.5                             

        8       8.1       8.5                             

        9       9.1       9.5                             

       10      10.1      10.5                             

Attachment

Accepted Solutions
Solution
‎05-08-2015 01:47 PM
SAS Super FREQ
Posts: 3,234

Re: Find 1st non-missing element of each column without looping

You say that "Once pricing 'starts', no subsequent prices are missing."  Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:

idx = 1 + missing(b)[+,];

If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible.  Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:

proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b = .;

idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
  k = (loc(b[,i]^=.));
  if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;

View solution in original post


All Replies
Solution
‎05-08-2015 01:47 PM
SAS Super FREQ
Posts: 3,234

Re: Find 1st non-missing element of each column without looping

You say that "Once pricing 'starts', no subsequent prices are missing."  Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:

idx = 1 + missing(b)[+,];

If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible.  Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:

proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b = .;

idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
  k = (loc(b[,i]^=.));
  if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;

Occasional Contributor
Posts: 14

Re: Find 1st non-missing element of each column without looping

"idx = 1 + missing(b)[+,];"  works great, thank you

I will have to be sure of my assumption of no missing after first price.

Grand Advisor
Posts: 9,335

Re: Find 1st non-missing element of each column without looping

Rick ,

You can get it without looping.

proc iml;
b = j(20, 5, 1);
call ranuni(1234,b);
b[loc(b < .5)]=.;

dim=nrow(b)||ncol(b);
c=ndx2sub(dim,loc(missing(b)=0));
create x from c[c={row col}];
append from c;

summary class {col} var {row}  stat {min} opt {noprint save};
first=t(row);
print b , first;
quit;

Xia Keshan

Message was edited by: xia keshan

SAS Super FREQ
Posts: 3,234

Re: Find 1st non-missing element of each column without looping

Clever. I like it!

Building on this idea, you can sort the C array and use the UNIQUEBY function to achieve the same result without writing the data set:

call sort(c, {2 1});

u = uniqueby(c[,2], 1);

first = c[u,1];

print first;

The UNIQUEBY technique is explained in  the article "An efficient alternative to the UNIQUE-LOC technique."

Grand Advisor
Posts: 9,335

Re: Find 1st non-missing element of each column without looping

Rick,

But I like your original code better . It is very fast .

Xia Keshan

Post a Question
Discussion Stats
  • 5 replies
  • 446 views
  • 3 likes
  • 3 in conversation