Solved: Find 1st non-missing element of each column without looping

stevo642 · Posted 05-08-2015 12:21 PM

/*

I have an array of stock price data in columns, 1 column per stock.

Some of the stocks have prices starting on different rows creating missing values for some of the beginning values of a column. Once pricing "starts", no subsequent prices are missing.

For example, in array b, below, data for column 2 starts in row 2 and for column 3 it starts in row 5.

I'd like to find the indices of the first non-missing element of each column.

In the simple code below, I do that looping one column at a time to get the answer {1 2 5}

Can it be done without looping over ncol?

*/

proc iml;

b = j(10,3,.);

b[ ,1] = colvec(1:10);

b[2:10,2] = colvec(2:10) + 0.1;

b[5:10,3] = colvec(5:10) + 0.5;

idx = j(1,3,.);

do i = 1 to ncol(b);

idx = (loc(b[,i]^=.))[1];

end;

print b idx;

quit;run;

endsas;

B IDX

1 . . 1 2 5

2 2.1 .

3 3.1 .

4 4.1 .

5 5.1 5.5

6 6.1 6.5

7 7.1 7.5

8 8.1 8.5

9 9.1 9.5

10 10.1 10.5

Rick_SAS · Posted 05-08-2015 01:47 PM

You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:

idx = 1 + missing(b)[+,];

If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:

proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b = .;

idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;

View solution in original post

Rick_SAS · Posted 05-08-2015 01:47 PM

You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:

idx = 1 + missing(b)[+,];

If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:

proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b = .;

idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;

stevo642 · Posted 05-08-2015 02:01 PM

"idx = 1 + missing(b)[+,];" works great, thank you

I will have to be sure of my assumption of no missing after first price.

Ksharp · Posted 05-09-2015 04:24 AM

Rick ,

You can get it without looping.

proc iml;
b = j(20, 5, 1);
call ranuni(1234,b);
b[loc(b < .5)]=.;

dim=nrow(b)||ncol(b);
c=ndx2sub(dim,loc(missing(b)=0));
create x from c[c={row col}];
append from c;

summary class {col} var {row}  stat {min} opt {noprint save};
first=t(row);
print b , first;
quit;

Xia Keshan

Message was edited by: xia keshan

Rick_SAS · Posted 05-11-2015 08:53 AM

Clever. I like it!

Building on this idea, you can sort the C array and use the UNIQUEBY function to achieve the same result without writing the data set:

call sort(c, {2 1});

u = uniqueby(c[,2], 1);

first = c[u,1];

print first;

The UNIQUEBY technique is explained in the article "An efficient alternative to the UNIQUE-LOC technique."

Ksharp · Posted 05-11-2015 09:13 AM

Rick,

But I like your original code better . It is very fast .

Xia Keshan

Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Re: Find 1st non-missing element of each column without looping

Ready to join fellow brilliant minds for the SAS Hackathon?