/*
I have an array of stock price data in columns, 1 column per stock.
Some of the stocks have prices starting on different rows creating missing values for some of the beginning values of a column. Once pricing "starts", no subsequent prices are missing.
For example, in array b, below, data for column 2 starts in row 2 and for column 3 it starts in row 5.
I'd like to find the indices of the first non-missing element of each column.
In the simple code below, I do that looping one column at a time to get the answer {1 2 5}
Can it be done without looping over ncol?
*/
proc iml;
b = j(10,3,.);
b[ ,1] = colvec(1:10);
b[2:10,2] = colvec(2:10) + 0.1;
b[5:10,3] = colvec(5:10) + 0.5;
idx = j(1,3,.);
do i = 1 to ncol(b);
idx = (loc(b[,i]^=.))[1];
end;
print b idx;
quit;run;
endsas;
B IDX
1 . . 1 2 5
2 2.1 .
3 3.1 .
4 4.1 .
5 5.1 5.5
6 6.1 6.5
7 7.1 7.5
8 8.1 8.5
9 9.1 9.5
10 10.1 10.5
You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:
idx = 1 + missing(b)[+,];
If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:
proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b
idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;
You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:
idx = 1 + missing(b)[+,];
If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:
proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b
idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;
"idx = 1 + missing(b)[+,];" works great, thank you
I will have to be sure of my assumption of no missing after first price.
Rick ,
You can get it without looping.
proc iml; b = j(20, 5, 1); call ranuni(1234,b); b[loc(b < .5)]=.; dim=nrow(b)||ncol(b); c=ndx2sub(dim,loc(missing(b)=0)); create x from c[c={row col}]; append from c; summary class {col} var {row} stat {min} opt {noprint save}; first=t(row); print b , first; quit;
Xia Keshan
Message was edited by: xia keshan
Clever. I like it!
Building on this idea, you can sort the C array and use the UNIQUEBY function to achieve the same result without writing the data set:
call sort(c, {2 1});
u = uniqueby(c[,2], 1);
first = c[u,1];
print first;
The UNIQUEBY technique is explained in the article "An efficient alternative to the UNIQUE-LOC technique."
Rick,
But I like your original code better . It is very fast .
Xia Keshan
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.