/*
I have an array of stock price data in columns, 1 column per stock.
Some of the stocks have prices starting on different rows creating missing values for some of the beginning values of a column. Once pricing "starts", no subsequent prices are missing.
For example, in array b, below, data for column 2 starts in row 2 and for column 3 it starts in row 5.
I'd like to find the indices of the first non-missing element of each column.
In the simple code below, I do that looping one column at a time to get the answer {1 2 5}
Can it be done without looping over ncol?
*/
proc iml;
b = j(10,3,.);
b[ ,1] = colvec(1:10);
b[2:10,2] = colvec(2:10) + 0.1;
b[5:10,3] = colvec(5:10) + 0.5;
idx = j(1,3,.);
do i = 1 to ncol(b);
idx = (loc(b[,i]^=.))[1];
end;
print b idx;
quit;run;
endsas;
B IDX
1 . . 1 2 5
2 2.1 .
3 3.1 .
4 4.1 .
5 5.1 5.5
6 6.1 6.5
7 7.1 7.5
8 8.1 8.5
9 9.1 9.5
10 10.1 10.5
You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:
idx = 1 + missing(b)[+,];
If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:
proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b
idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;
You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:
idx = 1 + missing(b)[+,];
If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:
proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b
idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;
"idx = 1 + missing(b)[+,];" works great, thank you
I will have to be sure of my assumption of no missing after first price.
Rick ,
You can get it without looping.
proc iml; b = j(20, 5, 1); call ranuni(1234,b); b[loc(b < .5)]=.; dim=nrow(b)||ncol(b); c=ndx2sub(dim,loc(missing(b)=0)); create x from c[c={row col}]; append from c; summary class {col} var {row} stat {min} opt {noprint save}; first=t(row); print b , first; quit;
Xia Keshan
Message was edited by: xia keshan
Clever. I like it!
Building on this idea, you can sort the C array and use the UNIQUEBY function to achieve the same result without writing the data set:
call sort(c, {2 1});
u = uniqueby(c[,2], 1);
first = c[u,1];
print first;
The UNIQUEBY technique is explained in the article "An efficient alternative to the UNIQUE-LOC technique."
Rick,
But I like your original code better . It is very fast .
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.