/*
I have an array of stock price data in columns, 1 column per stock.
Some of the stocks have prices starting on different rows creating missing values for some of the beginning values of a column. Once pricing "starts", no subsequent prices are missing.
For example, in array b, below, data for column 2 starts in row 2 and for column 3 it starts in row 5.
I'd like to find the indices of the first non-missing element of each column.
In the simple code below, I do that looping one column at a time to get the answer {1 2 5}
Can it be done without looping over ncol?
*/
proc iml;
b = j(10,3,.);
b[ ,1] = colvec(1:10);
b[2:10,2] = colvec(2:10) + 0.1;
b[5:10,3] = colvec(5:10) + 0.5;
idx = j(1,3,.);
do i = 1 to ncol(b);
idx = (loc(b[,i]^=.))[1];
end;
print b idx;
quit;run;
endsas;
B IDX
1 . . 1 2 5
2 2.1 .
3 3.1 .
4 4.1 .
5 5.1 5.5
6 6.1 6.5
7 7.1 7.5
8 8.1 8.5
9 9.1 9.5
10 10.1 10.5
You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:
idx = 1 + missing(b)[+,];
If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:
proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b
idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;
You say that "Once pricing 'starts', no subsequent prices are missing." Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:
idx = 1 + missing(b)[+,];
If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible. Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:
proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b
idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
k = (loc(b[,i]^=.));
if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;
"idx = 1 + missing(b)[+,];" works great, thank you
I will have to be sure of my assumption of no missing after first price.
Rick ,
You can get it without looping.
proc iml; b = j(20, 5, 1); call ranuni(1234,b); b[loc(b < .5)]=.; dim=nrow(b)||ncol(b); c=ndx2sub(dim,loc(missing(b)=0)); create x from c[c={row col}]; append from c; summary class {col} var {row} stat {min} opt {noprint save}; first=t(row); print b , first; quit;
Xia Keshan
Message was edited by: xia keshan
Clever. I like it!
Building on this idea, you can sort the C array and use the UNIQUEBY function to achieve the same result without writing the data set:
call sort(c, {2 1});
u = uniqueby(c[,2], 1);
first = c[u,1];
print first;
The UNIQUEBY technique is explained in the article "An efficient alternative to the UNIQUE-LOC technique."
Rick,
But I like your original code better . It is very fast .
Xia Keshan
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.