BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
stevo642
Obsidian | Level 7

/*

I have an array of stock price data in columns, 1 column per stock.

Some of the stocks have prices starting on different rows creating missing values for some of the beginning values of a column. Once pricing "starts", no subsequent prices are missing.

For example, in array b, below, data for column 2 starts in row 2 and for column 3 it starts in row 5.

I'd like to find the indices of the first non-missing element of each column.

In the simple code below, I do that looping one column at a time to get the answer {1 2 5}

Can it be done without looping over ncol?

*/

proc iml;

b = j(10,3,.);

b[    ,1] = colvec(1:10);

b[2:10,2] = colvec(2:10) + 0.1;

b[5:10,3] = colvec(5:10) + 0.5;

idx = j(1,3,.);

do i = 1 to ncol(b);

  idx = (loc(b[,i]^=.))[1];

end;

print b idx;

quit;run;

endsas;

        B                           IDX

        1         .         .         1         2         5

        2       2.1         .                             

        3       3.1         .                             

        4       4.1         .                             

        5       5.1       5.5                             

        6       6.1       6.5                             

        7       7.1       7.5                             

        8       8.1       8.5                             

        9       9.1       9.5                             

       10      10.1      10.5                             

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

You say that "Once pricing 'starts', no subsequent prices are missing."  Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:

idx = 1 + missing(b)[+,];

If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible.  Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:

proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b = .;

idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
  k = (loc(b[,i]^=.));
  if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;

View solution in original post

5 REPLIES 5
Rick_SAS
SAS Super FREQ

You say that "Once pricing 'starts', no subsequent prices are missing."  Assuming that this is true, you can call the MISSING function to convert your matrix to a 0/1 matrix and then just count the number of 1s in each column:

idx = 1 + missing(b)[+,];

If your assumption is not correct, then this method fails and I can't think of a way to avoid the loop. However, loops in SAS/IML tend to be much faster than in other vector languages, so the performance of the loop is not terrible.  Your looping strategy requires only a few seconds even if you are looping over 100,000 columns, as the following program shows:

proc iml;
b = j(500, 100000, 1);
call randseed(12345);
N = nrow(b)*ncol(b);
j = sample(1:N, int(0.5*N));
b = .;

idx = j(1,ncol(b),.);
t0 = time();
do i = 1 to ncol(b);
  k = (loc(b[,i]^=.));
  if ncol(k)>1 then idx = k[1];
end;
ElapsedTime = time()-t0;
print ElapsedTime;

stevo642
Obsidian | Level 7

"idx = 1 + missing(b)[+,];"  works great, thank you

I will have to be sure of my assumption of no missing after first price.

Ksharp
Super User

Rick ,

You can get it without looping.

proc iml;
b = j(20, 5, 1);
call ranuni(1234,b);
b[loc(b < .5)]=.;

dim=nrow(b)||ncol(b);
c=ndx2sub(dim,loc(missing(b)=0));
create x from c[c={row col}];
append from c;

summary class {col} var {row}  stat {min} opt {noprint save};
first=t(row);
print b , first;
quit;

Xia Keshan

Message was edited by: xia keshan

Rick_SAS
SAS Super FREQ

Clever. I like it!

Building on this idea, you can sort the C array and use the UNIQUEBY function to achieve the same result without writing the data set:

call sort(c, {2 1});

u = uniqueby(c[,2], 1);

first = c[u,1];

print first;

The UNIQUEBY technique is explained in  the article "An efficient alternative to the UNIQUE-LOC technique."

Ksharp
Super User

Rick,

But I like your original code better . It is very fast .

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 5 replies
  • 1164 views
  • 3 likes
  • 3 in conversation