Problem of Invalid subscript

Occasional Contributor
Posts: 10

The error is shown below:

ERROR: (execution) Invalid subscript or subscript out of range.

operation : [ at line 186 column 6
operands : T, t, *LIT1006

 

But T is actually a 120×1 matrix, and the error persists even when I change "do t=1 to 120" to "do t=1 to 2". I do not understand why.

and the code:

 

proc iml;
load N;
load T;
load NT;
B_0=N*(T`);
use ba.t3;
do i=1 to 2320;
  do t=1 to 5;
    f=N[i,1];
    g=T[t,1];
    B_0[i,t]=0;
    do c=1 to 248064;
      if NT[c,1]=f & NT[c,2]=g then do;
        read all where(SecCode=f & TDate=g) into S;
        XX=0;
        XY=0;
        do ini=1 to 15;
          XX=XX+S[ini,4]*S[ini,4];
          XY=XY+S[ini+1,4]*S[ini,4];
        end;
        b_0=XY/XX;
        B_0[i,t]=b_0;
      end;
    end;
  end;
end;
store B_0;
quit;

Accepted Solutions
Solution
‎08-18-2017 11:37 PM
SAS Super FREQ
Posts: 3,559

Re: Problem of Invalid subscript

SAS is not case sensitive. 'T' and 't' represent the same variable name. Thus you load a matrix named T, but you reassign that matrix when you use "do t = 1 to 5". Change the iterator from 't' to 'k' and update all the subscripts that are currently using 't'.
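
A minimal sketch of the name collision described above, assuming a made-up 3 x 1 matrix of dates (the names Dates, k, and dimAfter are hypothetical and not from the original program). The empty loop shows that the loop counter replaces the loaded matrix; the second loop shows the suggested fix.

```sas
proc iml;
T = {20151008, 20151009, 20151012};   /* made-up 3 x 1 matrix of dates */
do t = 1 to 2;                        /* t is the same symbol as T */
end;
dimAfter = nrow(T) || ncol(T);        /* T now holds the loop counter, not the 3 x 1 matrix */
print dimAfter;

/* the fix: use a loop index whose name differs from every matrix name */
Dates = {20151008, 20151009, 20151012};
do k = 1 to nrow(Dates);
   g = Dates[k, 1];
   print k g;
end;
quit;
```

Inside the original loop, T[t,1] succeeds on the first pass only because T is then the scalar 1; on the second pass T is the scalar 2 and T[2,1] is out of range, which matches the reported error.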

Occasional Contributor
Posts: 10

Re: Problem of Invalid subscript

Thank you for your help. The subscript problem is solved, but a new problem has appeared: ERROR: Not enough memory to store all matrices.

I do not seem to have many matrices in my code, and the matrix S is overwritten on every iteration.

SAS Super FREQ
Posts: 3,559

Re: Problem of Invalid subscript

I can't tell from what you've shown.

 

I would advise that you remove the hard-coded values for the loops and replace them with terms such as nrow(N), nrow(T), and nrow(NT), which will make your code easier to read and more portable. 

 

The statements

     b_0=XY/XX;
      B_0[i,t]=b_0;

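are also wrong because you are overwriting b_0, which is the same symbol as B_0. The first statement replaces the whole result matrix with a scalar, so the second statement no longer has a matrix to index into.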
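
A minimal sketch of both suggestions, assuming made-up inputs: the loop bounds come from nrow(), and the scalar result gets a name (slope, hypothetical) that cannot collide with the result matrix. The values of N, Dates, XX, and XY are placeholders, not data from the thread.

```sas
proc iml;
N     = {1, 2, 3};                   /* made-up stock codes */
Dates = {20151008, 20151009};        /* made-up dates */
B_0   = j(nrow(N), nrow(Dates), 0);  /* result matrix, one cell per stock and day */

do i = 1 to nrow(N);                 /* no hard-coded 2320 */
   do k = 1 to nrow(Dates);          /* no hard-coded 120 */
      XX = 2;  XY = 1;               /* placeholder sums for illustration */
      slope = XY / XX;               /* scalar name differs from B_0 */
      B_0[i, k] = slope;
   end;
end;
print B_0;
quit;
```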

Frequent Contributor
Posts: 141

Re: Problem of Invalid subscript

Are the matrices B_0 and S actually a lot larger than you think they are?   Check by adding the command

show names;

at strategic points to obtain a list of matrix names and sizes.
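
For example, a small sketch with stand-in matrices (the sizes here are made up; in the real program S and B_0 come from the data):

```sas
proc iml;
S   = j(16, 4, 1);          /* stand-in for the matrix read from the data set */
B_0 = j(3, 2, 0);           /* stand-in for the result matrix */
show names;                 /* lists the matrices that currently have values, with their sizes */
dimS = nrow(S) || ncol(S);  /* the same size check for a single matrix */
print dimS;
quit;
```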

 

Occasional Contributor
Posts: 10

Re: Problem of Invalid subscript

I don't quite understand what hard-coded values are. Would you mind explaining in more detail, step by step, what they are and how to deal with them? Thanks a lot, I do need your help!

Frequent Contributor
Posts: 141

Re: Problem of Invalid subscript

Rick is suggesting that if you write the loop like this:

do c = 1 to nrow(NT);

 

then there is no need to check the size of the matrix NT each time the program is run.  Presumably, when you wrote the code above, you had to check that NT had 248064 rows and then hardcode this value in the DO statement.

 

I am worried about the potential number of iterations of the innermost loop. Depending on the data and how often the IF condition is true, you may be trying to execute the READ statement millions, maybe billions, of times. This sounds inefficient. It may be better to read the contents of ba.t3 into a matrix once at the beginning, and then refer to that matrix rather than to the SAS data set.
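
A rough sketch of that idea, assuming the data have already been read into a numeric matrix X whose first two columns are SecCode and TDate (the three-column layout and the values below are made up). In the real program X might come from something like "use ba.t3; read all var _NUM_ into X; close ba.t3;", memory permitting.

```sas
proc iml;
/* made-up stand-in for the contents of the data set:
   columns are SecCode, TDate, price */
X = {1 20151008 10.85,
     1 20151008 10.86,
     1 20151009 10.80,
     2 20151008  5.10,
     2 20151008  5.12};

f = 1;                               /* stock code to extract */
g = 20151008;                        /* date to extract */
idx = loc(X[,1]=f & X[,2]=g);        /* row numbers for this stock and day */
if ncol(idx) > 0 then do;            /* loc returns an empty matrix when nothing matches */
   S = X[idx, ];
   print S;
end;
quit;
```

The subset is taken from memory with LOC and a subscript, so no READ ... WHERE is executed inside the loops.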

Occasional Contributor
Posts: 10

Re: Problem of Invalid subscript

Yes! The number of iterations is up to 2320 (stocks) × 120 (days); in total there are 248064 stock-day combinations (some are missing). In each iteration, I need to read the data for one stock on one day from ba.t3 into a matrix S. It is terribly inefficient: one iteration takes around 30 seconds, so all the iterations would take about 100 days, but I do not know how to improve it.

SAS Super FREQ
Posts: 3,559

Re: Problem of Invalid subscript

Since we are offering advice now, I want to point out that the loop to compute XX and XY can be replaced by more efficient statements that use vector multiplication:

 

/* to test, create any matrix S that has at least 16 rows and at least 4 columns */

/* replace current code that uses a loop... */
XX=0;
XY=0;
do ini=1 to 15;
  XX=XX+S[ini,4]*S[ini,4];
  XY=XY+S[ini+1,4]*S[ini,4];
end;
print XX XY;

/* ... with more efficient code that uses a vector inner product */
X = S[1:15,4];
Y = S[2:16,4];
XX = X` * X;
XY = X` * Y;
print XX XY;

Other loops can probably be similarly improved.
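
One way to check that the two versions agree is to run both on a random test matrix. This is only a sketch: the seed, the 16 x 4 size, and the _loop/_vec names are arbitrary choices for the test.

```sas
proc iml;
call randseed(1);
S = j(16, 4, .);                 /* made-up 16 x 4 test matrix */
call randgen(S, "Normal");

/* loop version */
XX_loop = 0;  XY_loop = 0;
do ini = 1 to 15;
   XX_loop = XX_loop + S[ini,4]*S[ini,4];
   XY_loop = XY_loop + S[ini+1,4]*S[ini,4];
end;

/* vector version */
X = S[1:15,4];
Y = S[2:16,4];
XX_vec = X` * X;
XY_vec = X` * Y;

/* largest absolute difference between the two versions */
maxDiff = max(abs(XX_loop - XX_vec), abs(XY_loop - XY_vec));
print XX_loop XX_vec XY_loop XY_vec maxDiff;
quit;
```

Any difference should be at the level of floating-point rounding.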

SAS Super FREQ
Posts: 3,559

Re: Problem of Invalid subscript

Yes! The number of iterations is up to 2320 (stocks) × 120 (days); in total there are 248064 stock-day combinations (some are missing). In each iteration, I need to read the data for one stock on one day from ba.t3 into a matrix S. It is terribly inefficient: one iteration takes around 30 seconds, so all the iterations would take about 100 days, but I do not know how to improve it.

The way to improve it is to post sample data (maybe 3 stocks x 4 days) including the N, T, and NT matrices.

Also, describe in words what you want the program to do. Then other programmers can make recommendations for improvements.  

 

Occasional Contributor
Posts: 10

Re: Problem of Invalid subscript

Thank you for your suggestions.

 

The table below is a sample of my data for one stock on one day (a 3×3 sample or larger is too big to list). The matrix N is the list of distinct SecCodes, T is the list of distinct TDates, and NT is the list of all the combinations of stock code and date.

 

The goal of my program is to estimate an evolutionary AR(1) for every stock on every day. The data set has 59782508 observations, covering half a year (120 days) and 2320 stocks. So I need to read a matrix from the data 2320*120 times, which takes a long time to run. The efficiency of the code is a big problem for me.

 

proc iml;
load N;
load D;
load NT;
load B;
use ba.t3;
do i=2320 to 2320;
  do t=120 to 120;
    f=N[i,1];
    g=D[t,1];
    B[i,t]=0;
    do c=1 to 248064;
      if NT[c,1]=f & NT[c,2]=g then do;
        read all where(SecCode=f & TDate=g) into S;
        XX=0;
        XY=0;
        do ini=1 to 15;
          XX=XX+S[ini,4]*S[ini,4];
          XY=XY+S[ini+1,4]*S[ini,4];
        end;
        b_0=XY/XX;
        B[i,t]=b_0;
      end;
    end;
  end;
end;
store B;
quit;

 

 

SecCode  TDate     MinTime  StartPrc  HighPrc  LowPrc  EndPrc  MinTq    MinTm
1        20151008  930      10.85     10.85    10.85   10.85   2160486  23441273.1
1        20151008  931      10.86     10.87    10.86   10.86   3495473  37987109.2
1        20151008  932      10.86     10.86    10.8    10.8    987641   10693639.26
1        20151008  933      10.8      10.8     10.77   10.78   889400   9598748
1        20151008  934      10.78     10.8     10.78   10.8    1218735  13125513.25
1        20151008  935      10.8      10.82    10.79   10.79   644800   6964091.82
1        20151008  936      10.79     10.82    10.78   10.8    418278   4519166.4
1        20151008  937      10.8      10.81    10.8    10.8    550422   5949659.6
1        20151008  938      10.8      10.8     10.78   10.79   403600   4354584.11
1        20151008  939      10.78     10.79    10.77   10.77   361400   3894113
1        20151008  940      10.76     10.77    10.75   10.75   578300   6221391.16
1        20151008  941      10.75     10.79    10.75   10.79   755382   8129020.25
1        20151008  942      10.8      10.81    10.76   10.77   862683   9301424.35
1        20151008  943      10.81     10.82    10.78   10.82   465500   5031375.8
1        20151008  944      10.78     10.81    10.78   10.79   119600   1291438
1        20151008  945      10.78     10.79    10.77   10.77   142800   1539261
1        20151008  946      10.77     10.8     10.77   10.79   476800   5140467
1        20151008  947      10.79     10.79    10.78   10.79   185300   1998904
1        20151008  948      10.79     10.79    10.78   10.79   366259   3949433.02
1        20151008  949      10.79     10.8     10.78   10.79   334812   3612752.6

 

 
SAS Super FREQ
Posts: 3,559

Re: Problem of Invalid subscript

Well, I can't make sense of what you are telling me and there is no way to run your program, so I give up.

 

Did you try using an ETS procedure? I am guessing that the best way to do this is to sort by SECCODE and TDATE and then use something like PROC AUTOREG with BY SECCODE TDATE to compute the regression coefficients. You can use the OUTEST= option to output the regression coefficients. Be sure to use ODS EXCLUDE ALL to suppress all the tables and graphs that you don't want. Your code might look something like this (untested):

 

ods exclude all;
proc autoreg data=mydata plots=none outest=ParamEst;
   by SecCode TDate;
   model StartPrc = / nlags=1;   /* ??? */
run; quit;
ods exclude none;

 

Occasional Contributor
Posts: 10

Re: Problem of Invalid subscript

Let me try to make it clear: I want to split my data by SecCode and TDate, and then do the calculation within each subset.

 

The data has been sorted, but the program is still inefficient when i and t are large.
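
One possible way to do the split in SAS/IML without a READ inside the loop is to take a single pass over the sorted data, find where each (SecCode, TDate) group starts, and compute the ratio sum(x[i]*x[i+1]) / sum(x[i]*x[i]) per group. The following is only a sketch under stated assumptions: the data are already in a matrix X sorted by SecCode and TDate, the three-column layout and values are made up, and nPairs, isNew, b, r1, r2, and result are hypothetical names. The thread uses 15 pairs per group; the toy data use 3.

```sas
proc iml;
/* made-up data, already sorted by SecCode and TDate;
   columns are SecCode, TDate, price */
X = {1 20151008 10.85,
     1 20151008 10.86,
     1 20151008 10.86,
     1 20151008 10.80,
     2 20151008  5.10,
     2 20151008  5.12,
     2 20151008  5.11,
     2 20151008  5.13};
nPairs = 3;                        /* the thread uses 15 */
n = nrow(X);

/* flag rows where either key differs from the previous row */
isNew = j(n, 1, 1);
if n > 1 then
   isNew[2:n] = (X[2:n,1] ^= X[1:(n-1),1]) | (X[2:n,2] ^= X[1:(n-1),2]);
b = loc(isNew) || (n+1);           /* first row of each group, plus a sentinel */

nGroups = ncol(b) - 1;
result = j(nGroups, 3, .);         /* SecCode, TDate, slope */
do k = 1 to nGroups;
   r1 = b[k];                      /* first row of group k */
   r2 = b[k+1] - 1;                /* last row of group k */
   result[k, 1:2] = X[r1, 1:2];
   if r2 - r1 + 1 > nPairs then do;          /* need nPairs+1 rows */
      x0 = X[r1:(r1+nPairs-1), 3];
      x1 = X[(r1+1):(r1+nPairs), 3];
      result[k, 3] = (x0` * x1) / (x0` * x0);
   end;
end;
print result[colname={"SecCode" "TDate" "Slope"}];
quit;
```

Groups with too few rows keep a missing slope. Whether the whole data set fits in memory at once is a separate question; if it does not, the same logic could be applied to one stock at a time.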

 

I appreciate your help very much.

☑ This topic is solved.
