supmilk
Obsidian | Level 7

The error is shown below:

ERROR: (execution) Invalid subscript or subscript out of range.

operation : [ at line 186 column 6
operands : T, t, *LIT1006

 

But T is actually a 120×1 matrix, and the error persists even when I change "do t=1 to 120" to "do t=1 to 2". I do not understand why.

Here is the code:

 

proc iml;
load N;
load T;
load NT;
B_0=N*(T`);
use ba.t3;
do i=1 to 2320;
  do t=1 to 5;
    f=N[i,1];
    g=T[t,1];
    B_0[i,t]=0;
    do c=1 to 248064;
      if NT[c,1]=f & NT[c,2]=g then do;
        read all where(SecCode=f & TDate=g) into S;
        XX=0;
        XY=0;
        do ini=1 to 15;
          XX=XX+S[ini,4]*S[ini,4];
          XY=XY+S[ini+1,4]*S[ini,4];
        end;
        b_0=XY/XX;
        B_0[i,t]=b_0;
      end;
    end;
  end;
end;
store B_0;
quit;

Accepted Solution
Rick_SAS
SAS Super FREQ

SAS is not case sensitive. 'T' and 't' represent the same variable name. Thus you load a matrix named T, but you reassign that matrix when you use "do t = 1 to 5". Change the iterator from 't' to 'k' and update all the subscripts that are currently using 't'.
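
A minimal sketch of that renaming, assuming a small made-up T (the dates below are hypothetical and are not the poster's data). Because the loop counter is k, the matrix T is left intact and T[k,1] remains a valid subscript:

```sas
proc iml;
T = {20151008, 20151009, 20151012};   /* hypothetical 3 x 1 matrix of dates */

/* Writing "do t = ..." here would reuse the symbol T as the loop counter,
   turning T into a scalar. A distinct counter name avoids that. */
do k = 1 to nrow(T);
   g = T[k, 1];
   print k g;
end;
quit;
```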

12 REPLIES
supmilk
Obsidian | Level 7

Thank you for your help. The subscript problem is solved, but a new problem has appeared: ERROR: Not enough memory to store all matrices.

I do not think my code has that many matrices. The matrix S is overwritten on every iteration.

Rick_SAS
SAS Super FREQ

I can't tell from what you've shown.

 

I would advise that you remove the hard-coded values for the loops and replace them with terms such as nrow(N), nrow(T), and nrow(NT), which will make your code easier to read and more portable. 

 

The statements

      b_0=XY/XX;
      B_0[i,t]=b_0;

are also wrong because b_0 is the same name as B_0, so the first assignment overwrites the matrix B_0 with a scalar.

IanWakeling
Barite | Level 11

Are the matrices B_0 and S actually a lot larger than you think they are?   Check by adding the command

show names;

at strategic points to obtain a list of matrix names and sizes.
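
As a small illustration of that check, the sketch below creates two hypothetical matrices, lists them with SHOW NAMES, releases one with the FREE statement, and lists them again. The matrix names and sizes are made up for the example:

```sas
proc iml;
S = j(16, 4, 1);      /* hypothetical 16 x 4 matrix */
B_0 = j(10, 5, 0);    /* hypothetical 10 x 5 result matrix */
show names;           /* lists each matrix with its dimensions and type */
free S;               /* releases the memory held by S */
show names;           /* S no longer has a value */
quit;
```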

 

supmilk
Obsidian | Level 7

I don't quite understand what hard-coded values are. Would you mind explaining them in more detail, and how to deal with them step by step? Thanks a lot, I really need your help!

IanWakeling
Barite | Level 11

Rick is suggesting that if you write the loop like this:

do c = 1 to nrow(NT);

 

then there is no need to check the size of the matrix NT each time the program is run.  Presumably, when you wrote the code above, you had to check that NT had 248064 rows and then hardcode this value in the DO statement.

 

I am worried about the potential number of iterations for the innermost loop. Depending on the data and how often the IF statement is true, you may be trying to execute the READ statement millions, maybe billions, of times. This sounds inefficient, and it may be better to read the contents of ba.t3 into a matrix at the beginning and then refer to that matrix rather than to the SAS data set.
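
A minimal sketch of that idea, under stated assumptions: the toy matrix X stands in for ba.t3 with columns SecCode, TDate, and StartPrc (all assumed numeric), and the names codes, dates, slope, prev, and nxt are made up for the example. The rows for each stock and day are located in memory with LOC, so no READ statement and no loop over NT is needed inside the loops, and the loop bounds come from the data rather than from hard-coded values. Unlike the posted loop, it uses every row of a block rather than only the first 16:

```sas
proc iml;
/* Toy stand-in for the data. In practice X would be read once, e.g.
   use ba.t3; read all var {SecCode TDate StartPrc} into X; close ba.t3; */
X = { 1 20151008 10.85,
      1 20151008 10.86,
      1 20151008 10.80,
      1 20151009 10.78,
      1 20151009 10.80,
      2 20151008  5.10,
      2 20151008  5.12 };

codes = unique(X[,1]);                    /* distinct stock codes (row vector) */
dates = unique(X[,2]);                    /* distinct dates (row vector) */
slope = j(ncol(codes), ncol(dates), .);   /* stays missing where no data exist */

do i = 1 to ncol(codes);
   do k = 1 to ncol(dates);
      /* rows for this stock on this day; empty if the pair is absent */
      idx = loc( (X[,1] = codes[i]) & (X[,2] = dates[k]) );
      m = ncol(idx);
      if m > 1 then do;
         S = X[idx, 3];
         prev = S[1:(m-1), 1];
         nxt  = S[2:m, 1];
         slope[i,k] = (prev` * nxt) / (prev` * prev);
      end;
   end;
end;
print slope;
quit;
```

Each LOC call still scans every row of X, so on a data set with tens of millions of rows this removes the repeated disk reads but is unlikely to be fast by itself.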

supmilk
Obsidian | Level 7

Yes! The number of iterations is up to 2320 (stocks) × 120 (days); the actual total is 248064 because some combinations are missing. In each iteration, I need to read the data for one stock on one day from ba.t3 into a matrix S. It is terribly inefficient: one iteration takes around 30 seconds, so all the iterations would take about 100 days, but I do not know how to improve it.

Rick_SAS
SAS Super FREQ

Since we are offering advice now, I want to point out that the loop to compute XX and XY can be replaced by more efficient statements that use vector multiplication:

 

/* to test, create any matrix S that has at least 16 rows and at least 4 columns */

/* replace current code that uses a loop... */
XX=0;
XY=0;
do ini=1 to 15;
   XX=XX+S[ini,4]*S[ini,4];
   XY=XY+S[ini+1,4]*S[ini,4];
end;
print XX XY;

/* ... with more efficient code that uses a vector inner product */
X = S[1:15,4];
Y = S[2:16,4];
XX = X` * X;
XY = X` * Y;
print XX XY;

Other loops can probably be similarly improved.

Rick_SAS
SAS Super FREQ

supmilk wrote: "Yes! The number of iterations is up to 2320 (stocks) × 120 (days); the actual total is 248064 because some combinations are missing. In each iteration, I need to read the data for one stock on one day from ba.t3 into a matrix S. It is terribly inefficient: one iteration takes around 30 seconds, so all the iterations would take about 100 days, but I do not know how to improve it."

The way to improve it is to post sample data (maybe 3 stocks x 4 days) including the N, T, and NT matrices.

Also, describe in words what you want the program to do. Then other programmers can make recommendations for improvements.  

 

supmilk
Obsidian | Level 7

Thank you for your suggestions.

 

The table below is a sample of my data for one stock on one day (3×3 or more is too big to list). The matrix N is the list of distinct SecCodes, T (named D in the code below) is the list of distinct TDates, and NT is the list of all the combinations of stock code and date.

 

The goal of my program is to calculate an evolutionary AR(1) for every stock on every day. My data set has 59782508 observations, covering half a year (120 days) and 2320 stocks. So I need to read a matrix from the data 2320*120 times, which takes a long time to run. The efficiency of the code is a big problem for me.

 

proc iml;
load N;
load D;
load NT;
load B;
use ba.t3;
do i=2320 to 2320;
  do t=120 to 120;
    f=N[i,1];
    g=D[t,1];
    B[i,t]=0;
    do c=1 to 248064;
      if NT[c,1]=f & NT[c,2]=g then do;
        read all where(SecCode=f & TDate=g) into S;
        XX=0;
        XY=0;
        do ini=1 to 15;
          XX=XX+S[ini,4]*S[ini,4];
          XY=XY+S[ini+1,4]*S[ini,4];
        end;
        b_0=XY/XX;
        B[i,t]=b_0;
      end;
    end;
  end;
end;
store B;
quit;
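
Read as a formula, the inner loop in the code above computes a no-intercept least-squares slope of each value on the one before it. This is an interpretation of the posted code rather than something the poster states. Writing x_j for S[j,4] (which appears to be StartPrc, assuming all variables are numeric and are read in the column order of the sample table), the value stored in b_0 is:

```latex
\hat{b} \;=\; \frac{XY}{XX}
        \;=\; \frac{\sum_{j=1}^{15} x_{j+1}\, x_j}{\sum_{j=1}^{15} x_j^{2}}
```

Only the first 16 rows of each stock-day block enter this estimate, because the loop index runs from 1 to 15.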

 

 

SecCode  TDate     MinTime  StartPrc  HighPrc  LowPrc  EndPrc  MinTq    MinTm
1        20151008  930      10.85     10.85    10.85   10.85   2160486  23441273.1
1        20151008  931      10.86     10.87    10.86   10.86   3495473  37987109.2
1        20151008  932      10.86     10.86    10.8    10.8    987641   10693639.26
1        20151008  933      10.8      10.8     10.77   10.78   889400   9598748
1        20151008  934      10.78     10.8     10.78   10.8    1218735  13125513.25
1        20151008  935      10.8      10.82    10.79   10.79   644800   6964091.82
1        20151008  936      10.79     10.82    10.78   10.8    418278   4519166.4
1        20151008  937      10.8      10.81    10.8    10.8    550422   5949659.6
1        20151008  938      10.8      10.8     10.78   10.79   403600   4354584.11
1        20151008  939      10.78     10.79    10.77   10.77   361400   3894113
1        20151008  940      10.76     10.77    10.75   10.75   578300   6221391.16
1        20151008  941      10.75     10.79    10.75   10.79   755382   8129020.25
1        20151008  942      10.8      10.81    10.76   10.77   862683   9301424.35
1        20151008  943      10.81     10.82    10.78   10.82   465500   5031375.8
1        20151008  944      10.78     10.81    10.78   10.79   119600   1291438
1        20151008  945      10.78     10.79    10.77   10.77   142800   1539261
1        20151008  946      10.77     10.8     10.77   10.79   476800   5140467
1        20151008  947      10.79     10.79    10.78   10.79   185300   1998904
1        20151008  948      10.79     10.79    10.78   10.79   366259   3949433.02
1        20151008  949      10.79     10.8     10.78   10.79   334812   3612752.6

 

 
Rick_SAS
SAS Super FREQ

Well, I can't make sense of what you are telling me and there is no way to run your program, so I give up.

 

Did you try using an ETS procedure? I am guessing that the best way to do this is to sort by SECCODE and TDATE and then use something like PROC AUTOREG with BY SECCODE TDATE to compute the regression coefficients. You can use the OUTEST= option to output the regression coefficients. Be sure to use ODS EXCLUDE ALL to suppress all the tables/graphs that you don't want. Your code might look something like this (untested):

 

ods exclude all;
proc autoreg data=mydata plots=none outest=ParamEst;
   by SecCode TDate;
   model StartPrc = / nlags=1;   /* ??? */
run; quit;
ods exclude none;

 

supmilk
Obsidian | Level 7

I can try to make it clearer: I want to split my data by SecCode and TDate, then do the calculation within each subset.

The data has been sorted, but the code is still inefficient when i and t are large.

 

I appreciate your help very much.
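
Because the data are already sorted by SecCode and TDate, each stock-day combination occupies one contiguous block of rows, so the split can be done in a single pass in PROC IML. The sketch below is one possible approach, not a tested solution for the full data: it uses the IML UNIQUEBY function to find the first row of each block and the inner-product form of XX and XY suggested earlier in the thread. The toy matrix X, its column layout (SecCode, TDate, StartPrc, all assumed numeric), and the names r1, r2, prev, nxt, and result are assumptions made for the example. Unlike the posted loop, it uses every row of a block rather than only the first 16.

```sas
proc iml;
/* Toy stand-in for the sorted data. In practice X would be read once, e.g.
   use ba.t3; read all var {SecCode TDate StartPrc} into X; close ba.t3; */
X = { 1 20151008 10.85,
      1 20151008 10.86,
      1 20151008 10.80,
      1 20151009 10.78,
      1 20151009 10.80,
      1 20151009 10.79,
      2 20151008  5.10,
      2 20151008  5.12,
      2 20151008  5.11 };

/* first row of each (SecCode, TDate) block; X is already sorted,
   so the sort index is simply 1:nrow(X) */
b = uniqueby(X, 1:2, 1:nrow(X));
nGroups = nrow(b);
b = b // (nrow(X) + 1);            /* append n+1 so the last block has an end */

result = j(nGroups, 3, .);         /* SecCode, TDate, slope */
do k = 1 to nGroups;
   r1 = b[k];                      /* first row of block k */
   r2 = b[k+1] - 1;                /* last row of block k */
   result[k, 1:2] = X[r1, 1:2];
   if r2 > r1 then do;
      prev = X[r1:(r2-1), 3];
      nxt  = X[(r1+1):r2, 3];
      result[k, 3] = (prev` * nxt) / (prev` * prev);
   end;
end;
print result[colname={"SecCode" "TDate" "Slope"}];
quit;
```

Holding three numeric columns for 59782508 rows in memory takes roughly 1.4 GB at 8 bytes per value, so whether this fits depends on the memory available to the SAS session.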
