I have a few suggestions that should help you improve and optimize your code.
I am concerned that the matrix Njxx, which you say should hold the final results, is being set to a zero matrix within the third nested loop. By continually overwriting Njxx, are you not throwing away most of the results, and only collecting information from the very last iteration of the inner most loop, when age_x, age_j and t are all equal to &MaxAge?
The ELSE block near the end of the code is attempting to 'grow' the matrix Njxx by appending a row at the bottom for every iteration. This is very inefficient as it creates a new matrix in a different area of memory every time. Because of the unnecessary copying syntax like this should be avoided. Ideally you want to create Njxx at its final size at the start of the program (full of missing values) and then write each row of results using something like:
Njxx[ i, 1:ncol(DiagOffsetSum) ] = DiagOffsetSum;
where 'i' is the next available row to hold results.
I think the calculation in the innermost loop can be improved as there is no need to create the maxtrix DiagOffset with all the different lags. Something simpler based on the CUSUM function like
DiagOffsetSum = sum( Djxx_diag ) - (0 || t( cusum( Djxx_diag )));
will suffice and give the sums that you want. Note that there will be a redundant zero at the end of the vector DiagOffsetSum, but it is easy to leave this out when you copy the results to Njxx.
... View more