I assume X is sorted by the first column?
1. Unless you have some very extreme outliers, the MLE estimates for 4 million observations are going to be very similar to estimates for a smaller subset of the data. If possible, use a smaller subset (like 8K-10K) to obtain preliminary estimates, then use those estimates as an initial guess for the MLE applied to the whole data set.
2. I think your logic is wrong in the line
if LL < 1/(2**500) then LL = 1/(2**500);
LL is a vector, which means that the statement is interpreted as if ALL(LL) < 2##(-500) then... See IF-THEN logic with matrix expressions - The DO Loop (sas.com)
3. The following parameters only need to be computed once prior to calling the optimizer. They do not need to be computed during each call to the log-likelihood (LL) function.
sess_credits = max(x[idx,5]`);
sl_fe_scroll = max(x[idx,6]`);
sl_pos_scroll = max(x[idx,7]`);
max_pos = max(max(x[idx,8]`),3);
wk = max(x[idx,9]`);
4. I don't think you need to subset the data every time you call the LL. For example, if you are analyzing 3 groups, you could subset X into x1, x2, x3 before calling the optimizer. You would use x1, x2, and x3 on the GLOBAL clause instead of X. If you are analyzing many groups (or you don't know how many groups you will be analyzing), you can build a list of subsets L such that L$i is the i_th subset.
I suspect that precomputing the subsets of x will improve the performance.
... View more