Hello all, I am trying to estimate parameters through a custom-written likelihood function in PROC IML. The input data to the function ("use" in the code below) is akin to a panel with multiple observations for each unit (vector 'u' in the code below stores all the unique units). I compute the likelihood of observing the data for each unit (I call it LL in the code below). Then, I compute the log(LL) and sum log(LL) over all units. The summation is my objective function. The code works fine for small datasets. For 1000 rows and 42 unique units in the dataset, it takes about a minute to execute. However, I have about 2 MN rows and 150K unique units in the data and each time I try executing, SAS gets hung. I even tried letting it run for 36 hours with no output. My guess is that the DO loop over all unique units is causing the time. My questions are as follows - 1. Is my guess correct? 2. If yes, do you have some advice on doing these computations without a DO Loop? 3. If no, do you have some advice on suggestions to increase the speed? As always, very grateful for all your support. Thank you, Rakesh %let extended_pos = 5;
proc iml;
/************************/
/*Read relevant variables from the data into a matrix X*/
varNames = {"seq" "position" "pred_usage" "residual_usage" "credits_r" "p_hat" "c_seller_fe" "c_pos_coef" "sl_fixed_effect" "pos_fixed_effect"};
use use;
read all var varNames into x;
close use ;
/************************/
start LogLik(param) global(x);
u = unique(x[,1]); /*Extract unique session Ids in vector u*/
s = j(1, ncol(u),.); /*Create an empty matrix to store probability of each session*/
do i = 1 to ncol(u);
idx = loc(x[,1]=u[i]); /*Get all the variables in each session*/
max_pos = max(x[,2][idx]`);
sess_pred_usage = x[,3][idx]`;
sess_res_usage = x[,4][idx]`;
sess_credits = max(x[,5][idx]`);
sess_phat = max(x[,6][idx]`);
sl_fe_choice = max(x[,7][idx]`);
sl_pos_choice = max(x[,8][idx]`);
sl_fe_scroll = max(x[,9][idx]`);
sl_pos_scroll = max(x[,10][idx]`);
max_pred_usage = sess_pred_usage[1,max_pos];
max_res_usage = sess_res_usage[1,max_pos];
pos_cut = max_pos + &extended_pos.;
pos = do(1, pos_cut, 1);
/************************/
/*Create the predicted and residual upto vectors for all positions upto the cut off*/
sess_pred_usage = sess_pred_usage||(max_pred_usage*j(1, &extended_pos.,1));
sess_res_usage = sess_res_usage||(max_res_usage*j(1, &extended_pos.,1));
/************************/
/************************/
/*Create a vector for probability of scrolling further at all positions
The seller always scrolls further from the first two positions*/
sc = logistic(sl_fe_scroll + sl_pos_scroll*pos + param[1] + param[2]*pos + param[3]*sess_pred_usage + param[4]*sess_res_usage + param[5]*sess_credits);/*param[2]*predicted_usage_prop + param[3]*residual_usage_prop +*/
sc[1,1] = 1; sc[1,2] = 1;
/************************/
/************************/
/*Create a vector for probability of ending at all positions greater then the maximum position
Up to the maximum position, the probability of ending is 0*/
ed = 1 - sc; ed[1, 1:max_pos-1] = 0;
/************************/
/************************/
/*Create a vector for computing the not choosing probality at each position.
Upto max_pos default this value to 1 as it does not affect the LL function. */
m1 = 1 - logistic(sl_fe_choice + sess_phat + sl_pos_choice*(max_pos + do(1, &extended_pos.,1)));
not_choosing_prob = j(1, max_pos,1)||m1;
/************************/
/************************/
/*Compute the probability of a sample path for all possible end points in a session. Store each probability in a vector called LL*/
LL = j(1, pos_cut,.);
LL[1,1] = 0; LL[1,2] = 0;
do m = 3 to pos_cut;
LL[1,m] = ed[1,m]*prod(sc[1,1:m-1])*prod(not_choosing_prob[1,1:m]);
end;
/************************/
s[i] = log(100*sum(LL)); /*Probability of aN OBSERVED session is the sum of all sample paths */
end;
return sum(s);
finish;
/************************/
/*Maximize sum(s)*/
param = {0 -0.0025 0.001 0.001 0};
optn = {1, /* 3. find max of function, and */ 4}; /* print moderate amount of output */
con = {. . . . .,
. . . . .};
call nlpnra(rc, xres, "LogLik", param, optn, con);
/************************/
quit;
... View more