rakeshallu
Obsidian | Level 7

Dear All, 

 

I am writing a custom objective function in the PROC IML environment to estimate 56 parameters. Of these, 51 are week fixed effects and the remaining 5 are parameters of interest. I want to conduct the estimation on a dataset with 4,037,641 rows. The code below works but takes very long: no output in over 16 hours. However, for a sub-sample of 23,809 rows, the code executes in 4:45.73 real time and 1:38.68 CPU time. I would like your suggestions on ways to optimize this code.

 

Thank you in advance. 

Best, 

Rakesh

proc iml;
/************************/
/*Read relevant variables from the data into a matrix X*/	
varNames = {"seq" "position" "usage_upto" "lag_mean_usage_after_p" "credits_left" "sl_fixed_effect" "pos_fixed_effect" "max_pos" "wk" "choice_p" "chosen" "week_fe_scroll"};
use scroll; 
	read all var varNames into x; 
close scroll; 
/************************/
b = uniqueby(x[,1],1)`; 														/*starting points in each session*/
u = unique(x[,1]);
b = b ||(nrow(x[,1])+1);     													/* trick: append (n+1) to end of b for taking care of the last session*/

param = {3.354 -0.097 -2.538 0.146 0.002} || j(1, 51, 0); 						/*initial values: the 5 parameters of interest, followed by 51 week fixed effects (model with CHOSEN variable)*/

start LogLik(param) global(x,b);
s = j(1, ncol(b)-1,.);															/*Create an empty matrix to store probability of each session*/
do i = 1 to ncol(b)-1;
	idx = b[1,i]:(b[1,i+1]-1);
																				/*Extract unique session Ids in vector u*/ 
	pos = 							x[idx,2]`;
	sess_usage_upto	=				x[idx,3]`;
	lag_use =						x[idx,4]`;
	sess_credits = 					max(x[idx,5]`);
	sl_fe_scroll =  				max(x[idx,6]`);
	sl_pos_scroll =  				max(x[idx,7]`);
	max_pos = 						max(max(x[idx,8]`),3);
	wk = 							max(x[idx,9]`);
	choice_p = 						x[idx,10]`;
	chosen = 						x[idx,11]`;
	wk_fe_scroll = 					x[idx,12]`;

	/************************/
	/*Create a vector for transaction probability at all positions. Create the probability of observing the choice at each position (choice_mult)*/
	choice_mult = (choice_p##chosen) # ((1-choice_p)##(1-chosen)); 

	/************************/
	/*Create a vector for probability of scrolling further at all positions 
	The seller always scrolls further from the first two positions*/
	if wk = 52 then k = 0; else do; z = wk + 5; k = param[z]; end;
	sc = logistic(param[1] + param[2]*pos + param[3]*(sess_usage_upto/pos) + param[4]*lag_use + param[5]*sess_credits + k);
	sc[1,1] = 1; sc[1,2] = 1;
	choice_mult_sc = choice_mult # sc; /*multiply choice prob and scrolling prob at each position*/
	/************************/ 

	/************************/
	/*Create a vector for probability of ending at all positions greater than the maximum position 
	Up to the maximum position, the probability of ending is 0*/
		ed = 1 - sc; 
		ed[1, 1:max_pos-1] = 0; 
		choice_mult_ed = choice_mult # ed; /*multiply choice prob and ending prob at each position*/ 
	/************************/ 

	/************************/
	/*Compute the probability of a sample path for all possible end points in a session. Store each probability in a vector called LL*/	
		lag_prod_sc = lag(cuprod(choice_mult_sc))`; 
		lag_prod_sc[1,1] = 1;
		LL = lag_prod_sc # choice_mult_ed;
		if LL < 1/(2**500) then LL = 1/(2**500);

	/************************/
	s[i] = log(10*sum(LL));													/*Probability of an observed session is the sum over all sample paths*/
	end; 
return sum(s);
finish;

/************************/
/*Maximize sum(s)*/	
optn = {1,	/* find the maximum of the function */ 2};	/* print a moderate amount of output */
con = {. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . , 
	   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .};
call nlpnra(rc, xres, "LogLik", param, optn, con);
/************************/
quit; 

 


6 REPLIES
Rick_SAS
SAS Super FREQ

I assume X is sorted by the first column?

1. Unless you have some very extreme outliers, the MLE estimates for 4 million observations are going to be very similar to estimates for a smaller subset of the data. If possible, use a smaller subset (like 8K-10K) to obtain preliminary estimates, then use those estimates as an initial guess for the MLE applied to the whole data set.
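
For example, a minimal sketch of the warm-start idea (the subset size is illustrative, and because LogLik reads the data through the GLOBAL clause, the global matrices are re-pointed around each call):

/* sketch: preliminary MLE on a subset, then warm-start the full-data MLE */
xFull = x;                                    /* keep the full data */
x = xFull[1:10000, ];                         /* subset; ideally cut at a session boundary */
b = uniqueby(x[,1],1)` || (nrow(x)+1);        /* rebuild the session index for the subset */
call nlpnra(rc0, p0, "LogLik", param, optn);  /* preliminary estimates p0 */
x = xFull;                                    /* restore the full data */
b = uniqueby(x[,1],1)` || (nrow(x)+1);
call nlpnra(rc, xres, "LogLik", p0, optn);    /* warm start from p0 */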

2. I think your logic is wrong in the line  

if LL < 1/(2**500) then LL = 1/(2**500);

LL is a vector, which means that the statement is interpreted as IF ALL(LL < 2##(-500)) THEN..., so the assignment only happens when every element is below the threshold.
See IF-THEN logic with matrix expressions - The DO Loop (sas.com)
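
A minimal sketch of an elementwise fix, using the elementwise-maximum operator (keeping the threshold from the original code):

/* elementwise clamp: floor every element of LL at the threshold */
tiny = 1/(2**500);
LL = LL <> tiny;     /* <> is the elementwise maximum operator */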

3. The following parameters only need to be computed once prior to calling the optimizer. They do not need to be computed during each call to the log-likelihood (LL) function.

	sess_credits = 					max(x[idx,5]`);
	sl_fe_scroll =  				max(x[idx,6]`);
	sl_pos_scroll =  				max(x[idx,7]`);
	max_pos = 						max(max(x[idx,8]`),3);
	wk = 							max(x[idx,9]`);
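
For instance, a sketch of the precomputation (assumes X is sorted by the session column; the vector names are illustrative):

/* sketch: compute the per-session constants once, before calling the optimizer */
nSess = ncol(b) - 1;
sess_credits = j(1, nSess, .);  wk = j(1, nSess, .);  max_pos = j(1, nSess, .);
do i = 1 to nSess;
   idx = b[1,i]:(b[1,i+1]-1);
   sess_credits[i] = max(x[idx,5]);
   wk[i]           = max(x[idx,9]);
   max_pos[i]      = max(max(x[idx,8]), 3);
end;
/* pass these vectors on the GLOBAL clause and index them by i inside LogLik */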

4. I don't think you need to subset the data every time you call the LL. For example, if you are analyzing 3 groups, you could subset X into x1, x2, x3 before calling the optimizer. You would use x1, x2, and x3 on the GLOBAL clause instead of X. If you are analyzing many groups (or you don't know how many groups you will be analyzing), you can build a list of subsets L such that L$i is the i_th subset.
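
A sketch of the list approach (assumes a SAS/IML release that supports lists):

/* sketch: build the per-session subsets once, before the optimizer */
nSess = ncol(b) - 1;
L = ListCreate(nSess);
do i = 1 to nSess;
   call ListSetItem(L, i, x[b[1,i]:(b[1,i+1]-1), ]);  /* rows of the i_th session */
end;
/* inside LogLik, replace the subsetting with:  xi = L$i; */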

 

I suspect that precomputing the subsets of x will improve the performance.

rakeshallu
Obsidian | Level 7
Thank you for the response, Rick.
Apologies for the delay in responding. At first, I wasn't sure of your recommendation. Having spent some time with it, I see what you are saying.

Will get back to you shortly with how these recommendations improve performance. Thank you!
rakeshallu
Obsidian | Level 7

Thank you again for the response, Rick. 

I implemented point 1, corrected the LL in point 2, subset X before the LogLik function, and removed the CON matrix. The performance does improve and is promising. Below are some summaries:

 

Number of rows    Number of parameters    Old code real time    New code real time
30,000            5                       4.02                  3.32
30,000            55                      5:17.51               4:14.30

 

That said, the number of parameters seems to be a bigger bottleneck, which I do not know how to navigate. Please let me know if you have any more suggestions. Relatedly, I am wondering if the following step can be done better:

 

if wk = 52 then k = 0; else do; z = wk + 5; k = param[z]; end;
sc = logistic(param[1] + param[2]*pos + param[3]*(sess_usage_upto/pos) + param[4]*lag_use + param[5]*sess_credits + k);
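
For reference, one branch-free alternative is a table lookup; a minimal sketch, assuming wk is an integer in 1:52 with week 52 as the baseline:

/* sketch: a lookup vector replaces the IF-THEN branch */
wkEffect = param[1, 6:56] || {0};   /* weeks 1-51 map to param[6]..param[56]; week 52 -> 0 */
k = wkEffect[wk];
sc = logistic(param[1] + param[2]*pos + param[3]*(sess_usage_upto/pos) + param[4]*lag_use + param[5]*sess_credits + k);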

As always, thank you. 

 

Best, 

Rakesh

 

Rick_SAS
SAS Super FREQ

1. You don't need to transpose data when you are taking the MAX, so all statements that look like max(vector`) can be rewritten as max(vector).

2. In fact, I don't know why you are transposing any of the vectors. You can use column vectors instead of row vectors and save 6*ncol(b) vector transposes per iteration.

3. It looks like sess_usage_upto is only used as the ratio sess_usage_upto/pos. Therefore, you can compute the ratio one time by modifying the 3rd column:

x[,3] = x[,3] / x[,2];

Then there is no need for the ratio inside the LOGISTIC function argument.

4. Regarding the LOGISTIC call, if you stop transposing and use column vectors, you can use matrix multiplication inside the LOGISTIC call. The linear combination of the vectors is equal to x[idx,2:4]*param[2:4] or x[idx,2:4]*param[2:4]` (not sure which).
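
A sketch of what that might look like (assumes param is a 1 x 56 row vector and column 3 already holds the precomputed ratio from point 3):

/* sketch: linear predictor via matrix multiplication, with column-vector layout */
eta = param[1] + x[idx, 2:4] * param[1, 2:4]`       /* (n x 3) * (3 x 1) = n x 1 */
      + param[5]*sess_credits + k;
sc = logistic(eta);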

 

These tips might give you another 5-10% improvement, but the size of the data set is what is hurting the performance.

rakeshallu
Obsidian | Level 7
Thank you, Rick. I do see a performance improvement of ~15% in real time.

The dummy variables I was estimating were fixed effects at the week level (51 coefficients); I've now chosen to include quarter fixed effects instead to get quicker convergence. The improved code works well.

Thank you for all your inputs.
Rick_SAS
SAS Super FREQ

Also, if you are performing unconstrained optimization, you do not need to define the CON matrix. You can omit that argument instead of defining it as a matrix of missing values.
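
That is, the call reduces to:

call nlpnra(rc, xres, "LogLik", param, optn);   /* the CON argument is simply omitted */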
