BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SherriF
Obsidian | Level 7

Hi,

 

Is SAS capable to run nonlinear quantile regression?

Especially, I have three parameters needed to be estimated.

 

I saw several quantile regression samples, but all of them are using general regression models. 

Is there any way I can use proc quantreg with nonlinear models to estimate three parameter variables?

 

Thanks,

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

Ordinarily, I would say SAS can't do this, but I think it may depend on the nonlinear model you are considering.  If you are content with a polynomial approximation of your model, or with a semiparametric approach using a spline, then PROC QUANTREG offers an EFFECT statement to consider.  However, if you have a specific nonlinear model in mind (that can't be linearized by transformation), then I think you might not have any options.

 

Steve Denham 

View solution in original post

13 REPLIES 13
SteveDenham
Jade | Level 19

Ordinarily, I would say SAS can't do this, but I think it may depend on the nonlinear model you are considering.  If you are content with a polynomial approximation of your model, or with a semiparametric approach using a spline, then PROC QUANTREG offers an EFFECT statement to consider.  However, if you have a specific nonlinear model in mind (that can't be linearized by transformation), then I think you might not have any options.

 

Steve Denham 

SherriF
Obsidian | Level 7

Hi StevenDenham,

 

I solved my problem by tranforming the specific model into a linear model by your hint in the parenthesis.

 

Thank you!

 

Sherri 

Rick_SAS
SAS Super FREQ

@SteveDenham's suggestions are applicable if you want a model that is nonlinear in the effect variables, but using the EFFECT statement still results in a linear model of the parameters. So technically the answer to your question is "No, PROC QUANTREG will not estimate parameters that appear nonlinearly in the model."

 

Theoretically, quantile regression requires solving a nonliner optimization problem, which you can do in PROC IML (or PROC OPTMODEL in SAS/OR software). Therefore if you want to pursue this problem, you could try to pose it as a nonlinear optimization problem in IML. The problem is that the objective function is usually not smooth, so standard nonlinear programming techniques like Newton's Method are not applicable.

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

If you truly want quantile regression (QR) for models that are nonlinear in the parameters, then Rick is right, there is no dedicated procedure. However, if you are knowledgeable in statistics and know how to program in NLMIXED, there may be hope. There have been several developments in "parametric" approaches to the QR problem. See articles by Geraci and colleagues (maybe start with a 2008 Biostatistics article). Their aproach is for linear models, and most recently for linear mixed quantile models, but the following points should still apply to nonlinear. The idea is to pretend that the data have an asymmetric Laplace (AL) distribution. This is a 3-parameter distribution, where one of the parameters, eta, is the tau-th quantile. That is, tau defines the desired quantile and is fixed by the user; e.g., if one specifies tau=0.5, then eta is the median; with tau=0.75, then eta is the upper quartile; and so on. Eta could either be a single number, or expanded, so that eta = f(X, other parameters).  One can fit this model, in principle, by defining the likelihood in NLMIXED. I say "in principle" because I am not yet convinced that I have worked out the details. I have played with this a few times over the last couple of years, but never got far enough. There are several problems with my effort so far, but you may find this helpful.

 

The code below runs, but it doesn't necessarily work.That is, I generate data for a linear model (as an example), and then fit a linear model for the tau-th quantile using NLMIXED (changing one line of code could make it appropriate for a nonlinear model). It converges, but the Hessian has a negative eigenvalue, which is a pathological problem for the standard errors. But I don't think the SEs would be valid anyway, because one is only using the AL as crank to get the quantiles, not for inference. One probably should do bootstrapping for measures of uncertainty. It is critical that the starting values for the parameters are CLOSE to the true values before the optimization starts. Thus, you need a dense grid search specified in the parms statement. If initial guesses are not close, the final estimates will be far from the correct ones. I found it helpful to use an optimization method that does not use derivatives (nmsimp).

 

You should only treat my code as a guide. I am not convinced that I am doing this correctly, so others may find problems with my logic or my coding. I have played with fitting a nonlinear model. I can get it to work, but results can be a bit strange with nonlinear quantile models. The different prediction lines can actually cross.

I am also attaching example sas code to do three different quantiles for the linear case, and also three quantiles for a nonlinear example. You should know that the results do NOT duplicate the results from PROC QUANTILE for the linear case (they may be close). Based on Geraci, this is not unexpected.

 

data t;
call streaminit(12345);
do x = 0 to 100 by 10;
	do j = 1 to 10 by 1;
		e = rand('norm',0)*12;
		mu = 10 + 1.5*x;
		y = mu + e;
		output;
	end;
end;
run;

proc nlmixed data=t tech=nmsimp maxiter=1000 ;
	title2 'median (tau=0.5)';
	*---must do initial search of parameter estimates to get close;
	parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ; 
	tau = 0.5;	*<--for median. change to 0.75 for 75th quantile, etc.;
	*---in example, nu is intercept, and beta is slope;	
	eta = nu + beta*x;	*<--change this to any function (linear or nonlinear);

	*---rest does not change;
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd ;
run;

*proc print data=prd;run;

proc sgplot data=prd;
scatter y=y x=x;
series y=pred x=x;
run;
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

The website won't let me attach the sas file. It keeps telling me that content does not match file type (which is ridiculous). I can cut an paste the total contents in a message if you want it.

Rick_SAS
SAS Super FREQ

LVM, attach the file as a *.txt file or zip it into a *.zip file.  I'll ask @LainieH to look into why you can't upload a *.sas file.

 

Regarding your NLMIXED approach to quantiles, that sounds interesting.  I've never heard of that before.  One comment: You shouldn't be put off if your solutions cross. This also happens for linear quantile regression. It is a well-known phenomenon. If you do an internet search for the terms "quantile regression" and "crossing problem" you will get many hits that describe the problems and some recent work that proposes regression algorithms that give non-crossing solutions.

LainieH
Community Manager

Hi @Rick_SAS and @lvm, sas files can definitely be uploaded. However, if it is over 5MB it will give you an error.  The site currently only allows for that file size right now. Otherwise, you may need to put it into a Zip file. The system is configured to allow 'sas' files, lowercase, I'm not sure if that makes a difference but if you want to try making the .sas in lowercase if it is not already that is one way to troubleshoot. 

 

If the file is under 5MB and it is still giving you an error, I can check into this further. 

 

Thanks,

Lainie

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Rick, thanks for the comments. I am well aware of the line-crossing issue. I just like when it doesn't happen, but I know it does. Another good source on the AD distribution is Geraci and Bottai (Stat. Comput. 2014. 24: 461-479). This is for quantile mixed models, which is why I learned about it. But section 2 of the article is a good synopsis of the application of this distribution. You can read about the link between the distribution and the L1 norm that is commonly used for quantile regression.

 

As I wrote last night, my coding for this problem is a work in progress. I make not guarantees at this point. I will try posting my full program in a following post.

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

I tried again to attach a file, with the lower case sas extension. It still said that the contents didn't match. So I am pasting the entire program here. First linear and then nonlinear, for three quantiles (the quartiles).  A work in progress (no guarantees).

/* Linear QR with Asymmetrical Laplace (AD) distribution */
/* (may or may not be working correctly. No warranties.  */
title 'Quantile regression with Asymmetrical Laplace';
data t;
call streaminit(12345);
do x = 0 to 100 by 10;
	do j = 1 to 10 by 1;
		e = rand('norm',0)*12;
		mu = 10 + 1.5*x;
		y = mu + e;
		output;
	end;
end;
run;


proc quantreg data=t;
model y = x / quantile = 0.25, 0.5, 0.75;
run;



proc nlmixed data=t tech=nmsimp maxiter=1000 ;
	title2 'median (tau=0.5)';
	*---must do initial search of parameter estimates to get close;
	parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ; 
	tau = 0.5;	*<--for median. change to 0.75 for 75th quantile, etc.;
	*---in example, nu is intercept, and beta is slope;	
	eta = nu + beta*x;	*<--change this to any function (linear or nonlinear);

	*---rest does not change;
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd ;
run;

*proc print data=prd;run;

proc sgplot data=prd;
scatter y=y x=x;
series y=pred x=x;
run;


proc nlmixed data=t tech=nmsimp maxiter=10000 ;
	title2 'median (tau=0.75)';
	parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ; 
	
	tau = 0.75;
		
	eta = nu + beta*x;
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd75 ;
run;

*proc print data=prd;run;

proc sgplot data=prd75;
scatter y=y x=x;
series y=pred x=x;
run;


proc nlmixed data=t tech=nmsimp maxiter=1000 ;
	title2 '25% quantile (tau=0.25)';
	
	parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ; 
	tau = 0.25;	*<--define the desired quantile;
		
	eta = nu + beta*x;
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd25 ;
run;

*proc print data=prd;run;

proc sgplot data=prd25;
scatter y=y x=x;
series y=pred x=x;
run;

*---merge files with predicted values;
data predco; merge prd25(rename=(pred=pred25)) 
	prd(rename=(pred=pred50)) 
	prd75(rename=(pred=pred75));

proc sgplot data=predco;
title2 'view three predicted quantiles';
scatter y=y x=x;
series y=pred25 x=x / lineattrs=(pattern=1 color=red thickness=2);
series y=pred50 x=x / lineattrs=(pattern=2 color=black thickness=2);
series y=pred75 x=x  / lineattrs=(pattern=3 color=blue thickness=2);

run;

/*  Now for a possible nonlinear case */

title 'Quantile regression for nonlinear model example, using AD distribution';
data l;
call streaminit(12345);
do x = 0 to 100 by 10;
	do j = 1 to 10 by 1;
		e = rand('norm',0)*12;
		logit = -5 + .08*x;
		mu = 100/(1+exp(-logit));
		y = mu + e;
		output;
	end;
end;
run;

*proc print data=l;run;

proc sgplot data=l;
scatter y=y x=x;
run;



proc nlmixed data=l tech=nmsimp maxiter=1000   ;
	title2 'median (tau=0.5)';
	parms nu =-12 to 6 by 1 beta = .0 to .4 by .05 sigma = 1 to 8 by 1 ; 
	tau = 0.5;
		
	eta = 100/(1+exp(-(nu + beta*x)));
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd ;
run;

*proc print data=prd;run;

proc sgplot data=prd;
scatter y=y x=x;
series y=pred x=x;
run;


proc nlmixed data=l tech=nmsimp maxiter=10000 ;
	title2 '75% quantile (tau=0.75)';
	parms nu =-12 to 6 by .5 beta = .0 to .4 by .025 sigma = 1 to 6 by .5 ;
	tau = 0.75;
		
	eta = 100/(1+exp(-(nu + beta*x)));
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd75 ;
run;

*proc print data=prd;run;

proc sgplot data=prd75;
scatter y=y x=x;
series y=pred x=x;
run;


proc nlmixed data=l tech=nmsimp maxiter=10000 ;
	title2 '25% quantile (tau=0.25)';
	parms nu =-12 to 6 by .5 beta = .0 to .4 by .025 sigma = 1 to 6 by .5 ;

	tau = 0.25;
		
	eta = 100/(1+exp(-(nu + beta*x)));
	diff = y - eta;
	rho = diff*(tau - (diff < 0));
	ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
	model y ~ general(ll);				*--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
	estimate 'var(quant)' sigma**2;
	predict eta out=prd25 ;
run;

*proc print data=prd;run;

proc sgplot data=prd25;
scatter y=y x=x;
series y=pred x=x;
run;

data predco; merge prd25(rename=(pred=pred25)) prd(rename=(pred=pred50)) prd75(rename=(pred=pred75));

ods html style=analysis;
proc sgplot data=predco;
scatter y=y x=x;
series y=pred25 x=x / lineattrs=(pattern=1 color=red thickness=2);
series y=pred50 x=x / lineattrs=(pattern=2 color=black thickness=2);
series y=pred75 x=x  / lineattrs=(pattern=3 color=blue thickness=2);
run;



SherriF
Obsidian | Level 7

Hi Ivm,

 

Thank you for your detail answer. It is interesting and really a good learning material!

 

Thanks,

Sherri

LainieH
Community Manager
@IVM I'm looking into the file upload issue. It may be something on our side.
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

I tried the file upload from two different computers, with the same failed result. But I pasted the full program within the post, so there is no urgency.

LainieH
Community Manager

@lvm and @Rick_SAS: just to follow-up.

 

I opened a ticket with our platform vendor. They have identified the issue and are working to fix. I don't have an estimated time for the fix but I will let you know when I have more information. Thank you for reporting this.

 

Lainie

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 2530 views
  • 5 likes
  • 5 in conversation