Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Re: Nonlinear qunatile regression

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-28-2015 12:47 PM
(2841 views)

Hi,

Is SAS capable to run nonlinear quantile regression?

Especially, I have three parameters needed to be estimated.

I saw several quantile regression samples, but all of them are using general regression models.

Is there any way I can use *proc quantreg* with *nonlinear models* to estimate *three parameter variables*?

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Ordinarily, I would say SAS can't do this, but I think it may depend on the nonlinear model you are considering. If you are content with a polynomial approximation of your model, or with a semiparametric approach using a spline, then PROC QUANTREG offers an EFFECT statement to consider. However, if you have a specific nonlinear model in mind (that can't be linearized by transformation), then I think you might not have any options.

Steve Denham

13 REPLIES 13

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Ordinarily, I would say SAS can't do this, but I think it may depend on the nonlinear model you are considering. If you are content with a polynomial approximation of your model, or with a semiparametric approach using a spline, then PROC QUANTREG offers an EFFECT statement to consider. However, if you have a specific nonlinear model in mind (that can't be linearized by transformation), then I think you might not have any options.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi StevenDenham,

I solved my problem by tranforming the specific model into a linear model by your hint in the parenthesis.

Thank you!

Sherri

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@SteveDenham's suggestions are applicable if you want a model that is nonlinear in the effect variables, but using the EFFECT statement still results in a linear model of the parameters. So technically the answer to your question is "No, PROC QUANTREG will not estimate parameters that appear nonlinearly in the model."

Theoretically, quantile regression requires solving a nonliner optimization problem, which you can do in PROC IML (or PROC OPTMODEL in SAS/OR software). Therefore if you want to pursue this problem, you could try to pose it as a nonlinear optimization problem in IML. The problem is that the objective function is usually not smooth, so standard nonlinear programming techniques like Newton's Method are not applicable.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you truly want quantile regression (QR) for models that are nonlinear in the parameters, then Rick is right, there is no dedicated procedure. However, if you are knowledgeable in statistics and know how to program in NLMIXED, there may be hope. There have been several developments in "parametric" approaches to the QR problem. See articles by Geraci and colleagues (maybe start with a 2008 Biostatistics article). Their aproach is for linear models, and most recently for linear mixed quantile models, but the following points should still apply to nonlinear. The idea is to pretend that the data have an asymmetric Laplace (AL) distribution. This is a 3-parameter distribution, where one of the parameters, eta, is the tau-th quantile. That is, tau defines the desired quantile and is fixed by the user; e.g., if one specifies tau=0.5, then eta is the median; with tau=0.75, then eta is the upper quartile; and so on. Eta could either be a single number, or expanded, so that eta = f(X, other parameters). One can fit this model, in principle, by defining the likelihood in NLMIXED. I say "in principle" because I am not yet convinced that I have worked out the details. I have played with this a few times over the last couple of years, but never got far enough. There are several problems with my effort so far, but you may find this helpful.

The code below **runs**, but it doesn't necessarily **work**.That is, I generate data for a linear model (as an example), and then fit a linear model for the tau-th quantile using NLMIXED (changing one line of code could make it appropriate for a nonlinear model). It converges, but the Hessian has a negative eigenvalue, which is a pathological problem for the standard errors. But I don't think the SEs would be valid anyway, because one is only using the AL as crank to get the quantiles, not for inference. One probably should do bootstrapping for measures of uncertainty. It is critical that the starting values for the parameters are CLOSE to the true values before the optimization starts. Thus, you need a dense grid search specified in the parms statement. If initial guesses are not close, the final estimates will be far from the correct ones. I found it helpful to use an optimization method that does not use derivatives (nmsimp).

You should only treat my code as a guide. I am not convinced that I am doing this correctly, so others may find problems with my logic or my coding. I have played with fitting a nonlinear model. I can get it to work, but results can be a bit strange with nonlinear quantile models. The different prediction lines can actually cross.

I am also attaching example sas code to do three different quantiles for the linear case, and also three quantiles for a nonlinear example. You should know that the results do NOT duplicate the results from PROC QUANTILE for the linear case (they may be close). Based on Geraci, this is not unexpected.

```
data t;
call streaminit(12345);
do x = 0 to 100 by 10;
do j = 1 to 10 by 1;
e = rand('norm',0)*12;
mu = 10 + 1.5*x;
y = mu + e;
output;
end;
end;
run;
proc nlmixed data=t tech=nmsimp maxiter=1000 ;
title2 'median (tau=0.5)';
*---must do initial search of parameter estimates to get close;
parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
tau = 0.5; *<--for median. change to 0.75 for 75th quantile, etc.;
*---in example, nu is intercept, and beta is slope;
eta = nu + beta*x; *<--change this to any function (linear or nonlinear);
*---rest does not change;
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd ;
run;
*proc print data=prd;run;
proc sgplot data=prd;
scatter y=y x=x;
series y=pred x=x;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

LVM, attach the file as a *.txt file or zip it into a *.zip file. I'll ask @LainieH to look into why you can't upload a *.sas file.

Regarding your NLMIXED approach to quantiles, that sounds interesting. I've never heard of that before. One comment: You shouldn't be put off if your solutions cross. This also happens for linear quantile regression. It is a well-known phenomenon. If you do an internet search for the terms "quantile regression" and "crossing problem" you will get many hits that describe the problems and some recent work that proposes regression algorithms that give non-crossing solutions.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @Rick_SAS and @lvm, sas files can definitely be uploaded. However, if it is over 5MB it will give you an error. The site currently only allows for that file size right now. Otherwise, you may need to put it into a Zip file. The system is configured to allow 'sas' files, lowercase, I'm not sure if that makes a difference but if you want to try making the .sas in lowercase if it is not already that is one way to troubleshoot.

If the file is under 5MB and it is still giving you an error, I can check into this further.

Thanks,

Lainie

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Rick, thanks for the comments. I am well aware of the line-crossing issue. I just like when it doesn't happen, but I know it does. Another good source on the AD distribution is Geraci and Bottai (Stat. Comput. 2014. 24: 461-479). This is for quantile mixed models, which is why I learned about it. But section 2 of the article is a good synopsis of the application of this distribution. You can read about the link between the distribution and the L1 norm that is commonly used for quantile regression.

As I wrote last night, my coding for this problem is a work in progress. I make not guarantees at this point. I will try posting my full program in a following post.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I tried again to attach a file, with the lower case sas extension. It still said that the contents didn't match. So I am pasting the entire program here. First linear and then nonlinear, for three quantiles (the quartiles). A work in progress (no guarantees).

```
/* Linear QR with Asymmetrical Laplace (AD) distribution */
/* (may or may not be working correctly. No warranties. */
title 'Quantile regression with Asymmetrical Laplace';
data t;
call streaminit(12345);
do x = 0 to 100 by 10;
do j = 1 to 10 by 1;
e = rand('norm',0)*12;
mu = 10 + 1.5*x;
y = mu + e;
output;
end;
end;
run;
proc quantreg data=t;
model y = x / quantile = 0.25, 0.5, 0.75;
run;
proc nlmixed data=t tech=nmsimp maxiter=1000 ;
title2 'median (tau=0.5)';
*---must do initial search of parameter estimates to get close;
parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
tau = 0.5; *<--for median. change to 0.75 for 75th quantile, etc.;
*---in example, nu is intercept, and beta is slope;
eta = nu + beta*x; *<--change this to any function (linear or nonlinear);
*---rest does not change;
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd ;
run;
*proc print data=prd;run;
proc sgplot data=prd;
scatter y=y x=x;
series y=pred x=x;
run;
proc nlmixed data=t tech=nmsimp maxiter=10000 ;
title2 'median (tau=0.75)';
parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
tau = 0.75;
eta = nu + beta*x;
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd75 ;
run;
*proc print data=prd;run;
proc sgplot data=prd75;
scatter y=y x=x;
series y=pred x=x;
run;
proc nlmixed data=t tech=nmsimp maxiter=1000 ;
title2 '25% quantile (tau=0.25)';
parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
tau = 0.25; *<--define the desired quantile;
eta = nu + beta*x;
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd25 ;
run;
*proc print data=prd;run;
proc sgplot data=prd25;
scatter y=y x=x;
series y=pred x=x;
run;
*---merge files with predicted values;
data predco; merge prd25(rename=(pred=pred25))
prd(rename=(pred=pred50))
prd75(rename=(pred=pred75));
proc sgplot data=predco;
title2 'view three predicted quantiles';
scatter y=y x=x;
series y=pred25 x=x / lineattrs=(pattern=1 color=red thickness=2);
series y=pred50 x=x / lineattrs=(pattern=2 color=black thickness=2);
series y=pred75 x=x / lineattrs=(pattern=3 color=blue thickness=2);
run;
/* Now for a possible nonlinear case */
title 'Quantile regression for nonlinear model example, using AD distribution';
data l;
call streaminit(12345);
do x = 0 to 100 by 10;
do j = 1 to 10 by 1;
e = rand('norm',0)*12;
logit = -5 + .08*x;
mu = 100/(1+exp(-logit));
y = mu + e;
output;
end;
end;
run;
*proc print data=l;run;
proc sgplot data=l;
scatter y=y x=x;
run;
proc nlmixed data=l tech=nmsimp maxiter=1000 ;
title2 'median (tau=0.5)';
parms nu =-12 to 6 by 1 beta = .0 to .4 by .05 sigma = 1 to 8 by 1 ;
tau = 0.5;
eta = 100/(1+exp(-(nu + beta*x)));
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd ;
run;
*proc print data=prd;run;
proc sgplot data=prd;
scatter y=y x=x;
series y=pred x=x;
run;
proc nlmixed data=l tech=nmsimp maxiter=10000 ;
title2 '75% quantile (tau=0.75)';
parms nu =-12 to 6 by .5 beta = .0 to .4 by .025 sigma = 1 to 6 by .5 ;
tau = 0.75;
eta = 100/(1+exp(-(nu + beta*x)));
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd75 ;
run;
*proc print data=prd;run;
proc sgplot data=prd75;
scatter y=y x=x;
series y=pred x=x;
run;
proc nlmixed data=l tech=nmsimp maxiter=10000 ;
title2 '25% quantile (tau=0.25)';
parms nu =-12 to 6 by .5 beta = .0 to .4 by .025 sigma = 1 to 6 by .5 ;
tau = 0.25;
eta = 100/(1+exp(-(nu + beta*x)));
diff = y - eta;
rho = diff*(tau - (diff < 0));
ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
estimate 'var(quant)' sigma**2;
predict eta out=prd25 ;
run;
*proc print data=prd;run;
proc sgplot data=prd25;
scatter y=y x=x;
series y=pred x=x;
run;
data predco; merge prd25(rename=(pred=pred25)) prd(rename=(pred=pred50)) prd75(rename=(pred=pred75));
ods html style=analysis;
proc sgplot data=predco;
scatter y=y x=x;
series y=pred25 x=x / lineattrs=(pattern=1 color=red thickness=2);
series y=pred50 x=x / lineattrs=(pattern=2 color=black thickness=2);
series y=pred75 x=x / lineattrs=(pattern=3 color=blue thickness=2);
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Ivm,

Thank you for your detail answer. It is interesting and really a good learning material!

Thanks,

Sherri

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@IVM I'm looking into the file upload issue. It may be something on our side.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.