07-05-2017 06:21 PM
I am not sure whether PROC HPGENSELECT can be used when the target is continuous and beta-distributed. I do not want to use beta regression; if HPGENSELECT does not work, would any other approach?
07-05-2017 09:41 PM
Unfortunately, PROC HPGENSELECT cannot handle the beta distribution. As an approximation, you could use PROC GLMSELECT, with a WEIGHT statement to account for the unequal variances of Y.
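A minimal sketch of one way to build those weights, under stated assumptions: the data set HAVE, its predictors X1-X5, and the true coefficients are simulated purely for illustration. Under a beta model with constant precision phi, Var(Y) = mu(1-mu)/(1+phi), which is proportional to mu(1-mu), so the sketch weights each observation by the inverse of that quantity evaluated at preliminary least-squares fitted values. The clipping bounds 0.01 and 0.99 are arbitrary choices, not part of the suggestion above.

```sas
/* Simulated illustration data: only X1 and X2 affect the mean */
data have;
   call streaminit(1);
   array x{5};                                  /* creates X1-X5 */
   do i = 1 to 500;
      do k = 1 to 5;
         x{k} = rand('normal');
      end;
      mu = logistic(-0.3 + 0.8*x1 - 0.5*x2);
      y  = rand('beta', mu*8, (1 - mu)*8);      /* beta with precision 8 */
      output;
   end;
   drop i k mu;
run;

/* Stage 1: unweighted least-squares fit to get preliminary fitted means */
proc reg data=have noprint;
   model y = x1-x5;
   output out=stage1 p=muhat;
run;
quit;

/* Stage 2: weight = 1/(mu*(1-mu)); fitted values are clipped into (0,1)
   (arbitrary bounds) so the weights stay finite and positive */
data stage1;
   set stage1;
   muhat = min(max(muhat, 0.01), 0.99);
   w = 1 / (muhat*(1 - muhat));
run;

/* Stage 3: weighted variable selection */
proc glmselect data=stage1;
   weight w;
   model y = x1-x5 / selection=stepwise(select=sbc);
run;
```

This remains an approximation: the mean is still modeled on the linear scale, so fitted values can fall outside (0,1).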
07-06-2017 04:16 AM
Alternatively, you can use PROC HPNLMOD. The beta log-likelihood is simple enough that you can write it out yourself inside HPNLMOD and pass it to the GENERAL likelihood in the MODEL statement.
07-06-2017 07:46 AM
Here is a simple example of how to find the maximum likelihood estimates of the two parameters when all data are beta-distributed with the same parameters. I think the example can easily be extended to situations where the data contain covariates.
data simulation;
   do i = 1 to 1000;
      y = rand('beta', 2, 3);
      sqy = y**2;
      output;
   end;
run;

*Start values are found by the method of moments, so the means of y and y^2 are calculated.;
proc means data=simulation mean;
   var y sqy;
   output out=startvalues mean=y sqy;
run;

data _null_;
   set startvalues;
   a = y*(y - sqy)/(sqy - y**2);
   b = (y - 1)*(sqy - y)/(sqy - y**2);
   put a= b=;
   call symputx('starta', a);
   call symputx('startb', b);
run;

*Here the maximum likelihood estimates are found.;
*The moment estimators from above are used as starting values.;
proc hpnlmod data=simulation;
   parameters a=&starta b=&startb;
   ll = (a-1)*log(y) + (b-1)*log(1-y) - logbeta(a, b);
   model i ~ general(ll);
run;
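Extending the example to covariates could look like the sketch below. It assumes the common mean/precision parameterization of beta regression, mu = logistic(b0 + b1*x) with precision phi > 0, so that the shape parameters are a = mu*phi and b = (1 - mu)*phi. The covariate X, the data set SIMCOV, and the true values (-0.5, 1, 10) are illustrative assumptions.

```sas
/* Simulated data with one covariate; true b0=-0.5, b1=1, phi=10 */
data simcov;
   call streaminit(12345);
   do i = 1 to 1000;
      x  = rand('normal');
      mu = logistic(-0.5 + 1*x);            /* mean on the (0,1) scale */
      y  = rand('beta', mu*10, (1 - mu)*10);
      output;
   end;
   drop mu;
run;

proc hpnlmod data=simcov;
   parameters b0=0 b1=0 phi=1;
   bounds phi > 0;
   mu = logistic(b0 + b1*x);                /* logit link for the mean */
   a  = mu*phi;                             /* shape parameters from mean and precision */
   b  = (1 - mu)*phi;
   ll = (a-1)*log(y) + (b-1)*log(1-y) - logbeta(a, b);
   model y ~ general(ll);
run;
```

Only the lines computing mu, a, and b differ from the one-sample example; the log-likelihood line is unchanged.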
07-06-2017 08:29 AM
I like JacobSimonsen's approach.
@JacobSimonsen, could you share why you decided to go with PROC HPNLMOD? I would have chosen PROC NLMIXED, like this:
proc nlmixed data=simulation;
   parms a=&starta b=&startb;
   bounds a > 0, b > 0;
   ll = (a-1)*log(y) + (b-1)*log(1-y) - logbeta(a, b);
   model y ~ general(ll);
run;
@Siddharth123, if you want to see more examples of formulating models as MLE problems and solving them with SAS procedures such as NLMIXED, see
07-06-2017 08:41 AM
My simple rule of thumb for choosing between PROC HPNLMOD and PROC NLMIXED is: if the model has random effects, I use NLMIXED; otherwise I use HPNLMOD, simply because HPNLMOD is generally faster. In this case I have no strong opinion about which of the two procedures should be used. Why would you choose NLMIXED?
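To illustrate the random-effects case in that rule of thumb, here is a sketch of a beta model with a normally distributed random intercept per group, fitted with the RANDOM statement in NLMIXED (HPNLMOD has no RANDOM statement). The grouping variable GRP, the mean/precision parameterization, and all true values are illustrative assumptions.

```sas
/* Simulated clustered data: 50 groups with 20 observations each */
data simgrp;
   call streaminit(2017);
   do grp = 1 to 50;
      u = rand('normal', 0, 0.5);           /* group-level random intercept, SD 0.5 */
      do j = 1 to 20;
         x  = rand('normal');
         mu = logistic(-0.5 + x + u);
         y  = rand('beta', mu*10, (1 - mu)*10);
         output;
      end;
   end;
   drop u mu;                                /* U is re-estimated as a random effect */
run;

/* Data are already sorted by GRP, as NLMIXED expects for SUBJECT= */
proc nlmixed data=simgrp;
   parms b0=0 b1=0 phi=1 s2u=0.1;
   bounds phi > 0, s2u > 0;
   mu = logistic(b0 + b1*x + u);
   a  = mu*phi;
   b  = (1 - mu)*phi;
   ll = (a-1)*log(y) + (b-1)*log(1-y) - logbeta(a, b);
   model y ~ general(ll);
   random u ~ normal(0, s2u) subject=grp;   /* second argument is the variance */
run;
```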
I agree that it is wise to include the BOUNDS statement.
I find it a bit funny that, with the GENERAL likelihood, it does not matter which variable is on the left side of the "~", yet both NLMIXED and HPNLMOD require a variable there.
07-07-2017 09:59 AM
You can fit a beta model using PROC GLIMMIX or PROC FMM. See the DIST=BETA option in the MODEL statement. See this example of using the beta distribution in GLIMMIX to model a continuous proportion response.
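A minimal sketch of the DIST=BETA route on a small simulated data set. The data set name BETA_DEMO, the covariate X, and the true coefficients are illustrative assumptions; the logit link is stated explicitly in GLIMMIX.

```sas
/* Simulated continuous proportions strictly inside (0,1) */
data beta_demo;
   call streaminit(7);
   do i = 1 to 500;
      x  = rand('normal');
      mu = logistic(0.2 + 0.7*x);
      y  = rand('beta', mu*12, (1 - mu)*12);
      output;
   end;
   drop i mu;
run;

/* Beta regression in GLIMMIX */
proc glimmix data=beta_demo;
   model y = x / dist=beta link=logit solution;
run;

/* The same response with the beta distribution in FMM (one component by default) */
proc fmm data=beta_demo;
   model y = x / dist=beta;
run;
```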
07-07-2017 10:31 AM
As others have correctly pointed out, there are a few ways to fit models to data with a beta distribution. GLIMMIX is the easiest way. However, since the original question dealt with HPGENSELECT, one would assume that they were trying to do variable selection from a large number of potential predictor variables. That cannot be done in an automated way with GLIMMIX or NLMIXED.
One should always be careful with the beta distribution: it is defined for 0 < y < 1. This means that all values of y equal to 0 or 1 will become missing values in GLIMMIX. My experience is that datasets with continuous proportions usually have 0s and 1s.
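One common workaround, not mentioned earlier in the thread, is the compression transformation of Smithson and Verkuilen (2006), y' = (y(n-1) + 0.5)/n, where n is the number of nonmissing observations. It pulls exact 0s and 1s slightly inside (0,1) so they are not dropped. A sketch with a small made-up data set HAVE:

```sas
/* Made-up proportions, including an exact 0 and an exact 1 */
data have;
   input y @@;
   datalines;
0 0.12 0.35 0.5 0.77 1 0.9 0.05
;
run;

/* n = number of nonmissing Y values */
proc sql noprint;
   select count(y) into :n trimmed from have;
quit;

/* Compression y' = (y*(n-1) + 0.5)/n maps [0,1] into (0,1) */
data have_sq;
   set have;
   if not missing(y) then y_sq = (y*(&n - 1) + 0.5) / &n;
run;
```

With these eight values n = 8, so 0 becomes 0.0625 and 1 becomes 0.9375. The transformation changes the data slightly, so results should be interpreted with that in mind.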