Re: hpgenselect for continuous target variable

Siddharth123 · Posted 07-05-2017 06:21 PM

Hi,

I am unsure whether HPGENSELECT can be applied when the target is continuous and follows a beta distribution. I do not want to use beta regression. If HPGENSELECT does not work, is there another approach that does?

Kind Regards

SK

lvm · Posted 07-05-2017 09:41 PM

Unfortunately, this procedure cannot handle the beta distribution. As an approximation, you could use PROC GLMSELECT. You could use the weight statement to account for unequal variances for Y.
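
A minimal sketch of the weighted GLMSELECT idea, not a tested recipe. The data set name mydata and the variables y and x1-x20 are hypothetical. The weighting scheme is also an assumption: since the variance of a beta variable is proportional to mu*(1-mu), the weights are taken as the inverse of that quantity, using predicted means from a first unweighted fit.

```sas
/* Hypothetical names: mydata, response y in (0,1), candidate predictors x1-x20 */

/* Step 1: unweighted selection to obtain preliminary predicted means */
proc glmselect data=mydata;
  model y = x1-x20 / selection=stepwise;
  output out=pred1 predicted=p_hat;
run;

/* Step 2: weights inversely proportional to the approximate variance p*(1-p) */
data weighted;
  set pred1;
  p_clip = min(max(p_hat, 0.01), 0.99);  /* clip so the weight stays finite */
  w = 1 / (p_clip * (1 - p_clip));
run;

/* Step 3: repeat the selection with the WEIGHT statement */
proc glmselect data=weighted;
  model y = x1-x20 / selection=stepwise;
  weight w;
run;
```

The clipping bounds 0.01 and 0.99 are arbitrary; a linear fit can predict values outside (0,1), so some guard of this kind is needed.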

JacobSimonsen · Posted 07-06-2017 04:16 AM

Or you can use PROC HPNLMOD. The beta density is quite simple, so you can write its log-likelihood inside HPNLMOD and use the "general" likelihood in the MODEL statement.

JacobSimonsen · Posted 07-06-2017 07:46 AM

Here is a simple example of how you can find the maximum likelihood estimates of the two parameters if all data are beta-distributed with the same parameters. I think the example can easily be extended to situations where there are covariates in the data.

data simulation;
  do i=1 to 1000;
    y=rand('beta',2,3);
    sqy=y**2;
    output;
  end;
run;

*Starting values are found by the method of moments, so the means of y and y^2 are calculated;
proc means data=simulation mean ;
  var y sqy;
  output out=startvalues mean=y sqy;
run;

data _NULL_;
  set startvalues;
  a=y*(y-sqy)/(sqy-y**2);
  b=(y-1)*(sqy-y)/(sqy-y**2);
  put a= b=;
  call symput('starta',put(a,best.));
  call symput('startb',put(b,best.));
run;

*Here the maximum likelihood estimates are found;
*The moment estimators from above are used as starting values;

proc hpnlmod data=simulation;
  parms a &starta. b &startb.;
  ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
  model i~general(ll);
run;
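
One possible way to extend the example above to covariates is sketched here, under assumptions that are not in the original post: a data set mydata with a response y strictly between 0 and 1 and a single covariate x (both names hypothetical), a logit link for the mean, and a log-scale precision parameter. The starting values of 0 are placeholders, not moment-based.

```sas
/* Hypothetical: mydata with response y in (0,1) and one covariate x */
proc hpnlmod data=mydata;
  parms b0 0 b1 0 logphi 0;           /* placeholder starting values */
  mu  = 1 / (1 + exp(-(b0 + b1*x)));  /* mean on the logit scale */
  phi = exp(logphi);                  /* precision, kept positive */
  a = mu*phi;                         /* beta shape parameters */
  b = (1 - mu)*phi;
  ll = (a-1)*log(y) + (b-1)*log(1-y) - logbeta(a,b);
  model y ~ general(ll);
run;
```

Writing the precision as exp(logphi) keeps both shape parameters positive without a separate bound on the parameters.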

Rick_SAS · Posted 07-06-2017 08:29 AM

I like JacobSimonsen's approach.

@JacobSimonsen, could you share why you decided to go with PROC HPNLMOD? I would have chosen PROC NLMIXED, like this:

proc nlmixed data=simulation;
  parms a &starta. b &startb.;
  bounds 0 < a,b;
  ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
  model y ~ general(ll);
run;

@Siddharth123, if you want to see additional examples formulating models as MLE problems and using SAS procedures (such as NLMIXED) to solve, see

JacobSimonsen · Posted 07-06-2017 08:41 AM

My simple rule of thumb for choosing between PROC HPNLMOD and PROC NLMIXED is that if I have random effects I use NLMIXED, and otherwise HPNLMOD. That is simply because HPNLMOD is in general faster. In this case I have no strong opinion on which of these two procedures should be used. Why would you choose NLMIXED?

I agree that it is wise to include the BOUNDS statement.

I find it a bit funny that when the "general" likelihood is used, it doesn't matter which variable is on the left side of "~". Both NLMIXED and HPNLMOD require a variable there.

StatDave · Posted 07-07-2017 09:59 AM

You can fit a beta model using PROC GLIMMIX or PROC FMM. See the DIST=BETA option in the MODEL statement. See this example of using the beta distribution in GLIMMIX to model a continuous proportion response.
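
A minimal sketch of the DIST=BETA option in both procedures. The data set mydata, the response y, and the predictors x1 and x2 are hypothetical names, and the logit link is an illustrative choice.

```sas
/* Hypothetical: mydata with response y in (0,1) and predictors x1, x2 */

/* Beta regression with a logit link in GLIMMIX */
proc glimmix data=mydata;
  model y = x1 x2 / dist=beta link=logit solution;
run;

/* The same response distribution in FMM (single-component model) */
proc fmm data=mydata;
  model y = x1 x2 / dist=beta;
run;
```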

lvm · Posted 07-07-2017 10:31 AM

As others have correctly pointed out, there are a few ways to fit models to data with a beta distribution. GLIMMIX is the easiest way. However, since the original question dealt with HPGENSELECT, one would assume that they were trying to do variable selection from a large number of potential predictor variables. That cannot be done in an automated way with GLIMMIX or NLMIXED.

One should always be careful with the beta distribution: it is defined for 0 < y < 1. This means that all values of y equal to 0 or 1 will become missing values in GLIMMIX. My experience is that datasets with continuous proportions usually have 0s and 1s.
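
Before fitting a beta model it may help to check how many observations sit on the boundary. A small sketch, assuming a hypothetical data set mydata with response y:

```sas
/* Hypothetical: mydata with response y */
/* Count exact 0s, exact 1s, interior values, and missing values of y */
proc sql;
  select sum(y = 0)            as n_zero,
         sum(y = 1)            as n_one,
         sum(y > 0 and y < 1)  as n_interior,
         nmiss(y)              as n_missing
  from mydata;
quit;
```

If n_zero or n_one is not zero, those rows would be the ones dropped by a beta fit.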
