Siddharth123
Obsidian | Level 7

Hi,

 

I am unsure whether PROC HPGENSELECT can be applied when the target is continuous and follows a beta distribution. I do not want to use beta regression. If HPGENSELECT cannot be used, is there another approach that works?

 

Kind Regards

SK

7 REPLIES
lvm
Rhodochrosite | Level 12

Unfortunately, this procedure cannot handle the beta distribution. As an approximation, you could use PROC GLMSELECT. You could use the weight statement to account for unequal variances for Y.
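A rough sketch of how that approximation might look. The data set name mydata, the predictors x1-x20, and the choice of weights are all assumptions for illustration: the weights here are taken proportional to 1/(mu*(1-mu)), the inverse of the beta variance up to a constant, with mu estimated from a first unweighted fit.

```sas
/* Step 1: unweighted selection, only to obtain preliminary fitted means */
proc glmselect data=mydata;
  model y = x1-x20 / selection=stepwise;
  output out=pred predicted=p_hat;
run;

/* Step 2: build weights; fitted values from a linear model can fall    */
/* outside (0,1), so they are clamped first (0.01 and 0.99 are arbitrary) */
data pred;
  set pred;
  p_c = min(max(p_hat, 0.01), 0.99);
  w = 1/(p_c*(1-p_c));
run;

/* Step 3: weighted selection */
proc glmselect data=pred;
  model y = x1-x20 / selection=stepwise;
  weight w;
run;
```

This is still a normal-theory linear model, so it only approximates a beta fit; it is mainly useful for screening predictors.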

JacobSimonsen
Barite | Level 11

Or you can use proc hpnlmod. The beta distribution is quite simple, so you can specify the likelihood inside hpnlmod, and use the "general" likelihood in the model statement.

JacobSimonsen
Barite | Level 11

Here is a simple example of how you can find the maximum likelihood estimates of the two parameters when all observations are beta-distributed with the same parameters. I think the example can easily be extended to situations where the data include covariates.

data simulation;
  do i=1 to 1000;
    y=rand('beta',2,3);
    sqy=y**2;
	output;
  end;
run;

*Starting values are found by the method of moments, so the means of y and y^2 are calculated.;
proc means data=simulation mean ;
  var y sqy;
  output out=startvalues mean=y sqy;
run;

data _NULL_;
  set startvalues;
  a=y*(y-sqy)/(sqy-y**2);
  b=(y-1)*(sqy-y)/(sqy-y**2);
  put a= b=;
  call symput('starta',put(a,best.));
  call symput('startb',put(b,best.));
run;

*Here the maximum likelihood estimates are found;
*The moment estimators from above are used as starting values;

proc hpnlmod data=simulation;
  parms a &starta. b &startb.;
  ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
  model i~general(ll);
run;
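
For the extension to covariates mentioned above, one possible sketch is shown here. The parameterization is an assumption, not something from the posts: the mean is mu with a logit link, phi is a precision parameter, and a=mu*phi, b=(1-mu)*phi. The data set simcov, the covariate x, the true values 0.5, 1, and 5, and the zero starting values are made up for illustration.

```sas
/* Hypothetical data: the mean of y depends on a single covariate x */
data simcov;
  call streaminit(1);
  do i=1 to 1000;
    x=rand('normal');
    mu=1/(1+exp(-(0.5+1*x)));          /* true mean on the logit scale */
    phi=5;                              /* true precision */
    y=rand('beta',mu*phi,(1-mu)*phi);
    output;
  end;
  keep i x y;
run;

proc hpnlmod data=simcov;
  parms b0 0 b1 0 logphi 0;
  mu=1/(1+exp(-(b0+b1*x)));             /* logit link for the mean */
  phi=exp(logphi);                      /* keeps the precision positive */
  a=mu*phi;
  b=(1-mu)*phi;
  ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
  model y~general(ll);
run;
```

Estimating log(phi) instead of phi avoids the need for a bound on that parameter.
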
Rick_SAS
SAS Super FREQ

I like JacobSimonsen's approach.

 

@JacobSimonsen, could you share why you decided to go with PROC HPNLMOD?  I would have chosen PROC NLMIXED, like this:

 

proc nlmixed data=simulation;
  parms a &starta. b &startb.;
  bounds 0 < a,b;
  ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
  model y ~ general(ll);
run;

@Siddharth123, if you want to see additional examples formulating models as MLE problems and using SAS procedures (such as NLMIXED) to solve, see

JacobSimonsen
Barite | Level 11

My simple rule of thumb for choosing between PROC HPNLMOD and PROC NLMIXED is that if I have random effects I use NLMIXED, and otherwise HPNLMOD. That is simply because HPNLMOD is generally faster. In this case I have no strong opinion about which of the two procedures should be used. Why would you choose NLMIXED?

 

I agree that it is wise to include the BOUNDS statement.

 

I find it a bit funny that when the "general" likelihood is used, it doesn't matter which variable is on the left side of "~". Both NLMIXED and HPNLMOD still require a variable there.

StatDave
SAS Super FREQ

You can fit a beta model using PROC GLIMMIX or PROC FMM.  See the DIST=BETA option in the MODEL statement. See this example of using the beta distribution in GLIMMIX to model a continuous proportion response.
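
A minimal sketch of both calls, assuming a hypothetical data set mydata with a response y strictly between 0 and 1 and two predictors x1 and x2:

```sas
/* Beta regression with a logit link in PROC GLIMMIX */
proc glimmix data=mydata;
  model y = x1 x2 / dist=beta link=logit solution;
run;

/* The same response distribution as a one-component model in PROC FMM */
proc fmm data=mydata;
  model y = x1 x2 / dist=beta;
run;
```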

lvm
Rhodochrosite | Level 12

As others have correctly pointed out, there are a few ways to fit models to data with a beta distribution. GLIMMIX is the easiest way. However, since the original question dealt with HPGENSELECT, one would assume that they were trying to do variable selection from a large number of potential predictor variables. That cannot be done in an automated way with GLIMMIX or NLMIXED.

 

One should always be careful with the beta distribution: it is defined for 0 < y < 1. This means that all values of y equal to 0 or 1 will become missing values in GLIMMIX. My experience is that datasets with continuous proportions usually have 0s and 1s.
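
Before fitting, it may be worth counting how many observations sit on the boundary. A small sketch, assuming a hypothetical data set mydata with response y:

```sas
/* Count boundary values; missing y is excluded explicitly because */
/* SAS treats a missing value as smaller than any number           */
proc sql;
  select sum(not missing(y) and y<=0) as n_at_or_below_0,
         sum(y>=1)                    as n_at_or_above_1,
         count(y)                     as n_nonmissing
  from mydata;
quit;
```

If either count is nonzero, those rows will not contribute to a beta fit, so the effective sample size is smaller than it looks.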
