Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- hpgenselect for continuous target variable

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-05-2017 06:21 PM

Hi,

I am unsure if hpgenselect can be applied when target is continuous and has beta distribution. I do not want to use Beta Regression, does any other approach work if not hpgenselect ?

Kind Regards

SK

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-05-2017 09:41 PM

Unfortunately, this procedure cannot handle the beta distribution. As an approximation, you could use PROC GLMSELECT. You could use the weight statement to account for unequal variances for Y.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2017 04:16 AM

Or you can use proc hpnlmod. The beta distribution is quite simple, so you can specify the likelihood inside hpnlmod, and use the "general" likelihood in the model statement.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2017 07:46 AM

Here a simple example of how you can find the log-likelihood estimates of the two parameters if all data are beta-distributed with same parameters. I think the example easily can be extended to situations where there are some covariates in the data.

```
data simulation;
do i=1 to 1000;
y=rand('beta',2,3);
sqy=y**2;
output;
end;
run;
*start values are found by the moment method. Therefore, mean of y and y^2 are calculated.;
proc means data=simulation mean ;
var y sqy;
output out=startvalues mean=y sqy;
run;
data _NULL_;
set startvalues;
a=y*(y-sqy)/(sqy-y**2);
b=(y-1)*(sqy-y)/(sqy-y**2);
put a= b=;
call symput('starta',put(a,best.));
call symput('startb',put(b,best.));
run;
*here the likelihood estimates will be found;
*The moment estimators from above are used as starting values;
proc hpnlmod data=simulation;
parm a &starta. b &startb.;
ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
model i~general(ll);
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2017 08:29 AM

I like JacobSimonsen's approach.

@JacobSimonsen, could you share why you decided to go with PROC HPNLMOD? I would have chosen PROC NLMIXED, like this:

```
proc nlmixed data=simulation;
parms a &starta. b &startb.;
bounds 0 < a,b;
ll=(a-1)*log(y)+(b-1)*log(1-y)-logbeta(a,b);
model y ~ general(ll);
run;
```

@Siddharth123, if you want to see additional examples formulating models as MLE problems and using SAS procedures (such as NLMIXED) to solve, see

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2017 08:41 AM

My simple rule of thumb of whether I should choose PROC HPNLMOD or PROC NLMIXED is that if I have random effects then I use NLMIXED and otherwise HPNLMOD. That is simple because HPNLMOD in general is faster. In this case I have no strong opinion of which of these two procedure that should be used. Why would you choose NLMIXED?

I agree that it is wise to have the boundary option.

I find it a bit funny that when the "general" likelihood is used, then it doesnt matter what variable that is on the left side of "~". Both NLMIXED and HPNLMOD require a variable there.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-07-2017 09:59 AM

You can fit a beta model using PROC GLIMMIX or PROC FMM. See the DIST=BETA option in the MODEL statement. See this example of using the beta distribution in GLIMMIX to model a continuous proportion response.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-07-2017 10:31 AM

As others have correctly pointed out, there are a few ways to fit models to data with a beta distribution. GLIMMIX is the easiest way. However, since the original question dealt with HPGENSELECT, one would assume that they were trying to do variable selection from a large number of potential predictor variables. That cannot be done in an automated way with GLIMMIX or NLMIXED.

One should always be careful with the beta distribution: it is defined for 0 < y < 1. This means that all values of y equal to 0 or 1 will become missing values in GLIMMIX. My experience is that datasets with continuous proportions usually have 0s and 1s.