How to choose between functional forms

04-22-2010 07:20 AM

Hi,

I estimated my model in both linear and log (ln) form, but I am not sure how to compare the two forms. I am aware that a likelihood ratio test and the Box-Cox transformation exist, but I cannot find a clear procedure for applying them.

I would be very thankful for an explanation of the procedure, and of how to perform it in SAS.


04-27-2010 04:04 PM

The TRANSREG procedure can be employed to select the functional form using the Box-Cox transformation. See the following two sections from the SAS online documentation. The first section provides a general description of the Box-Cox transformation while the second section shows an example of the use of the TRANSREG procedure to perform a Box-Cox transformation.

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/transreg_sect15.htm

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/transreg_sect51.htm


05-04-2010 12:06 PM

Thanks a lot, Dale.

The example is totally unclear to me: it does not perform a Box-Cox "test", it just chooses the best transformation (lambda).

If you could explain how to do the test in SAS, I would be very thankful.


05-04-2010 05:03 PM

SAS provides a confidence interval for the parameter lambda. Any value of lambda which is outside the confidence interval represents a transformation which is statistically a poor fit when compared with the fit obtained for the optimal lambda.
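The confidence interval can be read as a likelihood ratio test. The following is only a sketch, assuming the interval is the usual likelihood-ratio interval for Box-Cox models; the symbols \ell, \hat\lambda, and \lambda_0 are notation introduced here and do not come from the SAS output.

```latex
% \ell(\lambda): maximized log likelihood at a fixed \lambda
% \hat\lambda:   the value of \lambda that maximizes \ell
\[
  \mathrm{CI}_{1-\alpha}
  \;=\;
  \Bigl\{ \lambda \;:\; 2\bigl[\ell(\hat\lambda) - \ell(\lambda)\bigr]
          \le \chi^2_{1,\,1-\alpha} \Bigr\}
\]
% A test of H_0: \lambda = \lambda_0 at level \alpha rejects exactly when
% \lambda_0 lies outside the interval:
\[
  2\bigl[\ell(\hat\lambda) - \ell(\lambda_0)\bigr] \;>\; \chi^2_{1,\,1-\alpha}
\]
```

Under that reading, checking whether \lambda_0 = 1 (linear) or \lambda_0 = 0 (log) falls inside the interval is the Box-Cox "test" of that functional form.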

Because the confidence interval is so small for the data which were presented in the example that I referenced, you don't actually see the confidence limits in the first example. We can observe the confidence limits by writing code that shows the model likelihood for very small increments of lambda in the region of lambda=0.

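title 'Basic Box-Cox Example';

/* Simulate data for which log(y) is linear in x */
data x;
    do x = 1 to 8 by 0.025;
        y = exp(x + normal(7));   /* normal(7): standard normal draw, seed 7 */
        output;
    end;
run;

/* Evaluate the likelihood on a fine grid of lambda values around 0 */
proc transreg data=x ss2 details;
    title2 'Several Options Demonstrated';
    model boxcox(y / lambda=-2 -1 -0.5 -0.4 -0.3 -0.2
                     -0.1 to 0.1 by 0.01
                     0.2 0.3 0.4 0.5 1 2
                     convenient
                     alpha=0.01)
          = identity(x);
run;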

In the above example, we examine the effect of changing lambda in steps of 0.01 in the region near lambda=0. With alpha=0.01, any value of lambda not marked with an asterisk (*) represents a transformation that fits significantly worse than the optimal transformation at a test level of 0.01. So lambda=-0.06 and all smaller values of lambda, as well as lambda=0.03 and all larger values of lambda, do not fit as well as lambda=-0.02 at that level.

As explained in the documentation referenced previously, the CONVENIENT option means that if the confidence interval contains a value in the set (-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3), then the value from this list with the smallest -2LL value is flagged as the optimal transformation. So, rather than reporting Y**-0.02, the procedure identifies log(Y) as the "optimal" transformation.

I presume you understand that the value of lambda indicates a power transformation of the response (y**lambda) and that lambda=0 indicates taking the logarithm of y. Thus, for the data in the above example, log(Y) is an appropriate transformation, and it fits the data better than Y**2, Y, sqrt(Y), 1/sqrt(Y), 1/Y, or 1/(Y**2).
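For reference, the power family behind these lambda values can be written out. This is the standard Box-Cox form for a strictly positive response y; the notation y^{(\lambda)} is introduced here for illustration.

```latex
\[
  y^{(\lambda)} \;=\;
  \begin{cases}
    \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1.5ex]
    \log y,                           & \lambda = 0.
  \end{cases}
\]
% The log case is the limit of the power case:
\[
  \lim_{\lambda \to 0} \frac{y^{\lambda} - 1}{\lambda} \;=\; \log y
\]
% Examples: \lambda = 1 is linear in y, \lambda = 0.5 is a square root,
% and \lambda = -1 is a reciprocal, each up to a shift and a rescaling.
```

Because the log is the limiting case at lambda=0, the linear and log forms sit inside one family and can be compared through a single parameter.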


05-04-2010 05:24 PM

Dale, I don't know how to thank you. I still cannot follow the example, since I come from a different specialty.

My model is based on panel data, as follows:

Y(t) = b0 + b1 X(1t) + b2 X(2t)

I also have another specification (transformation):

Ln Y(t) = b0 + b1 Ln X(1t) + b2 Ln X(2t)

I simply want to find the best functional form based on the Box-Cox test or the PE test.

I would be very thankful if you could show me the commands I have to use in SAS.


05-11-2010 03:19 PM

Tom,

I believe I see the difficulty you are having in relating your problem to the examples I referred to. You want to determine a transformation of both the response AND the predictors, not just a transformation of the response.

The Box-Cox transformation is used to assess the functional form of the response given that the predictors are fixed! When the predictors are allowed to change, the problem is no longer just an examination of the functional form of the response.

For a specified transformation of the response, the best transformation of the predictors is the one that maximizes the likelihood function.

If you consider a Box-Cox transformation conditional on the predictors X1 and X2, then the code demonstrated above will help you determine whether an identity transformation or a log transformation is the better way to model the response. But it is quite possible that the identity transformation of the response performs better than the log transformation when the predictors have not been log-transformed, while the log transformation performs better when the predictors have been log-transformed. So your desire to choose a transformation of both the response and the predictors is reasonable.

If you fit PROC TRANSREG twice, once in which you fix the functional form of the predictors to be the identity transformation and the second time with the functional form of the predictors specified to be a log transformation, then you should be able to determine whether an identity transformation on both sides produces the better model or whether a log transformation on both sides produces the better model. Simply select the model which has the largest log likelihood statistic across all transformations of the response and across all transformations of the predictors.
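The comparison works because the Box-Cox log likelihood is expressed on the scale of the original response. As a sketch for the two cases of interest, assuming normal errors and n observations: SSE_lin and SSE_log below denote the residual sums of squares from the regressions for y and for log y (notation introduced here), and a common additive constant is dropped. PROC TRANSREG may report its log likelihood with a different constant.

```latex
% lambda = 1 (response left as y):
\[
  \ell_{\mathrm{lin}} \;=\; -\frac{n}{2}\,\log\!\Bigl(\frac{SSE_{\mathrm{lin}}}{n}\Bigr)
\]
% lambda = 0 (response is log y); the last term is the Jacobian of the
% transformation from y to log y:
\[
  \ell_{\mathrm{log}} \;=\; -\frac{n}{2}\,\log\!\Bigl(\frac{SSE_{\mathrm{log}}}{n}\Bigr)
                      \;-\; \sum_{i=1}^{n} \log y_i
\]
% Prefer the form with the larger value.
```

The Jacobian term is the reason the raw residual sums of squares (or R-square values) of the two regressions cannot be compared directly: one is measured in units of y and the other in units of log y.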

Alternatively, you could examine the distribution of residuals. The preferred model would have residuals which appear to best follow a normal distribution. Thus, you might use code like that below to determine the best form for both predictors and response:

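ods html;

/* Box-Cox choice between log (lambda=0) and identity (lambda=1) for Y */
title "Box-Cox for form of Y given identity trans for X1, X2";
proc transreg data=mydata;
    model boxcox(y / lambda=0 1)
          = identity(x1 x2);
run;

title "Box-Cox for form of Y given log trans for X1, X2";
proc transreg data=mydata;
    model boxcox(y / lambda=0 1)
          = log(x1 x2);
run;

/* The log-log fit below needs log_y, log_x1, log_x2. This step creates them
   and assumes y, x1, x2 are strictly positive. */
data mydata_log;
    set mydata;
    log_y  = log(y);
    log_x1 = log(x1);
    log_x2 = log(x2);
run;

ods graphics on;

/* Residual diagnostics for each functional form */
title "Identity transformation of Y, X1, X2";
proc mixed data=mydata;
    model y = x1 x2 / residual;
run;

title "Log transformation of Y, X1, X2";
proc mixed data=mydata_log;
    model log_y = log_x1 log_x2 / residual;
run;

ods graphics off;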
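For the PE test mentioned earlier in the thread, a minimal sketch is given below. It assumes pooled OLS (it ignores the panel structure), strictly positive y, x1, and x2, and the same hypothetical dataset name mydata. The intermediate dataset names (pe0 to pe3) and the variables z_lin and z_log are made up for illustration.

```sas
/* Create the log-transformed variables */
data pe0;
    set mydata;
    log_y  = log(y);
    log_x1 = log(x1);
    log_x2 = log(x2);
run;

/* Step 1: linear model, keep fitted values of y */
proc reg data=pe0 noprint;
    model y = x1 x2;
    output out=pe1 p=yhat_lin;
run;
quit;

/* Step 2: log-log model, keep fitted values of log(y) */
proc reg data=pe1 noprint;
    model log_y = log_x1 log_x2;
    output out=pe2 p=lyhat_log;
run;
quit;

/* Step 3: build the two PE test variables */
data pe3;
    set pe2;
    /* log() needs a positive fitted value; otherwise z_lin stays missing */
    if yhat_lin > 0 then z_lin = log(yhat_lin) - lyhat_log;
    z_log = yhat_lin - exp(lyhat_log);
run;

/* Step 4: add each test variable to the model it is testing */
title "PE test: linear versus log-log";
proc reg data=pe3;
    Linear_H0: model y     = x1     x2     z_lin;
    LogLog_H0: model log_y = log_x1 log_x2 z_log;
run;
quit;
```

A significant t test on z_lin is evidence against the linear form, and a significant t test on z_log is evidence against the log-log form. Both, one, or neither may be rejected, so the test does not always pick a single winner.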
