Analysis Question: Which SAS/Stat proc to use
12-18-2009 02:42 PM

We have a very large sample of observations of a process that is believed to depend on four real variables (w, x, y, and z), some of which are correlated. The process is expected to be “close to” a known function of w, call it g(w), with fluctuations around g that are the effect of the other three variables.

Question: How can SAS/Stat be used to derive a predictive formula of the form

F(w,x,y,z) = (a1 * g(w)) + (a2 * x) + (a3 * y) + (a4 * z),

where the ai’s are the standard “least squares” coefficients based on the given sample?
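For reference, the "least squares" coefficients asked about here are the values that minimize the sum of squared differences between each observed outcome and the formula. Writing O_j for the outcome of experiment j (the notation introduced later in this thread) and N for the sample size:

```latex
(\hat a_1, \hat a_2, \hat a_3, \hat a_4)
  = \arg\min_{a_1,\dots,a_4} \sum_{j=1}^{N}
    \bigl( O_j - a_1\, g(w_j) - a_2\, x_j - a_3\, y_j - a_4\, z_j \bigr)^2,
\qquad
\hat{\mathbf a} = (X^\top X)^{-1} X^\top \mathbf O,
```

where row j of the matrix X is (g(w_j), x_j, y_j, z_j) and X^T X is assumed invertible (no exact linear dependence among the predictors). Because g(w) enters only as another column of X, the model is linear in the a_i even if g is nonlinear in w, so ordinary linear regression applies; adding an intercept a_0 simply adds a column of ones to X.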

Thanks in advance...

Posted in reply to deleted_user

12-18-2009 05:58 PM

You haven't provided all the information needed to answer your question.

For instance, are there any parameters in F(w,x,y,z) or g(w) which need to be estimated? Or are F(w,x,y,z) and g(w) specified a priori so that all you need to do is estimate a1, a2, a3, and a4?

Also, would it be prudent to include an intercept in your model so that you estimate

F(w,x,y,z) = a0 + (a1*g(w)) + (a2*x) + (a3*y) + (a4*z)
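A minimal sketch of the two forms side by side, assuming the response is stored in a variable O and g(w) has been precomputed as g_w in a data set MYDATA (names that appear later in this thread; the model labels are illustrative). PROC REG fits the intercept a0 by default, and the NOINT option drops it:

```sas
/* Assumes MYDATA holds the response O and predictors g_w, x, y, z */
proc reg data=mydata;
  /* Default: an intercept (a0) is estimated along with a1-a4 */
  WithIntercept: model O = g_w x y z;
  /* NOINT forces a0 = 0, matching the original form with no constant */
  NoIntercept:   model O = g_w x y z / noint;
run;
quit;
```

With NOINT, PROC REG computes R-square from the uncorrected total sum of squares, so the R-square values of the two models are not directly comparable.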

Also, can we assume that the residuals are normally distributed? I would think that would be a reasonable assumption, but you know what happens when you assume too much.

Those are a few questions which come to mind immediately. If there is anything else which is important to know, please share that as well.

Posted in reply to Dale

12-22-2009 08:18 AM

1. All values of g(w) will be known a priori, and the values of F(w,x,y,z) will be known (for a very large number of specific {w, x, y, z} combinations), as they are the results of the "observations" we're making.

2. Yes, we should probably calculate the intercept value, a0, as well.

3. Yes, I have been assuming the residuals are normally distributed.
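The least-squares coefficients themselves do not depend on normality; the assumption matters mainly for the tests and confidence intervals, and it can be checked after fitting. A sketch, assuming the MYDATA, O and g_w names used later in this thread (RESIDS, RESID and PRED are illustrative names):

```sas
/* Fit the model and save residuals and predicted values */
proc reg data=mydata;
  model O = g_w x y z;
  output out=resids r=resid p=pred;
run;
quit;

/* NORMAL requests goodness-of-fit tests for normality;     */
/* QQPLOT compares residual quantiles with a fitted normal. */
proc univariate data=resids normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```

With a very large sample, formal normality tests tend to flag even trivial departures, so the Q-Q plot is often the more informative of the two.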

Thanks!

Posted in reply to deleted_user

12-30-2009 03:13 PM

I don't know what you mean when you state that "the F(w,x,y,z) will be known ... as they are the results of 'observations' we're making." Is there a variable which you are collecting which represents F(w,x,y,z)? Your response is a little cryptic here.

If you have a variable RESULT and you know the parameters of the function g(w) (and can therefore construct in a data set a variable g_w=g(w)), then you could simply use PROC REG to obtain the least squares solution to the equation

RESULT = a0 + (a1 * g_w) + (a2 * x) + (a3 * y) + (a4 * z)

You would simply write:

```sas
proc reg data=mydata;
  model RESULT = g_w x y z;
run;
quit;
```

But I am not certain that this is what you are looking for since I still don't know how F(w,x,y,z) is obtained.
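Constructing g_w is an ordinary DATA step. A sketch in which log(w) is a hypothetical stand-in for the known g(w) and RAWDATA is a hypothetical input data set containing RESULT, w, x, y and z:

```sas
/* HYPOTHETICAL: log(w) is only a placeholder for the known g(w);  */
/* substitute the actual formula (log(w) itself requires w > 0).   */
data mydata;
  set rawdata;        /* hypothetical input: RESULT, w, x, y, z */
  g_w = log(w);       /* g(w) becomes an ordinary predictor column */
run;
```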

Posted in reply to Dale

12-30-2009 04:07 PM

I think the problem is that I'm being sloppy in the way I'm using the function F. What we really know are the **values of the observations**, based on a large number of specific combinations of parameters. Maybe this will help:

Let's assume I have N observations based on experiments performed with N sets of the four parameters. Let's say that for j = 1, 2, ..., N, O(j) is the value observed from experiment j, which had inputs w(j), x(j), y(j), and z(j). What I'm really looking for is a formula, F(w,x,y,z), that lets me predict the outcome of an experiment run with a (potentially new) combination of the four parameters. I.e., given four new parameter values {w0, x0, y0, z0}, the (least-squares) **predicted** value of the experiment would equal F(w0, x0, y0, z0).

I apologize for the confusion -- and hope this helps. Thanks.
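In the notation of this post, and including the intercept a0 agreed on earlier in the thread, the predicted value for a new combination would be the fitted linear formula evaluated at that point, with the hatted coefficients estimated by least squares from the N experiments:

```latex
F(w_0, x_0, y_0, z_0) = \hat a_0 + \hat a_1\, g(w_0) + \hat a_2\, x_0 + \hat a_3\, y_0 + \hat a_4\, z_0
```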

Posted in reply to deleted_user

12-31-2009 12:19 PM

So F(w,x,y,z) is an observed value, stored in the variable O. As previously stated, the REG procedure is appropriate for constructing a predictor of O = F(w, x, y, z), which you could then apply to a new combination (w0, x0, y0, z0). With variables O, w, x, y, z, and g_w=g(w) in a data set named MYDATA, you can obtain parameters a0 through a4 of the equation

O = a0 + (a1 * g_w) + (a2 * x) + (a3 * y) + (a4 * z)

employing the code

```sas
proc reg data=mydata;
  model O = g_w x y z;
run;
quit;
```
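To get predictions for new (w0, x0, y0, z0) combinations, one common approach is to append them to the data with a missing response: PROC REG leaves those rows out of the fit but still writes a predicted value for them through the OUTPUT statement. A sketch, assuming a hypothetical data set NEWPTS with variables w, x, y and z, and again using log(w) only as a placeholder for g(w):

```sas
/* NEWPTS (hypothetical) holds new combinations with no observed O */
data combined;
  set mydata newpts(in=isnew);
  if isnew then do;
    O = .;          /* missing response: excluded from the fit */
    g_w = log(w);   /* HYPOTHETICAL placeholder: use the real g(w) */
  end;
run;

proc reg data=combined;
  model O = g_w x y z;
  output out=preds p=O_hat;   /* O_hat = predicted F(w, x, y, z) */
run;
quit;

/* Show only the predictions for the new combinations */
proc print data=preds;
  where O = .;
  var w x y z O_hat;
run;
```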

You could also use any of a number of other SAS procedures to obtain the same results: procs GLM, GENMOD, MIXED, GLIMMIX, ORTHOREG to name a few. The ORTHOREG procedure is of some special interest in that it works well with what are referred to as ill-conditioned data. Ill-conditioned data arises when there are very strong correlations among the predictor variables. But I presume that for your experimental setting where you are manipulating w, x, y, and z, poorly conditioned data should not really be a problem.
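Since the original question notes that some of the variables are correlated, it may be worth checking the conditioning before relying on the fit. A sketch, assuming the same MYDATA variables:

```sas
/* Pairwise correlations among the predictors */
proc corr data=mydata;
  var g_w x y z;
run;

/* VIF and COLLIN add collinearity diagnostics to the parameter */
/* estimates table and a separate eigenvalue/condition table     */
proc reg data=mydata;
  model O = g_w x y z / vif collin;
run;
quit;

/* ORTHOREG fits the same least-squares model with a more */
/* numerically stable orthogonalization method            */
proc orthoreg data=mydata;
  model O = g_w x y z;
run;
```

Large variance inflation factors (a common rule of thumb is above 10) or large condition indices suggest the individual coefficients are poorly determined, even though predictions within the range of the data may still be reliable.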
