Analysis Question: Which SAS/Stat proc to use

deleted_user — Fri, 18 Dec 2009 19:42:43 GMT

We have a very large sample of observations of a process that is believed to depend on four real variables (w, x, y, and z), some of which are correlated. The process is expected to be “close to” a known function of w, call it g(w), with fluctuations around g that are the effect of the other three variables.

Question: How can SAS/Stat be used to derive a predictive formula of the form of:

F(w,x,y,z) = (a1 * g(w)) + (a2 * x) + (a3 * y) + (a4 * z),

where the ai’s are the standard “least squares” coefficients based on the given sample?

Thanks in advance...

Re: Analysis Question: Which SAS/Stat proc to use

Dale — Fri, 18 Dec 2009 22:58:36 GMT

You don't provide all relevant information to answer your question.

For instance, are there any parameters in F(w,x,y,z) or g(w) which need to be estimated? Or are F(w,x,y,z) and g(w) specified a priori so that all you need to do is estimate a1, a2, a3, and a4?

Also, would it be prudent to include an intercept in your model so that you estimate

F(w,x,y,z) = a0 + (a1*g(w)) + (a2*x) + (a3*y) + (a4*z)

And too, can we assume that the residuals are normally distributed? I would think that would be a reasonable assumption, but you know what happens when you assume too much.

Those are a few questions which come to mind immediately. If there is anything else which is important to know, please share that as well.

Re: Analysis Question: Which SAS/Stat proc to use

deleted_user — Tue, 22 Dec 2009 13:18:54 GMT

1. All values of g(w) will be known a priori, and the F(w,x,y,z) will be known (for a very large number of specific {w, x, y, z} combinations) as they are the results of the "observations" we're making.

2. Yes, we should probably calculate the intercept value, a0, as well.

3. Yes, I have been assuming the residuals are normally distributed.

Thanks!

Re: Analysis Question: Which SAS/Stat proc to use

Dale — Wed, 30 Dec 2009 20:13:21 GMT

I don't know what you mean when you state that "the F(w,x,y,z) will be known ... as they are the results of 'observations' we're making." Is there a variable which you are collecting which represents F(w,x,y,z)? Your response is a little cryptic here.

If you have a variable RESULT and you know the parameters of the function g(w) (and can therefore construct in a data set a variable g_w=g(w)), then you could simply use PROC REG to obtain the least squares solution to the equation

RESULT = a0 + (a1 * g_w) + (a2 * x) + (a3 * y) + (a4 * z)

You would simply write:

proc reg data=mydata;
model RESULT = g_w x y z;
run;

But I am not certain that this is what you are looking for since I still don't know how F(w,x,y,z) is obtained.
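
For what it's worth, here is a minimal sketch of the step that builds g_w before the regression. The data set name MYDATA_RAW and the expression exp(-w) are placeholders only: the thread never says what g actually is, so substitute the known function.

```sas
/* Build the regressor g_w = g(w) from the raw inputs.
   ASSUMPTION: mydata_raw is a hypothetical data set holding RESULT, w, x, y, z.
   ASSUMPTION: exp(-w) merely stands in for the known function g. */
data mydata;
  set mydata_raw;
  g_w = exp(-w);
run;

/* Ordinary least squares; PROC REG fits the intercept a0 by default. */
proc reg data=mydata;
  model RESULT = g_w x y z;
run;
quit;
```

If an intercept turns out not to be wanted, the NOINT option on the MODEL statement suppresses it.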

Re: Analysis Question: Which SAS/Stat proc to use

deleted_user — Wed, 30 Dec 2009 21:07:16 GMT

I think the problem is that I'm being sloppy in the way I'm using the function F. What we really know are the values of the observations based on a large number of specific combinations of parameters. Maybe this will help:

Let's assume I have N observations based on experiments performed with N sets of the four parameters. Let's say that for j = 1, 2, ..., N, O(j) is the value observed from experiment j, which had inputs of w(j), x(j), y(j), and z(j). What I'm really looking for is a formula, F(w,x,y,z), that lets me predict the outcome of an experiment run with a (potentially new) combination of the four parameters. I.e., given 4 new parameters, {w0, x0, y0, z0}, the (least squares) predicted value of the experiment would equal F(w0, x0, y0, z0).

I apologize for the confusion -- and hope this helps. Thanks.

Re: Analysis Question: Which SAS/Stat proc to use

Dale — Thu, 31 Dec 2009 17:19:04 GMT

So, F(w,x,y,z) is an observed value placed into the variable O. As previously stated, the REG procedure would be appropriate for constructing a predictor of O=F(w, x, y, z) which you could apply to F(w0, x0, y0, z0). With variables O, w, x, y, z, and g_w=g(w) in a data set named MYDATA, you can obtain parameters a0 through a4 of the equation

    O = a0 + (a1 * g_w) + (a2 * x) + (a3 * y) + (a4 * z)

employing the code

  proc reg data=mydata;
    model O = g_w x y z;
  run;
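
One common way to get the prediction at a new point {w0, x0, y0, z0} is to append the new inputs with O set to missing: PROC REG leaves such rows out of the fit but still computes predicted values for them. The sketch below assumes hypothetical names NEW_POINTS, COMBINED, SCORED, and O_pred, made-up input values, and exp(-w) as a stand-in for the known g.

```sas
/* Hypothetical new input combination(s); the numbers are illustrative only. */
data new_points;
  input w x y z;
  g_w = exp(-w);   /* ASSUMPTION: placeholder for the known g(w) */
  O = .;           /* missing response: not used in fitting, but still scored */
  datalines;
1.5 0.2 3.1 4.0
;
run;

/* Stack the observed experiments and the new points. */
data combined;
  set mydata new_points;
run;

/* Fit on rows with non-missing O; write predictions for every row. */
proc reg data=combined;
  model O = g_w x y z;
  output out=scored p=O_pred;
run;
quit;
```

In SCORED, the rows where O is missing carry the least squares prediction F(w0, x0, y0, z0) in O_pred.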

You could also use any of a number of other SAS procedures to obtain the same results: procs GLM, GENMOD, MIXED, GLIMMIX, ORTHOREG to name a few. The ORTHOREG procedure is of some special interest in that it works well with what are referred to as ill-conditioned data. Ill-conditioned data arises when there are very strong correlations among the predictor variables. But I presume that for your experimental setting where you are manipulating w, x, y, and z, poorly conditioned data should not really be a problem.
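
Since the original question mentions that some of the variables are correlated, it may be worth checking the conditioning before choosing between procedures. A rough sketch, using the same MYDATA variables as above: the VIF and COLLIN options on the MODEL statement in PROC REG print collinearity diagnostics, and the ORTHOREG call fits the same model.

```sas
/* Collinearity diagnostics: variance inflation factors and
   condition indices for the predictors. */
proc reg data=mydata;
  model O = g_w x y z / vif collin;
run;
quit;

/* Same linear model fitted with PROC ORTHOREG, which is intended
   for ill-conditioned data. */
proc orthoreg data=mydata;
  model O = g_w x y z;
run;
```

If the two procedures give essentially the same estimates, conditioning is probably not a practical concern for this data.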