deleted_user
Not applicable
We have a very large sample of observations of a process that is believed to depend on four real variables (w, x, y, and z), some of which are correlated. The process is expected to be “close to” a known function of w, call it g(w), with fluctuations around g that are the effect of the other three variables.

Question: How can SAS/STAT be used to derive a predictive formula of the form:

F(w,x,y,z) = (a1 * g(w)) + (a2 * x) + (a3 * y) + (a4 * z),

where the ai’s are the standard “least squares” coefficients based on the given sample?

Thanks in advance...
Dale
Pyrite | Level 9
You haven't provided all the information needed to answer your question.

For instance, are there any parameters in F(w,x,y,z) or g(w) which need to be estimated? Or are F(w,x,y,z) and g(w) specified a priori so that all you need to do is estimate a1, a2, a3, and a4?

Also, would it be prudent to include an intercept in your model so that you estimate

F(w,x,y,z) = a0 + (a1*g(w)) + (a2*x) + (a3*y) + (a4*z)

Also, can we assume that the residuals are normally distributed? I would think that would be a reasonable assumption, but you know what happens when you assume too much.

Those are a few questions which come to mind immediately. If there is anything else which is important to know, please share that as well.
deleted_user
Not applicable
1. All values of g(w) will be known a priori, and the F(w,x,y,z) will be known (for a very large number of specific {w, x, y, z} combinations) as they are the results of the "observations" we're making.

2. Yes, we should probably calculate the intercept value, a0, as well.

3. Yes, I have been assuming the residuals are normally distributed.

Thanks!
Dale
Pyrite | Level 9
I don't know what you mean when you state that "the F(w,x,y,z) will be known ... as they are the results of 'observations' we're making." Is there a variable which you are collecting which represents F(w,x,y,z)? Your response is a little cryptic here.

If you have a variable RESULT and you know the parameters of the function g(w) (and can therefore construct in a data set a variable g_w=g(w)), then you could simply use PROC REG to obtain the least squares solution to the equation

RESULT = a0 + (a1 * g_w) + (a2 * x) + (a3 * y) + (a4 * z)

You would simply write:

proc reg data=mydata;
model RESULT = g_w x y z;
run;

But I am not certain that this is what you are looking for since I still don't know how F(w,x,y,z) is obtained.
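
For reference, the g_w variable described above could be built in a DATA step before running the regression. This is only a minimal sketch: the thread never states what g(w) is, so log(w) below is a placeholder assumption, and the data set name MYDATA2 is hypothetical.

```sas
/* Sketch only: g(w) is not given in this thread, so log(w) is a
   placeholder assumption. Substitute the known function here. */
data mydata2;              /* hypothetical output data set name */
  set mydata;
  g_w = log(w);            /* stand-in for the known function g(w) */
run;

/* Least squares fit; PROC REG includes the intercept a0 by default */
proc reg data=mydata2;
  model RESULT = g_w x y z;
run;
quit;
```

The parameter estimates table in the PROC REG output would then list the intercept (a0) followed by the coefficients for g_w, x, y, and z (a1 through a4).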
deleted_user
Not applicable
I think the problem is that I'm being sloppy in the way I'm using the function F. What we really know are the values of the observations based on a large number of specific combinations of parameters. Maybe this will help:

Let's assume I have N observations based on experiments performed with N sets of the four parameters. Let's say that for j = 1, 2, ..., N, O(j) is the value observed from experiment j, which had inputs of w(j), x(j), y(j), and z(j). What I'm really looking for is a formula, F(w,x,y,z), that lets me predict the outcome of an experiment run with a (potentially new) combination of the four parameters. I.e., given 4 new parameters, {w0, x0, y0, z0}, the (least squares) predicted value of the experiment would equal F(w0, x0, y0, z0).

I apologize for the confusion -- and hope this helps. Thanks.
Dale
Pyrite | Level 9
So, F(w,x,y,z) is an observed value placed into the variable O. As previously stated, the REG procedure would be appropriate for constructing a predictor of O=F(w, x, y, z), which you could then evaluate at new inputs (w0, x0, y0, z0). With variables O, w, x, y, z, and g_w=g(w) in a data set named MYDATA, you can obtain parameters a0 through a4 of the equation

    O = a0 + (a1 * g_w) + (a2 * x) + (a3 * y) + (a4 * z)

employing the code

  proc reg data=mydata;
    model O = g_w x y z;
  run;
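
To get the predicted value for a new combination {w0, x0, y0, z0}, one common approach is to append the new inputs to the data with a missing response. PROC REG leaves those rows out of the fit but still computes predicted values for them in the OUTPUT data set. The sketch below rests on assumptions: the data set names NEWOBS, COMBINED, and SCORED, the variable O_pred, and the input values are hypothetical, and log(w) is again only a placeholder for the real g(w).

```sas
/* Hypothetical new input combination; values are illustrative only */
data newobs;
  w = 1.5; x = 0.2; y = 3.0; z = -1.0;
  g_w = log(w);   /* placeholder for g(w), an assumption */
  O = .;          /* missing response: row is excluded from the fit */
run;

/* Stack the new row under the original observations */
data combined;
  set mydata newobs;
run;

/* Fit on rows with nonmissing O; predict for every row */
proc reg data=combined;
  model O = g_w x y z;
  output out=scored p=O_pred;
run;
quit;

/* Show predictions for the rows that had no observed response */
proc print data=scored;
  where O is missing;
  var w x y z O_pred;
run;
```

If MYDATA itself contains rows with a missing O, those rows would also appear in the printed output.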

You could also use any of a number of other SAS procedures to obtain the same results: procs GLM, GENMOD, MIXED, GLIMMIX, ORTHOREG to name a few. The ORTHOREG procedure is of some special interest in that it works well with what are referred to as ill-conditioned data. Ill-conditioned data arises when there are very strong correlations among the predictor variables. But I presume that for your experimental setting where you are manipulating w, x, y, and z, poorly conditioned data should not really be a problem.
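
Since the original question mentions that some of the variables are correlated, it may be worth checking the conditioning before relying on the estimates. A minimal sketch, assuming the same MYDATA variables as above: the VIF and COLLIN options on the MODEL statement in PROC REG request variance inflation factors and collinearity diagnostics, and PROC ORTHOREG fits the same model for comparison.

```sas
/* Collinearity diagnostics alongside the least squares fit */
proc reg data=mydata;
  model O = g_w x y z / vif collin;
run;
quit;

/* Same model fitted with ORTHOREG, for comparison on
   possibly ill-conditioned data */
proc orthoreg data=mydata;
  model O = g_w x y z;
run;
```

If the data are well conditioned, the two procedures should give essentially the same parameter estimates.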
