Dear SAS users
I am trying to fit the model: Y= a + b*X1 + c*X2 + d*X3 + e*X1*X2 + f*X1*X3 + g*X2*X3
using SAS version 9.4 with
- Y the response (= estimate with standard error)
- X1 and X3 categorical variables
- X2 continuous variable
To specify the categorical variables, I included the CLASS statement before the model statement.
The response values Y are estimates with standard errors (resulting from other models). How can I include these standard errors in the model so that the more precisely estimated values of Y (small standard errors) carry more weight in the fit than the Y values with larger standard errors?
This is what I have so far:
data trial;
input X1 X2 X3 Y stder;
datalines;
V 7 K 72.03 0.67
V 9 L 45.12 0.88
V 10 M 80.27 1.05
V 19 K 53.32 0.73
V 19 L 22.52 0.91
V 20 M 80.63 0.73
V 20 N 45.39 1.22
V 29 K 26.05 0.59
V 31 L 53.36 1.12
V 31 M 43.99 0.86
V 32 N 69.06 1.26
M 5 K 53.59 0.78
M 7 L 44.73 0.72
M 9 M 76.36 0.82
M 18 K 80.68 0.50
M 19 L 70.96 0.67
M 20 M 44.62 0.99
M 20 N 76.6 1.21
M 28 K 53.08 0.51
M 29 L 82.71 0.62
M 29 M 79.75 0.89
M 30 N 44.3 1.62
T 7 K 65.61 1.45
T 9 L 76.42 1.15
T 11 M 68.34 1.36
T 22 K 55.34 1.15
T 23 L 62.48 0.95
T 23 M 53.29 0.94
T 23 N 66.9 1.40
T 30 K 74.94 0.91
T 31 L 66.4 0.93
T 31 M 49.77 1.11
T 32 N 86.76 1.27;
Proc print data=trial;
run;
proc glm data=trial;
class X1 X3;
model Y=a+b*X1 + c*X2 + d*X3 + e*X1*X2 + f*X1*X3 + g*X2*X3;
output out=trialresults p=pk ;
proc print data=trialresults;
run;
How do I include the WEIGHT statement?
Thank you for your advice!
Before asking how to do something like this in SAS, state what type of model-fitting approach you are using, and then ask whether SAS supports that approach. I can't tell from your question what model-fitting approach you are using. Is there a name for the approach you describe?
That said, you could create a "weight" for each record defined as 1/(standard error of Y). If the standard errors are > 1, the weights will be smaller for Y values with large standard errors and larger for Y values with small standard errors. If a handful of standard errors fall below 1, you could just set their weights to 1. You could then use a WEIGHT statement with this "weight", and your observations with large weights would have a larger influence on the resulting model parameter estimates.
The issue with this is that it is ad hoc. You should try to identify a model-fitting methodology that is anchored to some theory or approach, and then see whether SAS can implement that methodology.
Thanks for your response!
What exactly do you mean by 'model-fitting approach'? I want to find a model that fits the data; we suppose it's linear with interaction effects.
In the meantime, I renamed 'stder' in the dataset to 'w' and changed the values to 1/stderror.
This is what I have so far:
data trial;
input x1$ x2$ x3$ y w;
datalines;
...
...
...
;
proc glm data=trial;
class x1 x3;
weight w;
model y=x1 x2 x3 x1*x2 x1*x3 x2*x3 / solution;
output out=trialresults p=pk ;
proc print data=trialresults;
run;
Now SAS is giving me this error, although all the values of x2 are numeric:
ERROR: Variable x2 should be either numeric or specified in the CLASS statement.
Do I also need to state that x2 is numeric?
It's also giving me this error, although I stated a model (or at least that's what I think):
ERROR: A MODEL statement must be given.
What could be wrong?
X2 is not numeric. It doesn't matter that the values of X2 appear to be numeric; the variable is defined as character (because of the $ in the INPUT statement), so SAS can only treat it as character. If you want it to be treated as numeric, you need to create a new variable, let's say X2A, which has the same values but is defined as numeric.
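As a minimal sketch of that conversion, assuming the character variable holds plain digits: the INPUT function reads a character value with a numeric informat and returns a number. The data set names demo and demo_num are made up for illustration.

```sas
/* A small character variable, read with $ as in the failing program */
data demo;
  input x2 $;
  datalines;
7
19
31
;
run;

/* Create a numeric copy; x2a is the illustrative name used above */
data demo_num;
  set demo;
  x2a = input(x2, 8.);   /* 8. is a numeric informat of width 8 */
run;

/* PROC CONTENTS lists x2 as Char and x2a as Num */
proc contents data=demo_num;
run;
```

In the model you would then use x2a in place of x2.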
Remove the $ sign from X2 in your input statement so SAS will treat X2 as continuous in your proc glm.
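Put together, a corrected version of the program might look like the sketch below. It reuses the data from the first post and computes the weight inside the data step instead of typing it in; that is my assumption about how w was built, based on the description "1/stderror". The second error ("A MODEL statement must be given") is probably a knock-on effect of the first: once the MODEL statement is rejected, the OUTPUT statement has no model to work with.

```sas
data trial;
  input x1 $ x2 x3 $ y stder;   /* $ only after the character variables */
  w = 1 / stder;                /* weight as described in the thread */
  datalines;
V 7 K 72.03 0.67
V 9 L 45.12 0.88
V 10 M 80.27 1.05
V 19 K 53.32 0.73
V 19 L 22.52 0.91
V 20 M 80.63 0.73
V 20 N 45.39 1.22
V 29 K 26.05 0.59
V 31 L 53.36 1.12
V 31 M 43.99 0.86
V 32 N 69.06 1.26
M 5 K 53.59 0.78
M 7 L 44.73 0.72
M 9 M 76.36 0.82
M 18 K 80.68 0.50
M 19 L 70.96 0.67
M 20 M 44.62 0.99
M 20 N 76.6 1.21
M 28 K 53.08 0.51
M 29 L 82.71 0.62
M 29 M 79.75 0.89
M 30 N 44.3 1.62
T 7 K 65.61 1.45
T 9 L 76.42 1.15
T 11 M 68.34 1.36
T 22 K 55.34 1.15
T 23 L 62.48 0.95
T 23 M 53.29 0.94
T 23 N 66.9 1.40
T 30 K 74.94 0.91
T 31 L 66.4 0.93
T 31 M 49.77 1.11
T 32 N 86.76 1.27
;
run;

proc glm data=trial;
  class x1 x3;                  /* categorical variables only; x2 stays continuous */
  weight w;
  model y = x1 x2 x3 x1*x2 x1*x3 x2*x3 / solution;
  output out=trialresults p=pk;
run;
quit;

proc print data=trialresults;
run;
```

Note that the semicolon ending the data sits on its own line after the last data row, and that PROC GLM gets its own RUN and QUIT before the PROC PRINT step.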
"What do you exactly mean with 'model-fitting approach'? I want to find a model that is fitting the data, we suppose it's linear with interaction effects."
Well, what I suggested is known as inverse weighting (except that I suggested using standard errors; you could also use the variances). One issue is that SAS will treat the weights as fixed, as if they were measured with no error. Is that appropriate for your case? I don't know. I certainly think that the variances of your regression parameters will be underestimated, because SAS won't capture the fact that your weights aren't fixed.
You could appeal to authority or context: if you are trying to publish in a particular field and this sort of approach is used all the time, then you can just go with it. If you are coming up with this approach on your own, you should have some understanding of its ramifications and a justification for it, beyond it being an ad hoc method that reduces the influence of observations with large standard errors on the parameter estimates. What about the variances of those parameter estimates? Can you cite a paper or other reference for your approach? Even an example that SAS uses in its manuals would be something.
I assume the weights are fixed since they are 1/(standard error of a parameter estimate) and these estimates are not changing?
How can I get variances of estimated model parameters using SAS? What will be the difference in using 1/se or 1/var as weighting values?
I haven't found an example of this approach yet, which I guess is why I'm struggling a bit with it. But we do think it's the best way to model our data.
You assume the weights are fixed but they aren't. How do you know that's ok? Do you understand the ramifications?
SAS will output the standard errors of your parameter estimates; you can just square them to get their variances.
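One way to do the squaring inside SAS is to capture the parameter estimates table with ODS OUTPUT. The sketch below uses a handful of rows from the first post and a deliberately simpler model (y on x2 only) to keep it short. The data set names demo, pe and pe_var are made up, and StdErr is the column name I would expect in the ParameterEstimates table; check it with PROC CONTENTS on pe if in doubt.

```sas
data demo;
  input x2 y stder;
  w = 1 / stder**2;             /* inverse-variance weight */
  datalines;
7 72.03 0.67
9 45.12 0.88
10 80.27 1.05
19 53.32 0.73
20 80.63 0.73
29 26.05 0.59
;
run;

/* Save the table produced by the SOLUTION option */
ods output ParameterEstimates=pe;
proc glm data=demo;
  weight w;
  model y = x2 / solution;
run;
quit;

/* Variance of each estimate = (standard error)**2 */
data pe_var;
  set pe;
  variance = StdErr**2;
run;

proc print data=pe_var;
run;
```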
Using 1/var will increase the influence of observations with standard errors below 1 and, relatively, decrease the influence of observations with standard errors above 1.
For example, if you have two observations with standard errors of .5 and 2 then 1/se is 2 and .5, respectively. The first observation has a weight that is 2/.5= 4 times higher than the second. If you use their variances, .25 and 4, respectively, then 1/var is 4 and .25, respectively. Using these weights, observation 1 has a weight that is 4/.25=16 (4 squared) times higher than the second.
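The same arithmetic can be checked in a short data step. This is only an illustration of the two weighting choices; the data set and variable names are made up.

```sas
data weights;
  input se;
  w_se  = 1 / se;       /* inverse standard error */
  w_var = 1 / se**2;    /* inverse variance */
  datalines;
0.5
2
;
run;

/* se=0.5 gives w_se=2, w_var=4; se=2 gives w_se=0.5, w_var=0.25 */
proc print data=weights noobs;
run;
```

The ratio between the two observations is 4 for w_se and 16 for w_var, as described above.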
I understand that it's better to use the inverse of the variance as the weight. Thank you for the clear explanation!
But I don't follow the point about the weights being fixed or not. What are the ramifications for the model, and how do I write this in a SAS script?
Are you conducting a meta-analysis based on the results of other studies? If so, we can suggest several papers to look at (or you can Google). Many researchers use PROC MIXED for a meta-analysis for previous studies, but it might depend on the goals of your analysis.
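For what it is worth, here is a rough sketch of how a fixed-effects meta-regression is often set up in PROC MIXED. It is not a recommendation for this particular analysis. With a WEIGHT statement, PROC GLM still estimates a residual variance, so the weights only matter relative to each other. In the sketch below the residual variance is held at 1 with the PARMS statement, so each Y value is given the variance 1/w, that is, its reported squared standard error. This is my reading of how WEIGHT and PARMS interact, so check it against the PROC MIXED documentation. The standard errors are still treated as known, which is exactly the assumption questioned earlier in the thread. The data are a few rows from the first post with a simplified model, and demo is a made-up name.

```sas
data demo;
  input x2 y stder;
  w = 1 / stder**2;             /* inverse-variance weight */
  datalines;
7 65.61 1.45
9 76.42 1.15
11 68.34 1.36
22 55.34 1.15
23 62.48 0.95
30 74.94 0.91
;
run;

proc mixed data=demo;
  model y = x2 / solution;
  weight w;
  parms (1) / hold=1;           /* keep the residual variance fixed at 1 */
run;
```

Comparing the standard errors of the estimates from this run with those from a weighted PROC GLM run on the same data shows how much the choice between relative and absolute weights matters.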