I am trying to write ESTIMATE statements for a nested term. See the model below.
I have 7 families, 6 isolates, and 2 lineages. The 6 isolates are nested within lineages: Isolates 1, 2, and 3 belong to the NA1 lineage, and Isolates 4, 5, and 6 belong to the EU1 lineage.
However, Isolate is random and Lineage is fixed.
proc mixed data=new covtest;
  class Family Isolate Lineage Tree;
  model Lesion = Lineage / ddfm=KR outp=dat;
  random Family Isolate(Lineage) Family*Isolate(Lineage);
run;
These are the estimate statements that I am attempting to run. I am getting estimates, but they don't really make sense. For example, the isolates with the smallest lesions have very large estimates.
estimate "EU1 4" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 0 1 0 0 /CL;
estimate "EU1 5" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 0 0 1 0 /CL;
estimate "EU1 6" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 0 0 0 1 /CL;
estimate "NA1 1" intercept 1 Lineage 0 1 | Isolate(Lineage) 1 0 0 0 0 0 /CL;
estimate "NA1 2" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 1 0 0 0 0 /CL;
estimate "NA1 3" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 1 0 0 0 /CL;
run;
I am also trying to get estimates for "Family" as well as for the interaction term "Family*Isolate(Lineage)". I thought I did Family right (see below), but I'm not sure these make sense either.
estimate "1" intercept 1 | Family 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ;
estimate "2" intercept 1 | Family 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ;
estimate "3" intercept 1 | Family 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ;
estimate "4A" intercept 1 | Family 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ;
estimate "4B" intercept 1 | Family 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ;
estimate "4C" intercept 1 | Family 0 0 0 0 0 1 0 0 0 0 0 0 0 0 ;
estimate "4D" intercept 1 | Family 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ;
estimate "4E" intercept 1 | Family 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ;
estimate "4F" intercept 1 | Family 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ;
estimate "4G" intercept 1 | Family 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ;
estimate "G5" intercept 1 | Family 0 0 0 0 0 0 0 0 0 0 1 0 0 0 ;
estimate "GR1" intercept 1 | Family 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ;
estimate "GR2" intercept 1 | Family 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ;
estimate "Gr3" intercept 1 | Family 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ;
run;
Any ideas on how the interaction term "Family*Isolate(Lineage)" should be set up? And are these estimate statements correct?
"... but they don't really make sense"
Explain. We have no idea what you mean.
Also, I haven't gone through the math in my head, but the estimates you are requesting ... aren't these just the least squares means? What happens if you request the least squares means?
I mean that the averages calculated in Excel don't even come close to the estimates that SAS gives. Isolate 2 should have a very small average, since all of its observations are very small, but the estimate that SAS gives me is very large.
LSMEANS doesn't work for random effects. Is there another way to get least squares means for random effects? The ESTIMATE statement is the only other way I know of to get the "averages" for each term.
I have moved the term into the fixed effects and tried LSMEANS, and those averages make more sense, but I can't really use those results since the term is a random effect.
LSMeans on Random effects? That's not kosher!
While my rabbi would not approve, there's nothing stopping you from running the model again by rewriting the model as a fixed effects model and removing the RANDOM statement so you can get LSMeans. But you did that already ... did you modify the model properly? If you really want LSMeans on RANDOM effects, I don't know why you would say "... but I can't really use that data since the term is a random effect."
You said:
"I mean that the averages that are calculated in Excel don't even come close to the estimates that are given in SAS"
Explain. We have no idea what you mean.
You have to provide us with enough information so we can answer the underlying problem. Provide the numbers. Attach the raw data as a text file, with appropriate SAS code to read in the data so we can reproduce the problem. Don't just attach the raw data. We have to have SAS code to read it in. Do not attach Excel or other MS Office files.
And why should Excel's answer be the same as an LSMean? Excel doesn't compute LSMeans or even know about your statistical model.
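To make the comparison discussed above concrete, here is a minimal sketch that puts the spreadsheet-style averages next to model-based means. The first step computes raw per-isolate averages (what Excel would give); the second is an exploratory refit with Isolate(Lineage) moved to the MODEL statement so that LSMEANS is available. Keeping Family and the interaction in RANDOM is my assumption; the thread does not spell out exactly how the refit was done.

```sas
/* Raw averages per isolate within lineage: the spreadsheet-style means */
proc means data=new n mean std;
  class Lineage Isolate;
  var Lesion;
run;

/* Exploratory refit, only to obtain LSMEANS for Isolate(Lineage).
   Assumption: Family and Family*Isolate(Lineage) remain random. */
proc mixed data=new;
  class Family Isolate Lineage Tree;
  model Lesion = Lineage Isolate(Lineage) / ddfm=KR;
  random Family Family*Isolate(Lineage);
  lsmeans Isolate(Lineage) / cl;
run;
```

With unbalanced data the two sets of numbers need not agree exactly, but they should be on the same scale as the observations.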
I can't make my terms fixed effects if they are random! I chose my isolates randomly, so Isolate is a random effect. My model is not accurate if I just put terms wherever I want, instead of where they should go. I will not get accurate answers if I throw terms wherever I want them to go. Maybe I should just throw out my nested effect since that is what is confusing. I can't do that since Isolate is nested within Lineage.
I tried LSmeans just to see what I would get, but I had to move the term to be a fixed effect instead of random. But I can't use that data since those terms aren't actually fixed effects. LSmeans does not work on random effects. The model does not run. It does not work in SAS. Therefore, I can't use LSmeans to get estimates.
I am trying to explain that the estimates I am getting from the estimate statement do not seem correct. FOR EXAMPLE, if all of my observations are between 1 and 5 and my estimate is 853, that doesn't really make sense. That is what I mean.
Attached is my data file.
This is the model I am running. I want to know if I am writing my estimate statement for the nested term correctly.
proc mixed data=new covtest;
  class Family Isolate Lineage Tree;
  model Lesion = Lineage / ddfm=KR outp=dat;
  random Family Isolate(Lineage) Family*Isolate(Lineage);
  estimate "EU1 4" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 0 1 0 0 / CL;
  estimate "EU1 5" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 0 0 1 0 / CL;
  estimate "EU1 6" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 0 0 0 1 / CL;
  estimate "NA1 1" intercept 1 Lineage 0 1 | Isolate(Lineage) 1 0 0 0 0 0 / CL;
  estimate "NA1 2" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 1 0 0 0 0 / CL;
  estimate "NA1 3" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 1 0 0 0 / CL;
run;
This is the output I am getting.
SAS Output
Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper |
4.8177 | 0.7590 | 4.11 | 6.35 | 0.0029 | 0.05 | 2.7329 | 6.9026 |
4.2300 | 0.7590 | 4.11 | 5.57 | 0.0047 | 0.05 | 2.1452 | 6.3149 |
6.3887 | 0.7590 | 4.11 | 8.42 | 0.0010 | 0.05 | 4.3039 | 8.4736 |
3.0041 | 0.7589 | 4.11 | 3.96 | 0.0158 | 0.05 | 0.9193 | 5.0889 |
3.9355 | 0.7590 | 4.11 | 5.19 | 0.0061 | 0.05 | 1.8507 | 6.0202 |
4.1254 | 0.7591 | 4.11 | 5.43 | 0.0051 | 0.05 | 2.0406 | 6.2102 |
"I can't make my terms fixed effects if they are random! I chose my isolates randomly, so Isolate is a random effect. My model is not accurate if I just put terms wherever I want, instead of where they should go. I will not get accurate answers if I throw terms wherever I want them to go. Maybe I should just throw out my nested effect since that is what is confusing. I can't do that since Isolate is nested within Lineage."
Maybe you shouldn't be trying to get the equivalent of LSMeans on random effects. I think this is the real problem. Why do you want these linear combinations of effects anyway?
"Attached is my data file. This is the model I am running. I want to know if I am writing my estimate statement for the nested term correctly."
Please read what I said earlier about providing data.
"Maybe you shouldn't be trying to get the equivalent of LSMeans on random effects. I think this is the real problem. Why do you want these linear combinations of effects anyway?"
Really? I need the estimates in order to publish my research. I want to get the interaction estimates to show there is a Family by Isolate(Lineage) interaction, but I need to get Isolate(Lineage) to work first.
Attached is a text document of my data. I don't know how else to give you the SAS code.
Your Isolate(Lineage) coefficients are in the wrong places.
Add the SOLUTION option to the RANDOM statement so that you can see the implicit order of the Isolate(Lineage) levels. The coefficients after the | in each ESTIMATE statement must follow that order:
random Family Isolate(Lineage) Family*Isolate(Lineage) / solution;
estimate "EU1 4" intercept 1 Lineage 1 0 | Isolate(Lineage) 1 0 0 0 0 0 /CL;
estimate "EU1 5" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 1 0 0 0 0 /CL;
estimate "EU1 6" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 1 0 0 0 /CL;
estimate "NA1 1" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 0 1 0 0 /CL;
estimate "NA1 2" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 0 0 1 0 /CL;
estimate "NA1 3" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 0 0 0 1 /CL;
I expect you'll get estimates closer to what you obtained with Excel, although not exactly the same: your dataset is unbalanced, and these are shrinkage estimates.
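To give a feel for what "shrinkage" means here, consider a deliberately simplified case, not the model in this thread: a one-way random-effects model $y_{ij} = \mu + u_i + e_{ij}$ with $u_i \sim N(0, \sigma_u^2)$, $e_{ij} \sim N(0, \sigma_e^2)$, and $n_i$ observations in group $i$. With the variance components treated as known, the predicted random effect is

```latex
\hat{u}_i \;=\; \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2 / n_i}\,
               \bigl(\bar{y}_{i\cdot} - \hat{\mu}\bigr)
```

The multiplier lies between 0 and 1, so the prediction $\hat{\mu} + \hat{u}_i$ sits between the overall mean and the raw group average $\bar{y}_{i\cdot}$, and it is pulled harder toward the overall mean when $n_i$ is small. The model in this thread has a fixed Lineage effect and additional random terms, so the exact expressions differ, but the same pull toward the lineage mean is why the estimates will not match raw averages exactly.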
An excellent resource is Chapter 9 (Best Linear Unbiased Prediction) in the text by Walt Stroup:
Hope this gets you closer to where you want to be.
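Putting the pieces of this reply together, the full step might look roughly like the sketch below. The data set names blups and iso_est are placeholders of mine, and I am assuming the Effect column in the SolutionR table labels the nested term as "Isolate(Lineage)"; check the printed solution table if the filter returns nothing.

```sas
/* Capture the random-effect solutions and the ESTIMATE results as data sets */
ods output SolutionR=blups Estimates=iso_est;

proc mixed data=new covtest;
  class Family Isolate Lineage Tree;
  model Lesion = Lineage / ddfm=KR outp=dat;
  /* SOLUTION prints the predicted random effects in coefficient order */
  random Family Isolate(Lineage) Family*Isolate(Lineage) / solution;
  estimate "EU1 4" intercept 1 Lineage 1 0 | Isolate(Lineage) 1 0 0 0 0 0 / cl;
  estimate "EU1 5" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 1 0 0 0 0 / cl;
  estimate "EU1 6" intercept 1 Lineage 1 0 | Isolate(Lineage) 0 0 1 0 0 0 / cl;
  estimate "NA1 1" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 0 1 0 0 / cl;
  estimate "NA1 2" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 0 0 1 0 / cl;
  estimate "NA1 3" intercept 1 Lineage 0 1 | Isolate(Lineage) 0 0 0 0 0 1 / cl;
run;

/* Row order here should match the coefficient positions used above */
proc print data=blups;
  where Effect = "Isolate(Lineage)";
run;
```

The same solution table also lists the Family and Family*Isolate(Lineage) levels, so it is the place to look before writing coefficients for those terms.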
For future reference, here is a process by which you can provide data to the community:
https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/
Most people dislike downloading MS Office files (e.g., Excel or Word) with unknown content. And a text file (e.g., .txt or .csv) of data does not include the information that might be needed to import it into SAS.
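As a rough illustration of what a self-contained data post could look like, here is a DATA step with inline data. The variable names come from this thread; the variable types and the three rows are placeholders I made up for the example, not the poster's data.

```sas
/* Hypothetical layout: Family and Lineage read as character, the rest as numeric.
   The rows below are placeholder values for illustration only. */
data new;
  input Family $ Isolate Lineage $ Tree Lesion;
  datalines;
4A 1 NA1 1 2.5
4A 4 EU1 2 3.1
GR1 2 NA1 1 1.8
;
run;
```

Anyone can paste a step like this into SAS and immediately run the PROC MIXED code against it, which is what makes a problem reproducible.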