Hi everyone,
I'm trying to code interactions between genotype and a metabolite. There are 4 genotypes.
When I code the model as:
CLASS gene;
MODEL outcome=gene metabolite gene*metabolite;
I get a different output than if I code the model as:
CLASS gen;
MODEL outcome=gene metabolite gene|metabolite;
With the |, the output shows one of my genotypes as a reference group, whereas with the * I get estimates for all 4 genotypes.
What is the difference between * and |, and which model is correct?
The second model uses variables GEN and GENE. Was that intentional?
To answer your question, the notation A | X (where A is a CLASS variable) is equivalent to the three terms
A X A*X
That is, A|X includes main effects and the interaction.
If a main effect is repeated in the MODEL statement, the duplicate is dropped. Consequently, the following models are equivalent:
MODEL outcome=gene | metabolite;
MODEL outcome=gene metabolite gene*metabolite;
MODEL outcome=gene metabolite gene | metabolite;
Of these, the first is simplest. The third is confusing and should be avoided.
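As a rough sketch of that equivalence, the two steps below fit the same model once with the bar notation and once with the terms written out. SASHELP.HEART and its variables are only stand-ins (an assumption, not the poster's data): BP_Status plays the role of the CLASS variable gene and Weight plays the role of metabolite. The SOLUTION option asks PROC GLM to print the parameter estimates, which should match between the two steps.

```sas
/* Sketch only: SASHELP.HEART is a stand-in for the poster's data.        */
/* BP_Status (CLASS variable) stands in for gene; Weight for metabolite.  */
title 'Bar notation';
proc glm data=sashelp.heart;
   class bp_status;
   model cholesterol = bp_status | weight / solution;
run;
quit;

title 'Terms written out';
proc glm data=sashelp.heart;
   class bp_status;
   model cholesterol = bp_status weight bp_status*weight / solution;
run;
quit;
title;
```

In both outputs the last level of the CLASS variable should appear with an estimate of zero, because PROC GLM uses it as the reference level.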
Do you have a small typo in your second CLASS statement? GenE?
Here's an example of specifying two models that are the same, with the same results.
title 'Method1';
proc glm data=sashelp.heart;
class sex;
model ageatstart = height weight sex weight*sex;
run;
quit;
title 'Method2';
proc glm data=sashelp.heart;
class sex;
model ageatstart = height weight|sex @2;
run;
quit;
What does the @2 do?
The @2 limits the expansion to interactions of at most two variables. If you list three variables separated by | and add @2, you get all the main effects and all two-way interactions, but not the three-way interaction.
model ageAtStart = weight | height | diastolic @2;
should be the same as the following (assuming I wrote it out correctly):
model ageAtStart = weight height diastolic weight*height weight*diastolic height*diastolic;
@lalaktgrau wrote:
What does the @2 do?
In general, the | is shorthand for a full factorial model: A|B is equivalent to A B A*B.
model Y = A|B|C; is the same as
model Y = A B C A*B A*C B*C A*B*C;
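A minimal sketch of the three-variable case, again using SASHELP.HEART as a stand-in data set (an assumption for illustration only). Both steps should fit the same seven terms. The second step lists them in the order SAS produces when it expands the bars from left to right (A B A*B C A*C B*C A*B*C). Listing them in a different order gives the same fitted model, although sequential (Type I) sums of squares depend on the order.

```sas
/* Sketch only: SASHELP.HEART and these variables are stand-ins. */
title 'Full factorial with bar notation';
proc glm data=sashelp.heart;
   model ageatstart = weight | height | diastolic;
run;
quit;

title 'Same terms written out';
proc glm data=sashelp.heart;
   model ageatstart = weight height weight*height
                      diastolic weight*diastolic height*diastolic
                      weight*height*diastolic;
run;
quit;
title;
```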
Since you did not say which procedure you are running or which options you used, it is hard to explain specific differences in the output.
Your example does appear to have GENE spelled as GEN, though. If your data set actually contains two different variables, GENE and GEN, that likely accounts for most of the difference.
I corrected the Gene typo. That wasn't an error in my actual code.