Kastchei
Pyrite | Level 9

Hello,

 

This is more of a modeling question than a SAS question. I have an experiment I need to analyze where there are two stratification variables: type of tissue and level of viral challenge. The type of tissue is either ectocervical or endocervical. The level of viral challenge is either 50 or 500. There should be 4 combinations: ecto 50, ecto 500, endo 50, and endo 500. The problem is that the lab didn't run one of the combinations: endo 50. Here's a table summarizing what data exist.

 

        500    50
Ecto     X      X
Endo     X      -

 

In a model of outcome = tissue|virus, given that I am missing one type of data completely, I cannot test for the overall interaction, but I can get LSMeans for the three pairwise differences of the existing data.  I can somewhat assume there is an interaction if the adjusted pairwise differences show significance somewhere.

 

As it turns out, this model gives the same LSMeans results as if I just combined the two variables into one variable with 3 levels: ecto500, ecto050, and endo500.  The model would then just be outcome = tissueVirus.  The math to get to the results is slightly different, but the end results are exactly the same estimates, differences, variances, test statistics, and p-values.
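
A minimal sketch of the two equivalent fits described above. The dataset name `assay`, the variable names `tissue`, `virus`, and `outcome`, the coding of `tissue` as 'ecto'/'endo', and the Tukey adjustment are all assumptions for illustration, not details from the actual study.

```sas
/* Hypothetical input: dataset ASSAY with character TISSUE ('ecto'/'endo'),
   numeric VIRUS (50 or 500), and numeric OUTCOME. All names are assumed. */
data assay2;
  set assay;
  length tissueVirus $12;
  /* Z3. zero-pads the challenge level: ecto050, ecto500, endo500. */
  tissueVirus = cats(tissue, put(virus, z3.));
run;

/* Two-factor model: the endo 50 cell is empty, so some terms are non-estimable. */
proc glm data=assay2;
  class tissue virus;
  model outcome = tissue|virus;
  lsmeans tissue*virus / pdiff adjust=tukey;
run;
quit;

/* One-factor model on the combined three-level variable. */
proc glm data=assay2;
  class tissueVirus;
  model outcome = tissueVirus;
  lsmeans tissueVirus / pdiff adjust=tukey;
run;
quit;
```

Comparing the two LSMEANS tables side by side is a quick way to confirm that the estimates and pairwise differences agree for the three observed cells.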

 

The plot thickens, because these two factors are not actually the variables of clinical importance.  Instead, there are a whole host of other measurements that need to be tested.  This is an exploratory analysis.  For each measurement, I'll want to stratify by tissue and virus level.  The model would either look like 1) outcome = measurement|tissue|virus or 2) outcome = measurement|tissueVirus.

 

My question is: should I use 1 or 2? Option 2 seems simpler, has fewer things to estimate, and doesn't display a ton of non-estimable results. However, option 1 is closer to the conceptual design of the experiment. It seems, though, that because of the missing cell there is perhaps no conceptual difference between a 2x2 missing a cell and a 1x3.
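
For concreteness, option 2 might be sketched as follows. This assumes a hypothetical dataset `assay2` that already contains the combined variable `tissueVirus`, and it assumes `measurement` is continuous (so it is left off the CLASS statement); neither detail is stated in the post.

```sas
/* Option 2 sketch: a single three-level stratification factor.
   ASSAY2, OUTCOME, MEASUREMENT, and TISSUEVIRUS are assumed names. */
proc glm data=assay2;
  class tissueVirus;
  /* The bar operator expands to:
     measurement tissueVirus measurement*tissueVirus */
  model outcome = measurement|tissueVirus / solution;
run;
quit;
```

Under these assumptions, the `measurement*tissueVirus` term tests whether the slope on the measurement differs across the three observed strata.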

 

Thanks in advance!

Michael

4 REPLIES
plf515
Lapis Lazuli | Level 10

I would do 2). This seems a lot simpler. The original intent was to do one thing, but it didn't work out that way (so often the case), so I think you have to adjust.

SteveDenham
Jade | Level 19

In Milliken and Johnson's classic text Analysis of Messy Data, they cover this type of thing extensively.  Rather than recoding to a single variable, they analyze a "means model", which in this case would look like:

 

model dependent_var=tissue*virus;

 

Note that there are no main effects in this--it is a one-way ANOVA, so any comparisons have to be done with ESTIMATE or (even better) LSMESTIMATE statements. This has the advantages of retaining the original design and not having to write code to convert two variables to one in a DATA step.
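
A hedged sketch of what the full means-model call could look like. PROC MIXED is used here because, as far as I know, LSMESTIMATE is not available in PROC GLM itself; the dataset and variable names (`assay`, `tissue`, `virus`, `outcome`) are assumptions. The positional coefficients assume the three observed cells are ordered ecto 50, ecto 500, endo 500, which should be checked against the LSMEANS table before trusting the contrasts.

```sas
/* Means model: interaction term only, no main effects.
   ASSAY, TISSUE, VIRUS, and OUTCOME are assumed names. */
proc mixed data=assay;
  class tissue virus;
  model outcome = tissue*virus;
  /* All pairwise differences among the observed cells. */
  lsmeans tissue*virus / diff cl;
  /* Custom comparisons; coefficient order is assumed to follow the
     LSMEANS table (ecto 50, ecto 500, endo 500). Verify before use. */
  lsmestimate tissue*virus
    'ecto 50 vs ecto 500'   1 -1  0,
    'ecto 500 vs endo 500'  0  1 -1 / cl;
run;
```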

 

Steve Denham

plf515
Lapis Lazuli | Level 10

Hi Steve

This sounds like it works out the same as doing the combination in the DATA step. That's my intuition, anyway. Do you know if my intuition is right?

 

Peter

SteveDenham
Jade | Level 19

@plf515,

 

Your intuition is right on the nose.  The advantage to the means model approach is that I don't have to come up with code and level labels, which is big for me because I am really lazy.

 

I think means model approaches were touted by Milliken and Johnson because the Type IV hypotheses (designed for missing cell analyses) in PROC GLM were not unique, whereas the means model resulted in unique hypothesis tests.  Now that the LSMESTIMATE statement is available, I can see a lot more analyses going this route.

 

Steve Denham
