marcel
Obsidian | Level 7

Hi Community members,

 

I hope you can help me clarify some questions about an analysis I am currently performing. The experiment aims to find out:

 

1) whether genetic distance ("Genetic Dist Recipient", "Genetic Dist Virus")

 

2) and phylogeny ("RecipientType", "virus type", "Recipient GROUP", "Virus GROUP")

affect the success of virus infection in control hosts (self and sister) and in new hosts.

 

In the attached table, each row (identified by an ID number) contains the results of injecting 10 individuals ("Trials") of the same "recipient type" with the same type of virus ("virus type"). The "Posit" column indicates how many of the 10 injected individuals developed the infection. The "PropSucc01" column is the proportion Posit/Trials. I also applied an arcsine transformation to the proportion of success, and I coded success as a binary variable, 0 = no infection and 1 = infection (column "Succ01").

The column "RepeatSetof10" indicates that I used 9 different sets, each made up of ten different individuals of the "RecipientType". I injected so many individuals because it is known from other systems that some recipient-virus combinations are extremely difficult to infect, which results in the many zeros in the "Succ01" column.
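The derived columns described above can be sketched as a short DATA step. This is a minimal illustration under stated assumptions: the three input rows are made-up values (not the study's data), and "arcsine" is taken to mean the common arcsine-square-root transformation, ARSIN(SQRT(p)), which the post does not spell out.

```sas
/* Hypothetical rows for illustration only; column names Posit and Trials
   follow the post. The arcsine-square-root form is an assumption. */
data derived;
   input Posit Trials;
   PropSucc01     = Posit / Trials;            /* proportion of injected individuals infected */
   PropSuccArcSin = arsin(sqrt(PropSucc01));   /* assumed arcsine-square-root transform, in radians */
   datalines;
0 10
3 10
10 10
;
run;

proc print data=derived;
run;
```

The transformed value runs from 0 (no individual infected) to pi/2 (all 10 infected).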

 

Analysis:

 

Similar studies have used binary logistic regression to estimate infection success. That made sense to me, so I tried the same approach, but there appears to be quasi-complete separation of the 0s and 1s, so the maximum likelihood estimates may not exist or may be unreliable.
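One way to locate where quasi-separation comes from is to total infections and injections for each virus-by-recipient combination. A combination with no infections at all, or with every injected individual infected, is a common source of separation when the VirusType*RecipientType interaction is in the model. The sketch assumes the data set CH_3_TTO_RETHINK and the variable names used in the code of this post; the output name cell_totals is arbitrary.

```sas
/* Total infections (Posit) and injections (Trials) per virus-by-recipient cell.
   Assumes CH_3_TTO_RETHINK and the variable names from the post. */
proc means data=CH_3_TTO_RETHINK nway noprint;
   class VirusType RecipientType;
   var Posit Trials;
   output out=cell_totals sum=TotPosit TotTrials;
run;

/* List only the cells where the outcome is all 0s or all 1s */
proc print data=cell_totals;
   where TotPosit = 0 or TotPosit = TotTrials;
run;
```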

-------------------------------------------------------------------------------------------------------

DISCLAIMER:

The attached table is a partial set of my data. The experiment was not conducted in humans.

-------------------------------------------------------------------------------------------------------

I tried two types of analysis in PROC GLIMMIX:

 

1) Entering the data as Posit/Trials (distribution=binomial, link=logit)

 

2) Using the arcsine-transformed proportion of success (distribution=binomial, link=logit)

 

For both analyses, the residual plots show a non-normal distribution. When I remove the outliers, the residual plots get even worse, so I think my data set is not suitable for analysis by these methods. Some residual plots are attached.

 

The questions are:

 

1) Is it advisable to run this as a non-binomial analysis, given that the "Trials" sizes are equal for all the recipient-donor pairs I used (a total of 90 individuals, from 9 trials of 10 individuals)?

2) Is there any other type of analysis you would recommend?
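For reference on question 1: an events/trials row can be expanded into one 0/1 record per injected individual. Binary logistic regression on the expanded records has the same likelihood (up to a constant) as the Posit/Trials binomial model, so the two layouts give the same estimates. A minimal sketch, using made-up rows and a hypothetical indicator named Infected:

```sas
/* Hypothetical aggregated rows: ID, number infected (Posit), number injected (Trials) */
data agg;
   input ID Posit Trials;
   datalines;
1 0 10
2 3 10
;
run;

/* One record per injected individual; the first Posit records are coded as infected */
data expanded;
   set agg;
   do Individual = 1 to Trials;
      Infected = (Individual <= Posit);   /* 1 = infected, 0 = not infected */
      output;
   end;
run;
```

Fitting `model Infected(event='1') = ...` in PROC LOGISTIC on the expanded data should reproduce the estimates of the same model written as `model Posit/Trials = ...` on the aggregated data.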

 

Thank you very much for your help.

 

Marcel

 

Code

proc glimmix data=CH_3_TTO_RETHINK order=data ODDSRATIO
             plots(OBSNO)=residualpanel(conditional marginal);
   nloptions tech=nrridg maxiter=1000;
   /* classification (categorical) predictors */
   class RecipientType RecipGROUP VirusType VirusGROUP;
   /* events/trials response: Posit infected out of Trials injected */
   model Posit/Trials = VirusType RecipientType VirusType*RecipientType
                        GeneticDistRecipient GenetciDistVirus
                        GenetciDistVirus*GeneticDistRecipient
                        VirusGROUP RecipGROUP
         / dist=binomial link=logit;
   random _residual_ / subject=Repeat;
   /* predicted probabilities on the data scale */
   output out=results pred(ilink)=fit;
run;

 

 

proc glimmix data=CH_3_TTO_RETHINK order=data ODDSRATIO
             plots(OBSNO)=residualpanel(conditional marginal);
   nloptions tech=nrridg maxiter=1000;
   /* classification (categorical) predictors */
   class RecipientType RecipGROUP VirusType VirusGROUP;
   /* response: arcsine-transformed proportion of success */
   model PropSuccArcSin = VirusType RecipientType VirusType*RecipientType
                          GeneticDistRecipient GenetciDistVirus
                          GenetciDistVirus*GeneticDistRecipient
                          VirusGROUP RecipGROUP
         / dist=binomial link=logit;
   random _residual_ / subject=Repeat;
   output out=results pred(ilink)=fit;
run;

 

 

 

[Attached figures: residual plots for the Posit/Trials model and for the PropSuccArcSin model]

 

Accepted Solutions
StatDave
SAS Super FREQ

From your description, it seems like all of the subjects in the study can be considered independent, so you could fit the model in PROC LOGISTIC. Note that for logistic models, the residuals are not assumed to be normally distributed, so I don't see any reason to use the transformation. See this note. The Firth penalized likelihood method (FIRTH option in the MODEL statement in PROC LOGISTIC) can often avoid separation problems. But when this happens the data may have been made too sparse by the complexity of the model, so you might have to try working with simpler models. Note that one simplification might be to merge together some levels within a CLASS predictor. Increasing the number of iterations is rarely useful.
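A sketch of this suggestion in PROC LOGISTIC, assuming the data set and variable names from the original post. The reduced set of effects below is only an example of a simpler starting model, not a recommendation for this particular study; terms can then be added back one at a time to see which one reintroduces separation.

```sas
/* Sketch: reduced model fitted with Firth's penalized likelihood.
   Data set and variable names are taken from the original post. */
proc logistic data=CH_3_TTO_RETHINK;
   class VirusType RecipientType / param=ref;
   /* events/trials syntax: Posit infected out of Trials injected */
   model Posit/Trials = VirusType RecipientType
                        GeneticDistRecipient GenetciDistVirus
         / firth       /* Firth penalized maximum likelihood */
           clodds=pl;  /* profile-likelihood confidence limits for odds ratios */
run;
```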


marcel
Obsidian | Level 7

Mr. StatDave,

 

Thank you very much for your answer. It clarified for me several aspects of the analysis. I will try your suggestions. It seems that keeping it simple is one of the safest ways to go.

 

Regards,

 

marcel

 
