Dear SAS community experts,
I need your advice on the issue of pseudo-replication. Below are the details.
EXPERIMENT AND DATA:
I conducted a study to assess which factors influence the transmission success of newly introduced bacteria. I also wanted to rank the bacteria used in my experiment by how many new hosts they are able to infect (from most to least).
The experiment consisted of cross-introducing bacteria into "new" hosts and recording whether or not those hosts transmitted the "new" infection to their progeny.
Cross-introduction means that I used several hosts and their naturally occurring bacteria (host-bacteria pairs), and I reciprocally introduced each host's bacteria into the other hosts.
For example, if I have three host-bacteria pairs: 1) host A naturally infected by bacteria A, 2) host B naturally infected by bacteria B, 3) host C naturally infected by bacteria C, cross-introduction will comprise:
- introducing the natural bacteria from host A into host B, and from host B into host A;
- introducing the natural bacteria from host A into host C, and from host C into host A;
- introducing the natural bacteria from host B into host C, and from host C into host B.
In my experiment, the infections were introduced by injection into non-infected hosts (i.e., hosts cured of their natural infections; see Fig. 1). The injections also included a "self" infection set (i.e., bacteria reintroduced into their own cured natural host).
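As a quick check on the size of the design, the short SAS DATA step below (an illustrative sketch, not part of the original analysis) enumerates every host-by-bacteria combination, including the "self" set. It assumes hosts and bacteria are coded 1 to 3, mirroring the Bottle and TTO codes in Table 1; the data set name COMBOS and the flag SELF are hypothetical. With three host-bacteria pairs it yields 3 x 3 = 9 combinations: 6 cross-introductions and 3 self infections.

```sas
/* Sketch: enumerate every host-by-bacteria combination in the design.     */
/* Coding is an assumption: 1 = A, 2 = B, 3 = C (as for Bottle and TTO).    */
data combos;
   do Host = 1 to 3;              /* recipient (cured) host                */
      do Bact = 1 to 3;           /* bacteria injected into that host      */
         Self = (Host = Bact);    /* 1 = "self" infection, 0 = cross-introduction */
         output;
      end;
   end;
run;

proc print data=combos noobs;
run;
```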
Fig. 1
For each "newly" infected host, 10 children were screened for the infection (see Fig. 1).
Figure 2 shows the experimental sampling design, and Table 1 shows the variables I have for the analysis. Table 1 (attached) is a subsample of my real data.
Fig. 2
I need to explain some of the variables in Table 1.
Variable "TTO"
This is the type of bacteria introduced into each host. Bacteria A is TTO "1", bacteria B is TTO "2", and bacteria C is TTO "3".
Variable "Bottle"
For each host species, the cured individuals receiving the bacterial infections were obtained from species-specific culture bottles. I had a culture bottle containing host A, another bottle containing host B, and another one containing host C.
The bottles are labeled 1, 2 and 3: the bottle containing host A is labeled "1", the one containing host B is labeled "2", and the one containing host C is labeled "3" (Fig. 2, Table 1).
Variable "MotherSetTHREE"
For each host species (contained in its respective bottle), three sets of three individuals were randomly chosen from the culture bottle, and each set was given one of the three types of infection shown in Table 1. For example, for host A, 3 individuals were given the "self" infection (i.e., host A was injected with bacteria A), 3 other individuals were given the infection from bacteria B, and the remaining 3 individuals received the infection from bacteria C (Figure 2, Table 1). I made sure that each of these individuals became infected. These were the mothers of the progeny that were screened for the infection (see below).
In Table 1, "MotherSetTHREE" is numbered 1, 2, 3 nine times (see also Fig. 2). I want to make clear that the three individuals within each set are different from each other, and that all sets of three shown in Table 1 are different from each other.
Ten children from each infected mother were screened for bacterial infection (30 per set of three mothers, and 270 children for the whole experiment). The results of these screenings constitute the response ("Infection success").
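The sampling layout described above can be sketched as a SAS DATA step that builds one row per mother (3 hosts x 3 bacteria x 3 mothers = 27 rows), each with Trials = 10 children, for 270 children in total. This is an illustrative skeleton only: Event is left missing because it would come from the real screening data, and the data set name DESIGN and the variable MotherID are hypothetical additions. MotherID simply numbers the 27 distinct mothers; nesting MotherSetTHREE within Bottle and TTO identifies the same 27 individuals.

```sas
/* Sketch of the sampling layout: one row per mother (hypothetical names). */
data design;
   do Bottle = 1 to 3;                   /* host A, B, C                    */
      do TTO = 1 to 3;                   /* bacteria A, B, C                */
         do MotherSetTHREE = 1 to 3;     /* mother within host-by-bacteria set */
            MotherID + 1;                /* unique mother number, 1 to 27   */
            Trials = 10;                 /* children screened per mother    */
            Event  = .;                  /* infected children: from the real data */
            output;
         end;
      end;
   end;
run;
```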
MY ANALYSIS
I analyzed which factors affect the success of the infection in the progeny of "new" hosts using PROC LOGISTIC.
The model was:
Event/Trials = host + bact + host*bact
The analysis has proven difficult (for many reasons that I will not address in this post).
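In SAS syntax, the events/trials model above would typically be written with effects separated by spaces rather than "+" signs (Host|Bact is shorthand for Host Bact Host*Bact). The sketch below assumes one row per mother with variables Host, Bact, Event and Trials. Because the real data are not reproduced here, it generates purely random placeholder counts (data set SIM, hypothetical) only so that the code runs; those counts are not the study data and say nothing about the results.

```sas
/* PLACEHOLDER DATA: random counts generated only so the sketch runs.  */
/* They are NOT the Table 1 data.                                      */
data sim;
   call streaminit(12345);                 /* reproducible random numbers   */
   do Host = 1 to 3;
      do Bact = 1 to 3;
         do MotherSetTHREE = 1 to 3;       /* three mothers per combination */
            Trials = 10;                   /* children screened per mother  */
            Event  = rand('binomial', 0.3, Trials); /* infected children    */
            output;
         end;
      end;
   end;
run;

/* Binomial logistic model in events/trials form, with the interaction. */
proc logistic data=sim;
   class Host Bact / param=glm;
   model Event/Trials = Host Bact Host*Bact;
   ods output ParameterEstimates=pe;       /* keep estimates for checking   */
run;
```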
MY VIEW ON THE EXPERIMENT:
This was an experimental study conducted in a laboratory. The experiment was conducted on a small population of hosts kept under laboratory conditions for several generations.
The results will prompt studies using larger populations of hosts with more diverse genetic backgrounds.
DOUBT:
It came to my attention that the three individuals within each set I sampled (as defined by "MotherSetTHREE") could be considered pseudo-replicates within the set.
In my opinion, that may not be the case. Each bottle represents my entire "cured population" for each host species. The cured individuals are entities that do not exist anywhere else in the world (i.e., the infections I created are not "natural"). They cannot be sampled from "multiple" populations, only from the one bottle I had in the lab. Similarly, the infections I introduced into the new hosts do not exist anywhere else in the world (to the best of my knowledge). In that sense, I thought that each individual sampled (for "MotherSetTHREE") constituted a true replicate.
However, I am open to seeing the pseudo-replication side of my experiment.
QUESTIONS
1. How should I specify random effects to account for the pseudo-replication in my experiment?
My understanding is that the code below would address this issue.
proc glimmix method=quad;
   class Host Bact TTO Bottle MotherSetTHREE;
   model Event/Trials = Bact Host Bact*Host / dist=binomial link=logit;
   random intercept / subject=MotherSetTHREE(Bottle TTO); /* one random intercept per mother */
run;
Do you agree that the code will address the pseudo-replication issue?
2. I have many zeroes in my response. Someone suggested that I need to use an "overinflation factor" to account for that. I googled "overinflation factor" but found nothing helpful. What is this factor, and how can I implement it in my analysis?
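If "overinflation factor" refers to an overdispersion (variance inflation) factor, one common way to request it in PROC LOGISTIC is the SCALE= option on the MODEL statement, which estimates a dispersion factor from the Pearson (or deviance) chi-square and rescales the standard errors. This is only a sketch of that one interpretation: it does not model excess zeros directly (a zero-inflated model is a different approach), and it again uses random placeholder counts (data set SIM2, hypothetical) rather than the study data.

```sas
/* PLACEHOLDER DATA: random counts generated only so the sketch runs. */
data sim2;
   call streaminit(67890);
   do Host = 1 to 3;
      do Bact = 1 to 3;
         do MotherSetTHREE = 1 to 3;
            Trials = 10;
            Event  = rand('binomial', 0.3, Trials);
            output;
         end;
      end;
   end;
run;

proc logistic data=sim2;
   class Host Bact / param=glm;
   /* SCALE=PEARSON: dispersion estimated from Pearson chi-square / DF */
   model Event/Trials = Host Bact Host*Bact / scale=pearson;
   ods output GoodnessOfFit=gof;  /* Deviance and Pearson fit statistics */
run;
```

A Value/DF ratio well above 1 in the goodness-of-fit table is the usual sign of extra-binomial variation.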
Thank you for your help.