About igforek

igforek · ‎09-22-2019

FULL DATA output: Solutions for Fixed Effects Effect bact host Estimate Standard DF t Value Pr > |t| Error Intercept 0.766 0.2649 Infty 2.89 0.0038 bact bactC -17.0051 1.6642 Infty -10.22 <.0001 bact bactD -23.8316 0.6135 Infty -38.84 <.0001 bact bactB -36.9028 0.4498 Infty -82.05 <.0001 bact bactE -23.3663 0.6921 Infty -33.76 <.0001 bact bactA 0 . . . . host hostC -4.8168 0.7241 Infty -6.65 <.0001 host hostD -5.536 1.0131 Infty -5.46 <.0001 host hostB 0.1507 0.4835 Infty 0.31 0.7553 host hostE -1.6997 0.3898 Infty -4.36 <.0001 host hostA 0 . . . . bact*host bactC hostC 22.0099 1.9807 Infty 11.11 <.0001 bact*host bactC hostD 20.1344 1.9271 Infty 10.45 <.0001 bact*host bactC hostB 18.774 1.759 Infty 10.67 <.0001 bact*host bactC hostE 15.5281 1.6641 Infty 9.33 <.0001 bact*host bactC hostA 0 . . . . bact*host bactD hostC 25.5982 0.9925 Infty 25.79 <.0001 bact*host bactD hostD 26.0533 1.2313 Infty 21.16 <.0001 bact*host bactD hostB 20.505 . . . . bact*host bactD hostE -24.167 . . . . bact*host bactD hostA 0 . . . . bact*host bactB hostC -19.1977 . . . . bact*host bactB hostD -18.9751 . . . . bact*host bactB hostB 38.0947 0.7942 Infty 47.97 <.0001 bact*host bactB hostE 39.1647 . . . . bact*host bactB hostA 0 . . . . bact*host bactE hostC 25.388 1.1044 Infty 22.99 <.0001 bact*host bactE hostD 23.3649 1.5339 Infty 15.23 <.0001 bact*host bactE hostB 19.8738 . . . . bact*host bactE hostE -23.3408 . . . . bact*host bactE hostA 0 . . . . bact*host bactA hostC 0 . . . . bact*host bactA hostD 0 . . . . bact*host bactA hostB 0 . . . . bact*host bactA hostE 0 . . . . bact*host bactA hostA 0 . . . . Type III Tests of Fixed Effects Effect Num DF Den DF F Value Pr > F bact 4 Infty 1896.16 <.0001 host 4 Infty 917.93 <.0001 bact*host 9 Infty 580.38 <.0001 10% under-sampled zeros: Solutions for Fixed Effects Effect bact host Estimate Standard DF t Value Pr > |t| Error Intercept 2.9867 0.005671 Infty 526.63 <.0001 bact bactC -18.2666 60.8846 Infty -0.3 0.7642 bact bactD -17.4818 35.5283 Infty -0.49 0.6227 bact Mel 2.1529 0.448 Infty 4.81 <.0001 bact bactE -17.4058 35.8161 Infty -0.49 0.627 bact bactA 0 . . . . host hostC -3.782 0.7725 Infty -4.9 <.0001 host hostD -3.0104 0.4715 Infty -6.39 <.0001 host hostB 0.1206 0.9151 Infty 0.13 0.8952 host hostE -1.6249 0.6332 Infty -2.57 0.0103 host hostA 0 . . . . bact*host bactC hostC 22.2182 61.0542 Infty 0.36 0.7159 bact*host bactC hostD 19.5715 60.878 Infty 0.32 0.7478 bact*host bactC hostB 19.989 60.9134 Infty 0.33 0.7428 bact*host bactC hostE 17.58 60.8227 Infty 0.29 0.7726 bact*host bactC hostA 0 . . . . bact*host bactD hostC 19.0893 35.4891 Infty 0.54 0.5907 bact*host bactD hostD 17.8064 35.5036 Infty 0.5 0.616 bact*host bactD hostB 14.6293 35.9134 Infty 0.41 0.6838 bact*host bactD hostE 0 . . . . bact*host Mel hostC -50.1529 . . . . bact*host Mel hostB -1.0257 0.757 Infty -1.36 0.1754 bact*host Mel hostE 0 . . . . bact*host bactE hostC 19.3492 35.6662 Infty 0.54 0.5875 bact*host bactE hostD 16.6343 35.9108 Infty 0.46 0.6432 bact*host bactE hostB 14.5901 36.4178 Infty 0.4 0.6887 bact*host bactE hostE 0 . . . . bact*host bactA hostC 0 . . . . bact*host bactA hostD 0 . . . . bact*host bactA hostB 0 . . . . bact*host bactA hostE 0 . . . . bact*host bactA hostA 0 . . . . Type III Tests of Fixed Effects Effect Num DF Den DF F Value Pr > F bact 4 Infty 538.79 <.0001 host 4 Infty 114.29 <.0001 bact*host 11 Infty 16.9 <.0001

igforek · ‎09-22-2019

Esteemed Paige Miller, Thank you for your help. The lack of degrees of freedom to estimate std errors and p-values for some interactions is only one of the problems with the statistical analysis of my data in glimmix. There is also the problem of quasi-separation. I found this problem in my initial analysis, when I used proc logistic. Although the firth correction seems to be a solution for quasi-separation in PROC logistic, I cannot include random factors in that procedure. Quasi-separation can be solved in glimmix by: reducing the number of variables or effects in the model merging categories of categorical (CLASS) variables The truth is that those solutions will defeat the purpose of my experiment. Also, I noticed that with the under-sampled data, some of the interactions are lost (they are "sample out" due to the random nature of the under-sampling) . When I increase the % of under-sampling to 30% or 40%, the quasi-separation becomes evident (very large estimates and std errors for some estimates). I think I will switch to a different type of analysis. I was recently advised to use a "correspondence analysis". I will explore that venue. Thank you very much for your help. I really appreciate that you paid attention to my post. I did not get a solution to my problem, but your advise was very helpful. I do I do not know how to rate your input. Regards,

igforek · ‎09-21-2019

The full data run with the code: proc glimmix data=full method=quad empirical; class bact(ref='bactA') host(ref='hostA') TTO Bottle MotherSetNINE; model Posit/Trials = bact host bact*host / dist=binomial link=logit ddfm=none s; random int / subject=MotherSetNINE(Bottle TTO); run; Results attached in csv file

igforek · ‎09-21-2019

The model run with with under-sampled zeroes shows not significance (p-values) values for the interactions. See attached.

igforek · ‎09-21-2019

Does my under-sampling method (and the correction) seem ok? I am really in the dark here.

igforek · ‎09-21-2019

I think I was able to find a way to "under-sample" the large frequency of "zero" entries in my full data table. I am analyzing the data as event/trials, which is equal to "PropSucc" (i.e., proportion of success of the infection) in my full table. I did not find any "rule" to determine the desired level of under-sampling. But, observing the frequency of "PropSucc" in my full data, I "eyeballed" the desired "new" frequency of zeros to about 10%: I used the following codes for under-sampling the zeros: proc sort data = full_data; by PropSucc; run; proc surveyselect data = full_data out = sub_10 method = srs rate = (10,100,100,100,100,100,100,100,100,100,100) seed = 9876; strata PropSucc; run; The percentage of zeroes in the "FULL_DATA" and in the 10% sub-sample of zeros ("Sub_10") are shown in the figure below: The offset adjustment could be this one: data Sub_10_with_offset; set sub_10; off=log((0.1140/(1-0.1140))*((1-0.5511)/0.5511)); run; And the model with the offset: proc glimmix data=Sub_10_with_offset empirical=classical method=quad; class bact host TTO Bottle MotherSetNINE; model Posit/Trials = bact host bact*host / dist=binomial link=logit ddfm=none s offset=off; random int / subject=MotherSetNINE(Bottle TTO); run; One issue I observe is that the the intercept with the original full data is: 0.7660 and the under-sampled intercept with the offset adjustment is 2.9867. These are very different values. Perhaps my offset estimation is way off (incorrect estimation?). I will be very thankful for any help- comment you can provide.

igforek · ‎09-20-2019

Dear Paige Miller, I am very grateful you took the time to take a look at my post. Thank you very much for your clarification about pseudo-replication. Table 1 from this post is a subset of my whole data. The distribution of values (and zeros) for my whole data looks like this : SAS proc freq ouput:: I could use the oversampling technique you mention. When I run the analysis as shown in my GLIMMIX code above (before any oversampling), some interactions do not have std. errors , df, t-values, or p-values for the estimates. Also, the std. errors for one bacteria and the interactions it is involved in are very high (306.85) when compared to the other std. errors, that vary from ~0.3 to ~1.7. This seems to be due to a quasi-separation issue in my data. Is this the right interpretation to the std.errors output? Thank you very much for your time and your insights.

igforek · ‎09-20-2019

Dear SAS community experts, I need your advice on the issue of pseudo-replication. Below are the details. EXPERIMENT AND DATA: I conducted a study with the intention to assess what factors influence the transmission success of newly introduced bacteria. I also wanted to have a "ranking" of what bacteria (among the ones I used in my experiment) are able to infect the most (or least) new hosts. The experiment consisted on cross-introduction of bacteria into "new" hosts and recorded whether or not those hosts transmit the "new" infection to their progeny. Cross-introduction means that I used several hosts and their naturally occurring bacteria (host-bacteria pairs), and I reciprocally introduced the bacteria into each other's host. For example, if I have three host-bacteria pairs: 1) host A naturally infected by bacteria A, 2) host B naturally infected by bacteria B, 3) host C naturally infected by bacteria C, cross-introduction will comprise: Introducing the natural bacteria from host A into host B. and Introducing the natural bacteria from host B into host A. and Introducing the natural bacteria from host A into host C. and Introducing the natural bacteria from host C into host A. and Introducing the natural bacteria from host B into host C. and Introducing the natural bacteria from host C into host B. In my experiment, the infections were introduced into non-infected hosts (i.e., cured from their natural infections) via injections (see Fig. 1). The injections included a "self" infection set (i.e., bacteria introduced into their cured natural host). Fig. 1 For each "newly" infected host, 10 children were screened for the infection (see Fig. 1) Figure 2 shows the design of the experimental sampling and Table 1 shows the variables I have for the analysis. Table 1 (attached) is a sub-sample of my real data. Fig. 2 I need to explain some of the variables in Table 1. Variable TTO This is the bacteria type that was introduced into each host. BactA is TTO "1", bactB is TTO "2", and bactC TTO "3". Variable "Bottle" For each host species, the cured individuals receiving the bacterial infections were obtained from species-specific culture bottles. I had a culture bottle containing host A, another bottle containing host B, and another one containing host C. The bottles are labeled 1, 2 and 3. The bottle containing host A is labelled "1", the one containing host B is labelled "2" and the last one "3" (Fig. 2, Table 1). Variable "MotherSetTHREE" For each host species (contained in their respective bottles), three sets of three individuals were randomly chosen from the culture bottle and each set was given one of the three types of infection shown in Table 1. For example, for host A: 3 individuals were given the "self"-infection (e.g., host A was injected with bacteria A), the other 3 individuals were given the infection from bacteria B, and the remaining 3 individuals received the infection from bacteria C (Figure 2, Table 1). I made sure that each one of the individuals became infected. These were the mothers of the progenies that were screened for the infection (see below). In Table 1 these "MotherSetTHREE" are numbered 1 2 3 nine times (see also Fig. 2). I want to make sure that it is understood that each individual within each set of three are different from each other. Also, all sets of three shown in Table 1 are different from each other. 10 children from each one of the infected individual "mothers" in the experiment were screened for bacterial infection (a total of 30 per each set of three mothers, and 270 children for the whole experiment). The results from these screenings constitute the response ("Infection success"). MY ANALYSIS The analysis to figure out the factors that affect the success of the infection in the progeny of "new" hosts was run by using the PROC logistic. The model was: Event/Trals = host + bact + host*bact The analysis has proven difficult (many issues that are I will not address in this post). MY VIEW ON THE EXPERIMENT: This was an experimental study conducted in a laboratory. The experiment was conducted on a small population of hosts kept under laboratory conditions for several generations. The results will prompt studies using larger populations of hosts with more diverse genetic backgrounds. DOUBT: It came to my attention that the three individuals within each set I sampled (as defined in "MotherSetTHREE") could be considered as pseudo-replications within the set. In my opinion, that may not be the case. Each bottle represents my entire "cured population" for each host species. The cured individuals are entities that do not exist anywhere else in the world (i.e., the infections I created are not "natural"). They cannot be sampled from "multiple" populations, but from the one bottle I had in the lab. Similarly, the infections I introduced into the new hosts do not exist anywhere else in the world (to the best of my knowledge). In that sense, I thought that each individual sampled (for "MothrSetTHREE") constituted a real replication. However, I am open to see the pseudo-replication side of my experiment. QUESTIONS 1. How to appropriately enter random factors to account for the pseudo-replication in my experiment? My understanding is that the code below would address this issue. proc glimmix method=quad; class Host Bact TTO Bottle MotherSetTHREE; model Event/Trials = Bact + Host + Bact*Host / dist=binomial link-logit; random int / subject=MotherSetTHREE(Bottle TTO) run; Do you agree that the code will address the pseudo-replication issue? 2. I have many zeroes in my response. Someone suggested that I need to use an "overinflation factor" to account for that. I googled "overinflation factor" but nothing I find helpful came up. What is this factor and how can I implement it in my analysis? Thank you for your help.

igforek · ‎08-24-2019

Thank you for your comments. I am still working on solving the issues with the analysis. I will let all of you know how I goes.

igforek · ‎08-22-2019

To me personally, this look like a "Hindsight is 20/20" situation. Other researchers in my field used this type of analysis successfully. In my case, there are too many zeroes and some host do not even get infected by their own bacteria ("0" self-infection). Who knew?

igforek · ‎08-22-2019

Unfortunately, I am right mow working on a computer that does not run SAS or any other statistical software. I will try changing my reference this afternoon, after 5 pm central time. I am thinking case 3 may apply to my analysis. And yes, I am interested in estimates on the categorical variables

igforek · ‎08-22-2019

What do you think about Dr. Allison insight into dummy variables? https://statisticalhorizons.com/multicollinearity

igforek · ‎08-21-2019

Mr. Ksharp, I thought that unbalanced data meant that there were too many observations in one class, as compared to another class in an analysis. Are you suggesting that too many zeros in one class can be a special case of unbalanced data? Regards,

igforek · ‎08-19-2019

HostB is naturally infected by BactB HostC is naturally infected by BactC and so on BactE is naturally infected by BactE

igforek · ‎08-19-2019

I took a look at this page: https://support.sas.com/rnd/app/stat/examples/GENMODZIP/roots.htm but I am not sure if this will apply to my data, because I have categorical independent variables. And binomial.

Online Status	Offline
Date Last Visited	‎01-20-2021 09:48 AM

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? + ...

Re: How do I account for Pseudo-replication in logistic regression?

How do I account for Pseudo-replication in logistic regression?

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes

Re: How do I account for Pseudo-replication in logostic regression

Re: How do I account for Pseudo-replication in logistic regression?

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

How to test the assumptions of proc MIXED?

How do I account for Pseudo-replication in logistic regression?

Re: SAS EG "Cant Connect to Local Server"

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? - ...

Re: How do I account for Pseudo-replication in logistic regression? + ...

Re: How do I account for Pseudo-replication in logistic regression?

How do I account for Pseudo-replication in logistic regression?

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes

Re: proc logistic with too many zeroes