Hello all,

I am a VERY green SAS user, having been handed a PROC MIXED script from the powers that be - or rather, that were, considering they have left the scene - and I have precious few, if any, people to turn to for advice on this sort of problem. I have tried to search the forum for problems like this, and I understand that they are not completely rare, but also that they are usually very specific in nature, depending on the sort of data you have. For that reason, I am now submitting my own question, having banged my head against the proverbial wall for a couple of weeks. Any and all help would be greatly and humbly appreciated.

First off, let me restate that I am very, very new at this, and that my knowledge of statistics is practical at best - the mathematics and the lingo are still well beyond my grasp. At the moment I am approaching the different methods much like a 15-year-old might approach a carpenter's workshop: a hammer is used for one task, a screwdriver for another - and in my utter ignorance I am probably still sometimes using a screwdriver where I should have used a hammer. The metaphor is clumsy, but it sadly illustrates my position fairly well.

Having said that, here is my conundrum. I will try to be as specific as possible, and I hope you will forgive me if I become unnecessarily long-winded.

I am part of a research group that investigates genetic variation in a population of twins (n ≈ 12,000). We are looking for autism genes, testing for associations both with continuous scores (autism scores range from 0 to 17 in 0.5 increments, describing a spectrum of symptoms from what might be considered personality traits all the way to a full-blown diagnosis with severe morbidity) and with actual disease (using a cutoff that yields a case group and a control group). The genetics are represented as single nucleotide polymorphisms (essentially very small variations in DNA) that always divide the subjects into three groups depending on their genotype: AA, Ab, bb (A and b representing the two alleles). Usually people use twin populations in a way that takes advantage of the fact that they are twins, but we are doing the opposite: trying to statistically correct for, rather than exploit, the fact that monozygotic twins have exactly the same DNA, whereas dizygotic twins on average share only 50% of their segregating DNA.

The data is structured like this (variable explanations below):

TwinID  FamilyID  Tvab  Zygosity  Genotype  AutismScore  AutismCutOff
11      1         1     1         AA        4.5          1
12      1         2     1         AA        2.5          0
21      2         1     2         Ab        0            0
22      2         2     2         bb        4            0
31      3         1     2         AA        2            0
32      3         2     2         Ab        5            1
41      4         1     1         bb        0            0
42      4         2     1         bb        0.5          0

TwinID = individual ID
FamilyID = a number shared between two twins, signifying that they are related
Tvab = a number unique within each family (as you can see, together with FamilyID it yields the TwinID)
Zygosity = monozygotic (1) or dizygotic (2)
Genotype = explained above
AutismScore = the continuous score of autism traits/symptoms
AutismCutOff = flags cases (1) and controls (0)
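In case it helps anyone experiment with the structure, here is a minimal DATA step that recreates the eight example rows above (a sketch only - the real data is of course read from our actual files):

/* Sketch: recreates the example rows above for illustration */
data autism;
   input TwinID FamilyID Tvab Zygosity Genotype $ AutismScore AutismCutOff;
   datalines;
11 1 1 1 AA 4.5 1
12 1 2 1 AA 2.5 0
21 2 1 2 Ab 0   0
22 2 2 2 bb 4   0
31 3 1 2 AA 2   0
32 3 2 2 Ab 5   1
41 4 1 1 bb 0   0
42 4 2 1 bb 0.5 0
;
run;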
As I said, the population contains around 12,000 individual twins, making up somewhat fewer than 6,000 complete pairs - some twins are in the data without their co-twin. The twins were not selected based on having autism, but screened from all twins born, which means that those who end up in the case group are relatively few (about 300-400). Also, the number of twins that receive any score at all is rather small (I cannot recall the exact number right now), so the mean score for each genotype is very low, since at least 8,000 individuals score 0.

In order to "ignore" the fact that these are twins, we have used the following script for the continuous outcome:

proc mixed data=autism;
   class Genotype Tvab Zygosity FamilyID;
   model AutismScore = Genotype / ddfm=satterthwaite;
   repeated Tvab / subject=FamilyID group=Zygosity type=un;
   lsmeans Genotype / diff;
run;

For the case/control analysis, we have used the following (to the same end; note that the binary outcome here is AutismCutOff, not the continuous score):

proc glimmix data=autism;
   class Genotype Tvab Zygosity FamilyID;
   /* event='1' makes GLIMMIX model the probability of being a case */
   model AutismCutOff(event='1') = Genotype / dist=binary link=logit oddsratio;
   random Tvab / subject=FamilyID group=Zygosity type=un residual;
   lsmeans Genotype / diff;
run;

The problem is the following. When performing a fair number of these analyses, the procedure ends with one of two different messages:

1) "Did not converge." We sometimes encountered this with a smaller but otherwise identical data set, and we got around it (without knowing whether that is entirely correct or not) by changing the covariance structure to compound symmetry (by the way, do not let the fact that I know what it is called fool you - it has nothing to do with actually understanding what it does). The convergence problems are not very big at this point, but some comment on what might cause them would be appreciated. One of our hypotheses was that it might have to do with very small groups, for instance when one homozygote, say bb, is very rare and present in only about 100 subjects (leaving the other groups at roughly 6,000 AA and 5,000 Ab). We encountered this mainly when we had three values for zygosity - 1 for MZ, 2 for DZ and 3 for unknown - where the subjects coded 3 were very few; removing those with unknown zygosity usually solved the problem. Still, we would appreciate a hint as to whether we are on the right track, or whether something else is causing this.

2) "Stopped because of infinite likelihood." This is our main problem right now, and I seem unable to do much about it. Playing around with the covariance structure sometimes fixes it, and sometimes excluding individuals with chromosomal aberrations (only 11 individuals) also works, but this solution is not consistent across other analyses (on other genotypes, or other scores, e.g. ADHD) where the structure of the data is the same. Also, I have no idea what is causing the problem, or whether changing the covariance structure is even something you are allowed to do. The only theory I have is that it is again a problem of small groups with one genotype (e.g. bb being present in only 100-200 individuals), but that is somewhat contradicted by the fact that the run sometimes succeeds if I exclude no more than 11 individuals, which hardly changes the distributions at all. Two sketches of what I have tried, or am thinking of trying, follow below.
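To make the small-groups hypothesis concrete, this is the kind of check I could run to see how sparse the genotype cells actually get by zygosity and case status (a sketch using the variable names above):

/* Sketch: tabulate the cell sizes that might explain the convergence problems */
proc freq data=autism;
   tables Genotype*Zygosity     / norow nocol nopercent; /* rare homozygotes within MZ/DZ */
   tables Genotype*AutismCutOff / norow nocol nopercent; /* rare homozygotes among cases */
run;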
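And for completeness, this is what I mean by "changing the covariance matrix to compound symmetry": the same PROC MIXED call with type=cs instead of type=un, which estimates fewer covariance parameters and has sometimes converged for us where type=un did not (again, I do not know whether this is statistically defensible for our design):

proc mixed data=autism;
   class Genotype Tvab Zygosity FamilyID;
   model AutismScore = Genotype / ddfm=satterthwaite;
   /* type=cs: a common variance and covariance per zygosity group instead of an
      unstructured within-pair matrix - fewer parameters, easier to fit */
   repeated Tvab / subject=FamilyID group=Zygosity type=cs;
   lsmeans Genotype / diff;
run;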
I realize that this is a long post, and if you have read this far, I am indebted to you. I also realize that giving a down-to-earth layman's answer may be easier said than done, especially when you have not seen the data. But any insight, clue, hint, or even a "good luck" would be enormously appreciated.

Kind regards,
Daniel Johansson