Dear SAS Community,
I have analyzed my data as repeated measures, as the dependent variables were measured every year in a multi-year trial. I have 2 independent variables; nem and year. When plotted my data I observed that the population dynamics of two dependent variables (number of total eggs and number of diseased eggs) show opposite oscillations during the 5 years evaluated, so I would like to test for a negative correlation between these 2 variables. Is there any way to modify this repeated measures code in order to test this or should I test the correlation for each year separately?
Proc glimmix data=one;
class nem blk year;
model harEggsT= nem|year/dist=lognormal ddfm=kr;
random intercept/subject=blk;
random year/residual subject=blk*nem type=ar(1);
run;
I would greatly appreciate any help!
Thank you,
Caroline
Hi Caroline,
I don't see how you are getting the two kinds of eggs into this. I also wonder about using dist=lognormal for a count variable.
If you can code eggtype as 'Total' and 'Diseased', then what about:
Proc glimmix data=one;
class nem blk year eggtype;
model harEggsT= eggtype|nem|year/dist=negbin ddfm=kr;
random intercept/subject=blk;
random year/residual subject=blk*nem type=ar(1) group=eggtype vcorr;
run;
Steve Denham
 
Hi Steve!
I am so glad you came to help me  . Thank you!!
. Thank you!!
I already check the distribution and the lognormal was the most suitable (is usually the case for nematode counts). harEggsT (total eggs) is one of the dependent variable that I want to test for negative corr with harEggsD (diseased eggs), so I did not know how to include both dependent variables in the same model, but I will try what you are suggesting.
Thank you very much Steve,
Caroline
Hi Steve,
When I used a lognormal distribution I get a Chi-Square/DF=1, but when I use negbin do not converge, so I will stay with the lognormal.
It seems that your code worked. There is a significant effect for eggtype. Anyhow I do not know how to interpret the results, because I get a table called "Estimated V Correlation Matrix for blk A" with rows and columns. Why only for blk A? I also get the Type III Tests of Fixed Effects table, a Cov Parameter Estimates table, but nothing else :smileyplain:
Thank you dear Steve!
Caroline
Hi Caroline,
Last to first: The output is all that the code I provided should be producing. The V correlation matrix only presents for block A, as it is identical for all blocks. The hard part is mapping the correlations to the fixed effects. If the two endpoints are truly negatively correlated, there should be some negative values located block diagonally that reflect the within year correlations for the two egg types.
And I bet your counts are fairly large numbers, so lognormal makes sense, as opposed to negbin.
Steve Denham
Thanks a lot Steve!
Thank you for the explanation. You are right, the counts are large numbers. Unfortunately I do not see any single negative value in that table  
 
I would now try with healthy eggs and disease eggs. Do you think is it correct to look for negative corr between number of total eggs and proportion of disease eggs (instead of number of disease eggs)?
I appreciate your great help Steve!!
Caroline
That latter correlation is difficult, because one measure is a count and the other a proportion. Hard to fit something that disparate into the same model, although there is an example in the GLIMMIX documentation. To make it work, try log transforming your counts (or square root) and fitting gaussian and binomial. I just have a feeling that will work better than lognormal and binomial.
I would try healthy and disease as levels within eggtype first. Still, I have a hunch these two counts are positively related--good weather means more eggs of both types.
Steve Denham
I tried healthy eggs with disease eggs. Significant interactions including eggtype, but still no negative values. I also compared proportion of healthy eggs with proportion of disease eggs, but no negative values.
I would appreciate if you could have a look, maybe I am doing something wrong. (harEggsT means he number of total eggs. I also divided 4 because I need the numbers in 100g soil and he raw data is in 400g)
data one;
input nem$ blk$ year eggtype$ harEggs harEggsT;
harEggs=harEggs/4;
harEggs=harEggs+1;
harEggsT=harEggsT/4;
harEggsT=harEggsT+1;
harPerEggs=100*harEggs/harEggsT;
cards;
Chav A 2010 healthy 1400 8700
Chav A 2010 diseased 7300 8700
Chav A 2011 healthy 27375 35250
Chav A 2011 diseased 7875 35250
Chav A 2012 healthy 9700 27900
Chav A 2012 diseased 18200 27900
Chav A 2013 healthy 60625 68875
Chav A 2013 diseased 8250 68875
Chav A 2014 healthy 19350 34425
Chav A 2014 diseased 15075 34425
Chav B 2010 healthy 3000 10800
Chav B 2010 diseased 7800 10800
Chav B 2011 healthy 17100 21700
Chav B 2011 diseased 4600 21700
Chav B 2012 healthy 11625 28275
Chav B 2012 diseased 16650 28275
Chav B 2013 healthy 71750 81500
Chav B 2013 diseased 9750 81500
Chav B 2014 healthy 25500 46800
Chav B 2014 diseased 21300 46800
Chav C 2010 healthy 27200 36000
Chav C 2010 diseased 8800 36000
Chav C 2011 healthy 44250 66500
Chav C 2011 diseased 22250 66500
Chav C 2012 healthy 13500 26500
Chav C 2012 diseased 13000 26500
Chav C 2013 healthy 82800 97650
Chav C 2013 diseased 14850 97650
Chav C 2014 healthy 29400 38300
Chav C 2014 diseased 8900 38300
Chav D 2010 healthy 5600 22400
Chav D 2010 diseased 16800 22400
Chav D 2011 healthy 43500 53100
Chav D 2011 diseased 9600 53100
Chav D 2012 healthy 17800 43900
Chav D 2012 diseased 26100 43900
Chav D 2013 healthy 62200 70600
Chav D 2013 diseased 8400 70600
Chav D 2014 healthy 35125 47875
Chav D 2014 diseased 12750 47875
Delm A 2010 healthy 800 12600
Delm A 2010 diseased 11800 12600
Delm A 2011 healthy 2850 24750
Delm A 2011 diseased 21900 24750
Delm A 2012 healthy 9150 34200
Delm A 2012 diseased 25050 34200
Delm A 2013 healthy 21500 30000
Delm A 2013 diseased 8500 30000
Delm A 2014 healthy 8750 16850
Delm A 2014 diseased 8100 16850
Delm B 2010 healthy 7000 46800
Delm B 2010 diseased 39800 46800
Delm B 2011 healthy 28000 62000
Delm B 2011 diseased 34000 62000
Delm B 2012 healthy 12400 41200
Delm B 2012 diseased 28800 41200
Delm B 2013 healthy 28350 37275
Delm B 2013 diseased 8925 37275
Delm B 2014 healthy 50300 72100
Delm B 2014 diseased 21800 72100
Delm C 2010 healthy 17250 55050
Delm C 2010 diseased 37800 55050
Delm C 2011 healthy 44700 88650
Delm C 2011 diseased 43950 88650
Delm C 2012 healthy 9400 51700
Delm C 2012 diseased 42300 51700
Delm C 2013 healthy 54375 89750
Delm C 2013 diseased 35375 89750
Delm C 2014 healthy 28200 56700
Delm C 2014 diseased 28500 56700
Delm D 2010 healthy . .
Delm D 2010 diseased . .
Delm D 2011 healthy . .
Delm D 2011 diseased . .
Delm D 2012 healthy . .
Delm D 2012 diseased . .
Delm D 2013 healthy . .
Delm D 2013 diseased . .
Delm D 2014 healthy . .
Delm D 2014 diseased . .
Proc glimmix data=one;
harPerEggsp=harPerEggs/100;
class nem blk year eggtype;
model harPerEggsp= eggtype|nem|year/dist=beta ddfm=kr;
random intercept/subject=blk;
random year/residual subject=blk*nem type=ar(1) group=eggtype vcorr;
run;
Thank you very much Steve!!
Now I compared the proportion of diseased eggs with the number of total eggs (log10 transformed), using dist=gaussian (you did not mean using gaussian and binomial at the same time right?) and it worked!!!!
I finally get negative values  and the values in the table goes in the right direction (I guess)
 and the values in the table goes in the right direction (I guess)
| 1 | 1.0000 | -0.03965 | 0.001573 | -0.00006 | 2.473E-6 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 | |||||||||||||||
| 3 | -0.03965 | 1.0000 | -0.03965 | 0.001573 | -0.00006 | |||||||||||||||
| 4 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | |||||||||||||||
| 5 | 0.001573 | -0.03965 | 1.0000 | -0.03965 | 0.001573 | |||||||||||||||
| 6 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | |||||||||||||||
| 7 | -0.00006 | 0.001573 | -0.03965 | 1.0000 | -0.03965 | |||||||||||||||
| 8 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | |||||||||||||||
| 9 | 2.473E-6 | -0.00006 | 0.001573 | -0.03965 | 1.0000 | |||||||||||||||
| 10 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | |||||||||||||||
| 11 | 1.0000 | -0.03965 | 0.001573 | -0.00006 | 2.473E-6 | |||||||||||||||
| 12 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 | |||||||||||||||
| 13 | -0.03965 | 1.0000 | -0.03965 | 0.001573 | -0.00006 | |||||||||||||||
| 14 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | |||||||||||||||
| 15 | 0.001573 | -0.03965 | 1.0000 | -0.03965 | 0.001573 | |||||||||||||||
| 16 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | |||||||||||||||
| 17 | -0.00006 | 0.001573 | -0.03965 | 1.0000 | -0.03965 | |||||||||||||||
| 18 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | |||||||||||||||
| 19 | 2.473E-6 | -0.00006 | 0.001573 | -0.03965 | 1.0000 | |||||||||||||||
| 20 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 
That looks like what you need. It may be overly complicating matters to specify two different distributions, and fitting two things simultaneously (and most of the times I have tried, I run into memory problems). However, it can be done. Or it could be done by applying a logit transform to the proportion before analyzing, so that you have the proportion on an open-ended interval, which makes more sense for the correlation estimates.
Steve Denham
Thanks again Steve!
I tried transforming the number of total eggs and he proportion of diseased eggs with log10, and dist=gaussian but did not work 
ERROR: QUANEW Optimization cannot be completed
Optimization routine cannot improve the function value
Total number of eggs --> log transform (any base will do)
Proportion of diseased eggs --> logit transform = log ( proportion / (1 - proportion). This is the log of the odds ratio. Again any base will do, but usually natural logs are used. If you use that, the QUANEW problem might be solved, and if not try: NLOPTIONS tech=nrridg;
Steve Denham
Oh noooo!!! now is the problem solved, but I only get positive corr values (I used log without any base)
| 1 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1.0000 | 0.1012 | 0.01025 | 0.001037 | 0.000105 | |||||||||||||||
| 3 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | |||||||||||||||
| 4 | 0.1012 | 1.0000 | 0.1012 | 0.01025 | 0.001037 | |||||||||||||||
| 5 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | |||||||||||||||
| 6 | 0.01025 | 0.1012 | 1.0000 | 0.1012 | 0.01025 | |||||||||||||||
| 7 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | |||||||||||||||
| 8 | 0.001037 | 0.01025 | 0.1012 | 1.0000 | 0.1012 | |||||||||||||||
| 9 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | |||||||||||||||
| 10 | 0.000105 | 0.001037 | 0.01025 | 0.1012 | 1.0000 | |||||||||||||||
| 11 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 | |||||||||||||||
| 12 | 1.0000 | 0.1012 | 0.01025 | 0.001037 | 0.000105 | |||||||||||||||
| 13 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | |||||||||||||||
| 14 | 0.1012 | 1.0000 | 0.1012 | 0.01025 | 0.001037 | |||||||||||||||
| 15 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | |||||||||||||||
| 16 | 0.01025 | 0.1012 | 1.0000 | 0.1012 | 0.01025 | |||||||||||||||
| 17 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | |||||||||||||||
| 18 | 0.001037 | 0.01025 | 0.1012 | 1.0000 | 0.1012 | |||||||||||||||
| 19 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | |||||||||||||||
| 20 | 0.000105 | 0.001037 | 0.01025 | 0.1012 | 1.0000 | 
the logit transformation for the proportions yielded some negative values.
Thank you Steve!
The logit should give some negative values, because if the proportion is less than 0.5, you have the log of a value less than one, which is negative.
Now, these correlations look reasonable to me--the higher the total egg count, the higher the proportion diseased. Visual inspection of the data seems to support that. You could get negative values, I think, by looking at the proportion healthy.
Steve Denham
I did not get negative values for prop of healthy vs diseased either  Do you think I could use the results (neg corr) from the analysis including number of total eggs (log transformed) and prop of diseased (without log trans) in the same model with gaussian dist, or was this analysis incorrect ? (because you should not compare counts with proportions)
  Do you think I could use the results (neg corr) from the analysis including number of total eggs (log transformed) and prop of diseased (without log trans) in the same model with gaussian dist, or was this analysis incorrect ? (because you should not compare counts with proportions)
Thank you Steve
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.
