Statistical Procedures

palolix · Posted 12-05-2014 03:35 AM

Dear SAS Community,

I have analyzed my data as repeated measures, as the dependent variables were measured every year in a multi-year trial. I have 2 independent variables; nem and year. When plotted my data I observed that the population dynamics of two dependent variables (number of total eggs and number of diseased eggs) show opposite oscillations during the 5 years evaluated, so I would like to test for a negative correlation between these 2 variables. Is there any way to modify this repeated measures code in order to test this or should I test the correlation for each year separately?

Proc glimmix data=one;

class nem blk year;

model harEggsT= nem|year/dist=lognormal ddfm=kr;

random intercept/subject=blk;

random year/residual subject=blk*nem type=ar(1);

run;

I would greatly appreciate any help!

Thank you,

Caroline

SteveDenham · Posted 12-05-2014 10:36 AM

Hi Caroline,

I don't see how you are getting the two kinds of eggs into this. I also wonder about using dist=lognormal for a count variable.

If you can code eggtype as 'Total' and 'Diseased', then what about:

Proc glimmix data=one;

class nem blk year eggtype;

model harEggsT= eggtype|nem|year/dist=negbin ddfm=kr;

random intercept/subject=blk;

random year/residual subject=blk*nem type=ar(1) group=eggtype vcorr;

run;

Steve Denham

palolix · Posted 12-05-2014 11:31 AM

Hi Steve!

I am so glad you came to help me . Thank you!!

I already check the distribution and the lognormal was the most suitable (is usually the case for nematode counts). harEggsT (total eggs) is one of the dependent variable that I want to test for negative corr with harEggsD (diseased eggs), so I did not know how to include both dependent variables in the same model, but I will try what you are suggesting.

Thank you very much Steve,

Caroline

palolix · Posted 12-05-2014 12:15 PM

Hi Steve,

When I used a lognormal distribution I get a Chi-Square/DF=1, but when I use negbin do not converge, so I will stay with the lognormal.

It seems that your code worked. There is a significant effect for eggtype. Anyhow I do not know how to interpret the results, because I get a table called "Estimated V Correlation Matrix for blk A" with rows and columns. Why only for blk A? I also get the Type III Tests of Fixed Effects table, a Cov Parameter Estimates table, but nothing else :smileyplain:

Thank you dear Steve!

Caroline

SteveDenham · Posted 12-05-2014 12:37 PM

Hi Caroline,

Last to first: The output is all that the code I provided should be producing. The V correlation matrix only presents for block A, as it is identical for all blocks. The hard part is mapping the correlations to the fixed effects. If the two endpoints are truly negatively correlated, there should be some negative values located block diagonally that reflect the within year correlations for the two egg types.

And I bet your counts are fairly large numbers, so lognormal makes sense, as opposed to negbin.

Steve Denham

palolix · Posted 12-05-2014 12:57 PM

Thanks a lot Steve!

Thank you for the explanation. You are right, the counts are large numbers. Unfortunately I do not see any single negative value in that table

I would now try with healthy eggs and disease eggs. Do you think is it correct to look for negative corr between number of total eggs and proportion of disease eggs (instead of number of disease eggs)?

I appreciate your great help Steve!!

Caroline

SteveDenham · Posted 12-05-2014 01:20 PM

That latter correlation is difficult, because one measure is a count and the other a proportion. Hard to fit something that disparate into the same model, although there is an example in the GLIMMIX documentation. To make it work, try log transforming your counts (or square root) and fitting gaussian and binomial. I just have a feeling that will work better than lognormal and binomial.

I would try healthy and disease as levels within eggtype first. Still, I have a hunch these two counts are positively related--good weather means more eggs of both types.

Steve Denham

palolix · Posted 12-05-2014 02:20 PM

I tried healthy eggs with disease eggs. Significant interactions including eggtype, but still no negative values. I also compared proportion of healthy eggs with proportion of disease eggs, but no negative values.

I would appreciate if you could have a look, maybe I am doing something wrong. (harEggsT means he number of total eggs. I also divided 4 because I need the numbers in 100g soil and he raw data is in 400g)

data one;

input nem$ blk$ year eggtype$ harEggs harEggsT;

harEggs=harEggs/4;

harEggs=harEggs+1;

harEggsT=harEggsT/4;

harEggsT=harEggsT+1;

harPerEggs=100*harEggs/harEggsT;

cards;

Chav A 2010 healthy 1400 8700

Chav A 2010 diseased 7300 8700

Chav A 2011 healthy 27375 35250

Chav A 2011 diseased 7875 35250

Chav A 2012 healthy 9700 27900

Chav A 2012 diseased 18200 27900

Chav A 2013 healthy 60625 68875

Chav A 2013 diseased 8250 68875

Chav A 2014 healthy 19350 34425

Chav A 2014 diseased 15075 34425

Chav B 2010 healthy 3000 10800

Chav B 2010 diseased 7800 10800

Chav B 2011 healthy 17100 21700

Chav B 2011 diseased 4600 21700

Chav B 2012 healthy 11625 28275

Chav B 2012 diseased 16650 28275

Chav B 2013 healthy 71750 81500

Chav B 2013 diseased 9750 81500

Chav B 2014 healthy 25500 46800

Chav B 2014 diseased 21300 46800

Chav C 2010 healthy 27200 36000

Chav C 2010 diseased 8800 36000

Chav C 2011 healthy 44250 66500

Chav C 2011 diseased 22250 66500

Chav C 2012 healthy 13500 26500

Chav C 2012 diseased 13000 26500

Chav C 2013 healthy 82800 97650

Chav C 2013 diseased 14850 97650

Chav C 2014 healthy 29400 38300

Chav C 2014 diseased 8900 38300

Chav D 2010 healthy 5600 22400

Chav D 2010 diseased 16800 22400

Chav D 2011 healthy 43500 53100

Chav D 2011 diseased 9600 53100

Chav D 2012 healthy 17800 43900

Chav D 2012 diseased 26100 43900

Chav D 2013 healthy 62200 70600

Chav D 2013 diseased 8400 70600

Chav D 2014 healthy 35125 47875

Chav D 2014 diseased 12750 47875

Delm A 2010 healthy 800 12600

Delm A 2010 diseased 11800 12600

Delm A 2011 healthy 2850 24750

Delm A 2011 diseased 21900 24750

Delm A 2012 healthy 9150 34200

Delm A 2012 diseased 25050 34200

Delm A 2013 healthy 21500 30000

Delm A 2013 diseased 8500 30000

Delm A 2014 healthy 8750 16850

Delm A 2014 diseased 8100 16850

Delm B 2010 healthy 7000 46800

Delm B 2010 diseased 39800 46800

Delm B 2011 healthy 28000 62000

Delm B 2011 diseased 34000 62000

Delm B 2012 healthy 12400 41200

Delm B 2012 diseased 28800 41200

Delm B 2013 healthy 28350 37275

Delm B 2013 diseased 8925 37275

Delm B 2014 healthy 50300 72100

Delm B 2014 diseased 21800 72100

Delm C 2010 healthy 17250 55050

Delm C 2010 diseased 37800 55050

Delm C 2011 healthy 44700 88650

Delm C 2011 diseased 43950 88650

Delm C 2012 healthy 9400 51700

Delm C 2012 diseased 42300 51700

Delm C 2013 healthy 54375 89750

Delm C 2013 diseased 35375 89750

Delm C 2014 healthy 28200 56700

Delm C 2014 diseased 28500 56700

Delm D 2010 healthy . .

Delm D 2010 diseased . .

Delm D 2011 healthy . .

Delm D 2011 diseased . .

Delm D 2012 healthy . .

Delm D 2012 diseased . .

Delm D 2013 healthy . .

Delm D 2013 diseased . .

Delm D 2014 healthy . .

Delm D 2014 diseased . .

Proc glimmix data=one;

harPerEggsp=harPerEggs/100;

class nem blk year eggtype;

model harPerEggsp= eggtype|nem|year/dist=beta ddfm=kr;

random intercept/subject=blk;

random year/residual subject=blk*nem type=ar(1) group=eggtype vcorr;

run;

Thank you very much Steve!!

palolix · Posted 12-05-2014 03:46 PM

Now I compared the proportion of diseased eggs with the number of total eggs (log10 transformed), using dist=gaussian (you did not mean using gaussian and binomial at the same time right?) and it worked!!!!

I finally get negative values and the values in the table goes in the right direction (I guess)

1	1.0000		-0.03965		0.001573		-0.00006		2.473E-6
2		1.0000		0.6982		0.4875		0.3404		0.2377
3	-0.03965		1.0000		-0.03965		0.001573		-0.00006
4		0.6982		1.0000		0.6982		0.4875		0.3404
5	0.001573		-0.03965		1.0000		-0.03965		0.001573
6		0.4875		0.6982		1.0000		0.6982		0.4875
7	-0.00006		0.001573		-0.03965		1.0000		-0.03965
8		0.3404		0.4875		0.6982		1.0000		0.6982
9	2.473E-6		-0.00006		0.001573		-0.03965		1.0000
10		0.2377		0.3404		0.4875		0.6982		1.0000
11											1.0000		-0.03965		0.001573		-0.00006		2.473E-6
12												1.0000		0.6982		0.4875		0.3404		0.2377
13											-0.03965		1.0000		-0.03965		0.001573		-0.00006
14												0.6982		1.0000		0.6982		0.4875		0.3404
15											0.001573		-0.03965		1.0000		-0.03965		0.001573
16												0.4875		0.6982		1.0000		0.6982		0.4875
17											-0.00006		0.001573		-0.03965		1.0000		-0.03965
18												0.3404		0.4875		0.6982		1.0000		0.6982
19											2.473E-6		-0.00006		0.001573		-0.03965		1.0000
20												0.2377		0.3404		0.4875		0.6982		1.0000

SteveDenham · Posted 12-08-2014 10:02 AM

That looks like what you need. It may be overly complicating matters to specify two different distributions, and fitting two things simultaneously (and most of the times I have tried, I run into memory problems). However, it can be done. Or it could be done by applying a logit transform to the proportion before analyzing, so that you have the proportion on an open-ended interval, which makes more sense for the correlation estimates.

Steve Denham

palolix · Posted 12-08-2014 12:47 PM

Thanks again Steve!

I tried transforming the number of total eggs and he proportion of diseased eggs with log10, and dist=gaussian but did not work

ERROR: QUANEW Optimization cannot be completed

Optimization routine cannot improve the function value

SteveDenham · Posted 12-08-2014 12:57 PM

Total number of eggs --> log transform (any base will do)

Proportion of diseased eggs --> logit transform = log ( proportion / (1 - proportion). This is the log of the odds ratio. Again any base will do, but usually natural logs are used. If you use that, the QUANEW problem might be solved, and if not try: NLOPTIONS tech=nrridg;

Steve Denham

palolix · Posted 12-08-2014 01:36 PM

Oh noooo!!! now is the problem solved, but I only get positive corr values (I used log without any base)

1	1.0000		0.6982		0.4875		0.3404		0.2377
2		1.0000		0.1012		0.01025		0.001037		0.000105
3	0.6982		1.0000		0.6982		0.4875		0.3404
4		0.1012		1.0000		0.1012		0.01025		0.001037
5	0.4875		0.6982		1.0000		0.6982		0.4875
6		0.01025		0.1012		1.0000		0.1012		0.01025
7	0.3404		0.4875		0.6982		1.0000		0.6982
8		0.001037		0.01025		0.1012		1.0000		0.1012
9	0.2377		0.3404		0.4875		0.6982		1.0000
10		0.000105		0.001037		0.01025		0.1012		1.0000
11											1.0000		0.6982		0.4875		0.3404		0.2377
12												1.0000		0.1012		0.01025		0.001037		0.000105
13											0.6982		1.0000		0.6982		0.4875		0.3404
14												0.1012		1.0000		0.1012		0.01025		0.001037
15											0.4875		0.6982		1.0000		0.6982		0.4875
16												0.01025		0.1012		1.0000		0.1012		0.01025
17											0.3404		0.4875		0.6982		1.0000		0.6982
18												0.001037		0.01025		0.1012		1.0000		0.1012
19											0.2377		0.3404		0.4875		0.6982		1.0000
20												0.000105		0.001037		0.01025		0.1012		1.0000

the logit transformation for the proportions yielded some negative values.

Thank you Steve!

SteveDenham · Posted 12-08-2014 01:47 PM

The logit should give some negative values, because if the proportion is less than 0.5, you have the log of a value less than one, which is negative.

Now, these correlations look reasonable to me--the higher the total egg count, the higher the proportion diseased. Visual inspection of the data seems to support that. You could get negative values, I think, by looking at the proportion healthy.

Steve Denham

palolix · Posted 12-08-2014 02:22 PM

I did not get negative values for prop of healthy vs diseased either Do you think I could use the results (neg corr) from the analysis including number of total eggs (log transformed) and prop of diseased (without log trans) in the same model with gaussian dist, or was this analysis incorrect ? (because you should not compare counts with proportions)

Thank you Steve

Statistical Procedures

testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Follow Us

What is...

Statistical Procedures

Our biggest data and AI event of the year.

Follow Us

What is...