Contributor
Posts: 73

# testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Dear SAS Community,

I have analyzed my data as repeated measures, because the dependent variables were measured every year in a multi-year trial. I have 2 independent variables: nem and year. When I plotted my data, I observed that the population dynamics of two dependent variables (number of total eggs and number of diseased eggs) show opposite oscillations over the 5 years evaluated, so I would like to test for a negative correlation between these 2 variables. Is there any way to modify this repeated measures code to test this, or should I test the correlation for each year separately?

Proc glimmix data=one;

class nem blk year;

model harEggsT= nem|year/dist=lognormal ddfm=kr;

random intercept/subject=blk;

random year/residual subject=blk*nem type=ar(1);

run;
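For the per-year alternative mentioned in the question, a minimal sketch might look like the following. It assumes the data set `one` holds the two responses as separate columns named `harEggsT` and `harEggsD` (the second name appears later in the thread); the output data set `byyear` is a made-up name. It ignores the nem and blk structure, so it is only a rough first look, not a replacement for the repeated measures model.

```sas
/* Sketch only: simple per-year correlations between the two responses.    */
/* Assumes harEggsT and harEggsD are separate columns in data set one.     */
proc sort data=one out=byyear;   /* byyear is a hypothetical name */
  by year;
run;

proc corr data=byyear pearson spearman;
  by year;                       /* one correlation table per year */
  var harEggsT harEggsD;
run;
```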

I would greatly appreciate any help!

Thank you,

Caroline

Posts: 2,655

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Hi Caroline,

I don't see how you are getting the two kinds of eggs into this.  I also wonder about using dist=lognormal for a count variable.

If you can code eggtype as 'Total' and 'Diseased', then what about:

Proc glimmix data=one;

class nem blk year eggtype;

model harEggsT= eggtype|nem|year/dist=negbin ddfm=kr;

random intercept/subject=blk;

random year/residual subject=blk*nem type=ar(1) group=eggtype vcorr;

run;
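One possible way to get the `eggtype` coding is to stack the two responses into a long data set. This is only a sketch: it assumes `one` has `harEggsT` and `harEggsD` as separate columns, and the names `long` and `eggs` are made up for illustration.

```sas
/* Sketch only: turn one row per nem*blk*year into two rows, one per egg type. */
data long;                        /* hypothetical output data set */
  set one;
  length eggtype $8;
  eggtype = 'Total';    eggs = harEggsT; output;
  eggtype = 'Diseased'; eggs = harEggsD; output;
  drop harEggsT harEggsD;
run;
```

With this layout, the model above would read from `long` and use `eggs` as the response in place of `harEggsT`.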

Steve Denham

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Hi Steve!

I am so glad you came to help me. Thank you!!

I already checked the distribution, and lognormal was the most suitable (as is usually the case for nematode counts). harEggsT (total eggs) is one of the dependent variables that I want to test for a negative correlation with harEggsD (diseased eggs). I did not know how to include both dependent variables in the same model, but I will try what you are suggesting.

Thank you very much Steve,

Caroline

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Hi Steve,

When I use a lognormal distribution I get Chi-Square/DF = 1, but with negbin the model does not converge, so I will stay with lognormal.

It seems that your code worked. There is a significant effect for eggtype. However, I do not know how to interpret the results, because I get a table called "Estimated V Correlation Matrix for blk A" with rows and columns.  Why only for blk A? I also get the Type III Tests of Fixed Effects table and a Cov Parameter Estimates table, but nothing else.

Thank you dear Steve!

Caroline

Posts: 2,655

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Hi Caroline,

Last to first:  The output is all that the code I provided should be producing.  The V correlation matrix only presents for block A, as it is identical for all blocks.  The hard part is mapping the correlations to the fixed effects. If the two endpoints are truly negatively correlated, there should be some negative values located block diagonally that reflect the within year correlations for the two egg types.

And I bet your counts are fairly large numbers, so lognormal makes sense, as opposed to negbin.

Steve Denham

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Thanks a lot Steve!

Thank you for the explanation. You are right, the counts are large numbers. Unfortunately, I do not see a single negative value in that table.

I will now try healthy eggs and diseased eggs. Do you think it is correct to look for a negative correlation between the number of total eggs and the proportion of diseased eggs (instead of the number of diseased eggs)?

I appreciate your great help Steve!!

Caroline

Posts: 2,655

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

That latter correlation is difficult, because one measure is a count and the other a proportion.  Hard to fit something that disparate into the same model, although there is an example in the GLIMMIX documentation.  To make it work, try log transforming your counts (or square root) and fitting gaussian and binomial.  I just have a feeling that will work better than lognormal and binomial.

I would try healthy and diseased as levels within eggtype first.  Still, I have a hunch these two counts are positively related--good weather means more eggs of both types.

Steve Denham

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

I tried healthy eggs with diseased eggs. There are significant interactions including eggtype, but still no negative values. I also compared the proportion of healthy eggs with the proportion of diseased eggs, but again no negative values.

I would appreciate it if you could have a look; maybe I am doing something wrong. (harEggsT means the number of total eggs. I also divided by 4 because I need the numbers per 100 g of soil and the raw data are per 400 g.)

data one;

input nem$ blk$ year eggtype$ harEggs harEggsT;

harEggs=harEggs/4;

harEggs=harEggs+1;

harEggsT=harEggsT/4;

harEggsT=harEggsT+1;

harPerEggs=100*harEggs/harEggsT;

cards;

Chav    A    2010    healthy    1400    8700

Chav    A    2010    diseased    7300    8700

Chav    A    2011    healthy    27375    35250

Chav    A    2011    diseased    7875    35250

Chav    A    2012    healthy    9700    27900

Chav    A    2012    diseased    18200    27900

Chav    A    2013    healthy    60625    68875

Chav    A    2013    diseased    8250    68875

Chav    A    2014    healthy    19350    34425

Chav    A    2014    diseased    15075    34425

Chav    B    2010    healthy    3000    10800

Chav    B    2010    diseased    7800    10800

Chav    B    2011    healthy    17100    21700

Chav    B    2011    diseased    4600    21700

Chav    B    2012    healthy    11625    28275

Chav    B    2012    diseased    16650    28275

Chav    B    2013    healthy    71750    81500

Chav    B    2013    diseased    9750    81500

Chav    B    2014    healthy    25500    46800

Chav    B    2014    diseased    21300    46800

Chav    C    2010    healthy    27200    36000

Chav    C    2010    diseased    8800    36000

Chav    C    2011    healthy    44250    66500

Chav    C    2011    diseased    22250    66500

Chav    C    2012    healthy    13500    26500

Chav    C    2012    diseased    13000    26500

Chav    C    2013    healthy    82800    97650

Chav    C    2013    diseased    14850    97650

Chav    C    2014    healthy    29400    38300

Chav    C    2014    diseased    8900    38300

Chav    D    2010    healthy    5600    22400

Chav    D    2010    diseased    16800    22400

Chav    D    2011    healthy    43500    53100

Chav    D    2011    diseased    9600    53100

Chav    D    2012    healthy    17800    43900

Chav    D    2012    diseased    26100    43900

Chav    D    2013    healthy    62200    70600

Chav    D    2013    diseased    8400    70600

Chav    D    2014    healthy    35125    47875

Chav    D    2014    diseased    12750    47875

Delm    A    2010    healthy    800    12600

Delm    A    2010    diseased    11800    12600

Delm    A    2011    healthy    2850    24750

Delm    A    2011    diseased    21900    24750

Delm    A    2012    healthy    9150    34200

Delm    A    2012    diseased    25050    34200

Delm    A    2013    healthy    21500    30000

Delm    A    2013    diseased    8500    30000

Delm    A    2014    healthy    8750    16850

Delm    A    2014    diseased    8100    16850

Delm    B    2010    healthy    7000    46800

Delm    B    2010    diseased    39800    46800

Delm    B    2011    healthy    28000    62000

Delm    B    2011    diseased    34000    62000

Delm    B    2012    healthy    12400    41200

Delm    B    2012    diseased    28800    41200

Delm    B    2013    healthy    28350    37275

Delm    B    2013    diseased    8925    37275

Delm    B    2014    healthy    50300    72100

Delm    B    2014    diseased    21800    72100

Delm    C    2010    healthy    17250    55050

Delm    C    2010    diseased    37800    55050

Delm    C    2011    healthy    44700    88650

Delm    C    2011    diseased    43950    88650

Delm    C    2012    healthy    9400    51700

Delm    C    2012    diseased    42300    51700

Delm    C    2013    healthy    54375    89750

Delm    C    2013    diseased    35375    89750

Delm    C    2014    healthy    28200    56700

Delm    C    2014    diseased    28500    56700

Delm    D    2010    healthy    .    .

Delm    D    2010    diseased    .    .

Delm    D    2011    healthy    .    .

Delm    D    2011    diseased    .    .

Delm    D    2012    healthy    .    .

Delm    D    2012    diseased    .    .

Delm    D    2013    healthy    .    .

Delm    D    2013    diseased    .    .

Delm    D    2014    healthy    .    .

Delm    D    2014    diseased    .    .

;

Proc glimmix data=one;

harPerEggsp=harPerEggs/100;

class nem blk year eggtype;

model harPerEggsp= eggtype|nem|year/dist=beta ddfm=kr;

random intercept/subject=blk;

random year/residual subject=blk*nem type=ar(1) group=eggtype vcorr;

run;

Thank you very much Steve!!

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Now I compared the proportion of diseased eggs with the number of total eggs (log10 transformed), using dist=gaussian (you did not mean using gaussian and binomial at the same time right?) and it worked!!!!

I finally get negative values, and the values in the table go in the right direction (I guess).

Columns run from 1 to 20; each row lists only its five nonblank entries, left to right.

| Row | 1st | 2nd | 3rd | 4th | 5th |
|---|---|---|---|---|---|
| 1 | 1.0000 | -0.03965 | 0.001573 | -0.00006 | 2.473E-6 |
| 2 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 |
| 3 | -0.03965 | 1.0000 | -0.03965 | 0.001573 | -0.00006 |
| 4 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 |
| 5 | 0.001573 | -0.03965 | 1.0000 | -0.03965 | 0.001573 |
| 6 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 |
| 7 | -0.00006 | 0.001573 | -0.03965 | 1.0000 | -0.03965 |
| 8 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 |
| 9 | 2.473E-6 | -0.00006 | 0.001573 | -0.03965 | 1.0000 |
| 10 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 |
| 11 | 1.0000 | -0.03965 | 0.001573 | -0.00006 | 2.473E-6 |
| 12 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 |
| 13 | -0.03965 | 1.0000 | -0.03965 | 0.001573 | -0.00006 |
| 14 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 |
| 15 | 0.001573 | -0.03965 | 1.0000 | -0.03965 | 0.001573 |
| 16 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 |
| 17 | -0.00006 | 0.001573 | -0.03965 | 1.0000 | -0.03965 |
| 18 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 |
| 19 | 2.473E-6 | -0.00006 | 0.001573 | -0.03965 | 1.0000 |
| 20 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 |

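
A quick check on how these numbers can be read, assuming the `type=ar(1)` and `group=eggtype` options from the code earlier in the thread. Under AR(1), two residuals from the same subject that are $k$ years apart have correlation $\rho^{k}$, and `group=` gives each eggtype level its own $\rho$. The entries in the table above fit that pattern with one negative and one positive $\rho$ (agreement is up to rounding of the printed $\rho$):

```latex
\begin{aligned}
\operatorname{Corr}(e_{t}, e_{t+k}) &= \rho^{k}, \qquad k = 0, 1, \dots, 4 \\
\rho = -0.03965 &: \quad \rho^{2} \approx 0.001572, \quad \rho^{3} \approx -0.0000623, \quad \rho^{4} \approx 2.47 \times 10^{-6} \\
\rho = 0.6982 &: \quad \rho^{2} \approx 0.4875, \quad \rho^{3} \approx 0.3404, \quad \rho^{4} \approx 0.2376
\end{aligned}
```

Read this way, each sign describes the year-to-year correlation within one eggtype level. Which level carries the negative $\rho$ cannot be told from the pasted table alone.
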
Posts: 2,655

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

That looks like what you need.  It may be overly complicating matters to specify two different distributions, and fitting two things simultaneously (and most of the times I have tried, I run into memory problems).  However, it can be done.  Or it could be done by applying a logit transform to the proportion before analyzing, so that you have the proportion on an open-ended interval, which makes more sense for the correlation estimates.

Steve Denham

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Thanks again Steve!

I tried transforming both the number of total eggs and the proportion of diseased eggs with log10, using dist=gaussian, but it did not work:

ERROR: QUANEW Optimization cannot be completed

Optimization routine cannot improve the function value

Posts: 2,655

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Total number of eggs --> log transform (any base will do)

Proportion of diseased eggs --> logit transform = log( proportion / (1 - proportion) ). This is the log of the odds. Again any base will do, but usually natural logs are used.  If you use that, the QUANEW problem might be solved, and if not try: NLOPTIONS tech=nrridg;
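
As a rough sketch of those two transforms and of where the NLOPTIONS statement goes. The data set names `wide` and `trans` and the variable names `propDis` and `y` are made up for illustration, and the sketch assumes one row per nem*blk*year with `harEggsD` and `harEggsT` side by side. The model statements simply reuse the ones posted earlier in the thread with dist=gaussian.

```sas
/* Sketch only: log of total eggs and logit of proportion diseased, stacked. */
data trans;                                   /* hypothetical name */
  set wide;                                   /* hypothetical wide data set */
  length eggtype $8;
  /* keep rows where the logit is defined (drops missing, 0, and 1) */
  if harEggsT > 0 and 0 < harEggsD < harEggsT;
  eggtype = 'logTotal'; y = log(harEggsT); output;           /* natural log */
  propDis = harEggsD / harEggsT;
  eggtype = 'logitDis'; y = log(propDis / (1 - propDis)); output;
  keep nem blk year eggtype y;
run;

proc glimmix data=trans;
  class nem blk year eggtype;
  model y = eggtype|nem|year / dist=gaussian ddfm=kr;
  random intercept / subject=blk;
  random year / residual subject=blk*nem type=ar(1) group=eggtype vcorr;
  nloptions tech=nrridg;   /* only needed if the default optimizer fails */
run;
```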

Steve Denham

Contributor
Posts: 73

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

Oh noooo!!! Now the problem is solved, but I only get positive corr values (I used log without specifying a base).

Columns run from 1 to 20; each row lists only its five nonblank entries, left to right.

| Row | 1st | 2nd | 3rd | 4th | 5th |
|---|---|---|---|---|---|
| 1 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 |
| 2 | 1.0000 | 0.1012 | 0.01025 | 0.001037 | 0.000105 |
| 3 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 |
| 4 | 0.1012 | 1.0000 | 0.1012 | 0.01025 | 0.001037 |
| 5 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 |
| 6 | 0.01025 | 0.1012 | 1.0000 | 0.1012 | 0.01025 |
| 7 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 |
| 8 | 0.001037 | 0.01025 | 0.1012 | 1.0000 | 0.1012 |
| 9 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 |
| 10 | 0.000105 | 0.001037 | 0.01025 | 0.1012 | 1.0000 |
| 11 | 1.0000 | 0.6982 | 0.4875 | 0.3404 | 0.2377 |
| 12 | 1.0000 | 0.1012 | 0.01025 | 0.001037 | 0.000105 |
| 13 | 0.6982 | 1.0000 | 0.6982 | 0.4875 | 0.3404 |
| 14 | 0.1012 | 1.0000 | 0.1012 | 0.01025 | 0.001037 |
| 15 | 0.4875 | 0.6982 | 1.0000 | 0.6982 | 0.4875 |
| 16 | 0.01025 | 0.1012 | 1.0000 | 0.1012 | 0.01025 |
| 17 | 0.3404 | 0.4875 | 0.6982 | 1.0000 | 0.6982 |
| 18 | 0.001037 | 0.01025 | 0.1012 | 1.0000 | 0.1012 |
| 19 | 0.2377 | 0.3404 | 0.4875 | 0.6982 | 1.0000 |
| 20 | 0.000105 | 0.001037 | 0.01025 | 0.1012 | 1.0000 |

The logit transformation for the proportions yielded some negative values.

Thank you Steve!

Posts: 2,655

## Re: testing for a negative correlation between 2 dependent variables in a repeated measures analysis

The logit should give some negative values, because if the proportion is less than 0.5, you have the log of a value less than one, which is negative.

Now, these correlations look reasonable to me--the higher the total egg count, the higher the proportion diseased.  Visual inspection of the data seems to support that.  You could get negative values, I think, by looking at the proportion healthy.
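
To make the sign rule concrete, here is a small worked example. The proportions 0.25 and 0.75 are made-up values, not taken from the data.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}, \qquad
\operatorname{logit}(0.25) = \ln\tfrac{1}{3} \approx -1.0986, \qquad
\operatorname{logit}(0.75) = \ln 3 \approx 1.0986
```

Because $\operatorname{logit}(1-p) = -\operatorname{logit}(p)$, switching from the proportion diseased to the proportion healthy flips the sign of every transformed value.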

Steve Denham
