Solved: Re: How to analysis the potential correlation for two different variab...

Dennisky · Posted 02-16-2023 12:54 AM

Dear all,

We plan to conduct a study of a congenital heart disease.

All of the patients received the same surgical procedure.

Ultrasound (US) examination may be very important. Every patient underwent a US examination at three time points: 3 days before the operation, and 7 days and one month after it. We can see that the key US measurement of cardiac anatomy gets better and better over time.

Of course, this is a triumph of surgery.

However, we now want to demonstrate the diagnostic value of the US examination. There is a cardiac function score, which we also collected at the same three time points (3 days before the operation, and 7 days and one month after it).

We suspect that finding an association or correlation (or something else) between the two variables (the US measurement and the cardiac function score) might demonstrate the diagnostic value of US.

Am I right?

It seems to be a design with two repeated-measures factors.

How should we conduct the analysis to explore the potential association or correlation between the two variables?



StatDave · Posted 02-16-2023 06:38 PM

You will have to decide on the most appropriate statistical method for your goals, but one approach you could consider is to fit a Generalized Estimating Equations model to accommodate the repeated measures within subjects and use an exchangeable structure of the intrasubject correlations. Assuming you have three observations for each subject with a subject number (ID) variable, a US variable, and a cardiac score variable, then the following fits a GEE model and will provide a table giving the correlation estimate. One issue is that it might not be possible to say that one variable is the response and one is the predictor, so the model could be fit either way. Also, this code assumes that the variable selected as the response is normally distributed, but a different distribution could be specified if needed.

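proc gee;
  class id;                        /* subject identifier defines the clusters */
  model score=us;                  /* normal response by default; the roles could be swapped */
  repeated subject=id/type=exch;   /* exchangeable working correlation within subject */
run;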

Dennisky · Posted 02-16-2023 08:17 PM

Thank you very much for your great advice.

Should we consider the time factor (three time points in our study) in the model?

And how should the data be arranged?

StatDave · Posted 02-16-2023 10:53 PM

As I already mentioned, the time factor is taken care of by the GEE method which accounts for the repeated measures done within subjects. As for the data format, I described that before - three observations per subject, one for each time. In each you record the subject number and the value on each of your two variables.

subject US cardiac score

1 2.1 3

1 2.3 5

1 2.4 6

2 1.6 2

....
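A minimal sketch of how this layout could be entered as a SAS data set, using only the four rows shown above (the remaining rows are elided in the post and would follow the same pattern). The data set name mydata and the variable names id, us, and score are assumptions chosen to match the PROC GEE code earlier in the thread.

```sas
data mydata;
  input id us score;   /* subject number, US measurement, cardiac score */
  datalines;
1 2.1 3
1 2.3 5
1 2.4 6
2 1.6 2
;
run;
```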

StatDave · Posted 02-17-2023 09:38 AM

On reflection, the data structure and model I suggested are not correct. That model only computes the correlation among the three values of the variable selected as the response, not the correlation between the two variables that you have. While there is probably a better statistical approach, the problem with the previous analysis might be corrected by creating 6 observations for each subject, instead of three, and adding a time variable to the data.

time subject y

1 1 2.1

1 1 3

2 1 2.3

2 1 5

3 1 2.4

3 1 6

1 2 1.6

1 2 2

...
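One way this six-rows-per-subject layout might be built, assuming a data set with one row per subject and time that holds both measurements. The names mydata, long, id, time, us, and score are assumptions for illustration, and mydata is assumed to carry a time variable.

```sas
data long;
  set mydata;          /* assumed columns: id, time, us, score */
  y = us;    output;   /* first row of the pair: US measurement */
  y = score; output;   /* second row of the pair: cardiac score */
  keep time id y;
run;
```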

Then an intercept-only GEE model can be fit as follows. The estimated correlation is then a correlation between the US and score variables accumulated over the clusters which are defined as each pair of Y values from a subject at one time. It still assumes the variables are normally distributed (though a different distribution could be specified). And it only provides a point estimate of the correlation with no standard error or confidence interval.

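proc gee;
  class id time;                        /* id is the subject column in the layout above */
  model y=;                             /* intercept-only model */
  repeated subject=id*time/type=exch;   /* each cluster is the pair of Y values for one subject at one time */
run;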

Dennisky · Posted 02-19-2023 01:44 AM

Thanks a lot! We are very interested in the intercept-only GEE model.

Notably, as you mentioned, it only provides a point estimate of the correlation, with no standard error or confidence interval. Is this result determined by the algorithm?

StatDave · Posted 02-19-2023 12:47 PM

The method GEE uses to estimate the exchangeable correlation is shown in the "Details: Generalized Estimating Equations" section of the PROC GEE documentation. However, on further reflection, I don't think the GEE approach works for estimating the correlation between your two variables. Essentially, you want to estimate their correlation after adjusting for the effects of subject and time, which suggests using partial correlation. This is easily done in PROC CORR with the PARTIAL statement, where the subject and time variables are specified, and a confidence interval is also available.

The data are arranged as one observation per subject-time combination, with variables indicating subject and time as well as your two variables:

id t us cardiac

1 1 2.1 3

1 2 2.3 5

1 3 2.4 6

2 1 1.6 2

...

But since ID is essentially a categorical variable, you'll need to represent it in the PARTIAL statement with a set of dummy variables (like what the CLASS statement does in other procedures). There is no CLASS statement in PROC CORR, but you can produce the dummy variables in various ways as discussed in this note. Using the PROC LOGISTIC method, the following generates the dummy variables for subjects. The form of the model isn't particularly important. No displayed output is produced but the data set OD is created with all of the variables needed for PROC CORR. The dummy variables for ID are created by the CLASS statement and saved using names ID1, ID2, and so on.

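proc logistic data=mydata outdesign=od outdesignonly;  /* build the design matrix only; no model is fit */
  class id / param=ref;                                /* reference-coded dummy variables for the subjects */
  model us=cardiac id t;                               /* lists the variables to carry into OD */
run;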

Then PROC CORR can be used to get the desired correlation. Note the colon (:) following ID. This syntax tells the PARTIAL statement to include all variables whose names begin with ID, that is, all of the created dummy variables. The time variable, t, is listed alongside them, so the correlation is adjusted for both subject and time. The FISHER option provides a confidence interval for the correlation by applying the Fisher z transformation. If your two variables are very nonnormal in distribution, then you might want to include the SPEARMAN option to use Spearman correlations instead of Pearson.

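proc corr data=od fisher nosimple;   /* FISHER: confidence interval via Fisher's z */
  var us;
  with cardiac;
  partial t id: ;                    /* adjust for time and all subject dummy variables */
run;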
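As a cross-check on what the PARTIAL statement computes, the Pearson partial correlation can also be sketched as the ordinary correlation between two sets of residuals, each obtained by regressing one variable on subject and time. This is only a sketch: it assumes the data set mydata with variables id, t, us, and cardiac as laid out above, and the names res, r_us, and r_cardiac are made up here. The point estimate should agree with the PARTIAL result, but the p-value PROC CORR prints for the residuals does not account for the degrees of freedom used by the adjustment, so inference is probably better taken from the PARTIAL run.

```sas
/* Regress both variables on subject (as a CLASS effect) and time, saving the residuals */
proc glm data=mydata noprint;
  class id;
  model us cardiac = id t;
  output out=res r=r_us r_cardiac;
run;
quit;

/* Plain Pearson correlation of the two residual series */
proc corr data=res nosimple;
  var r_us;
  with r_cardiac;
run;
```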

Dennisky · Posted 02-20-2023 03:50 AM

Thank you so much for your professional and rigorous scientific attitude. Your suggestion to use partial correlation, carried out in PROC CORR with the PARTIAL statement, is amazing. Everything is becoming clear. Yesterday I read a paper titled "Comparing Generalized Estimating Equation and Linear Mixed Effects Model for Estimating Marginal Association with Bivariate Continuous Outcomes" (https://doi.org/10.1080/09286586.2022.2098984).

The authors describe one analytical complication in ophthalmology studies: the presence of bivariate outcomes due to the bilateral nature of eyes. They compare the performance of GEE and LMEM for bivariate continuous outcomes, which are common in eye studies.

Can we conduct our study according to the method in that publication?

Of course, I might have misunderstood the method.

SteveDenham · Posted 02-22-2023 02:53 PM

This paper served as the source for our current analysis of ocular data. Consider a design of 4 treatment levels, 6 time points and measures taken on each eye of an animal. Given that, you might consider the following approach using PROC GLIMMIX:

proc glimmix data=eye_data;
class treatment time eye subject_id;
model response = treatment|time/ddfm=kr2;
random eye/subject=subject_id type=un;
random time/residual subject=eye(subject_id) type=ar(1);
lsmeans <means of interest>;
lsmestimate <comparisons of interest>/adjust=simulate(seed=1) adjdfe=row;
run;

There is a lot to unpack here, but the critical assumption is that eye is a repeated measure on the subject, but only adds variability to the fixed effects. By specifying type=un, you will get a measure of the variance due to each eye and the covariance between them. GLIMMIX really does not care for doubly repeated measures on R side variables, so this has been our approach.

Using the Kronecker product covariance structure in PROC MIXED is another possibility. This code may be useful:

proc mixed data=eye_data;
class treatment time eye subject_id;
model response = treatment|eye|time/ddfm=kr2;
repeated eye time/ subject=subject_id type=un@ar(1);
lsmeans <means of interest>;
lsmestimate <comparisons of interest>/adjust=simulate(seed=1) adjdfe=row;
run;
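For readers unfamiliar with the notation, here is a sketch of the covariance that type=un@ar(1) implies for one subject in the example design (2 eyes, 6 time points). The symbols below are labels introduced here for illustration, not names taken from the post or from SAS output.

```latex
\operatorname{Var}(\mathbf{y}_i) = \Sigma_{\text{eye}} \otimes R_{\text{time}}, \qquad
\Sigma_{\text{eye}} = \begin{pmatrix} \sigma_{1}^{2} & \sigma_{12} \\ \sigma_{12} & \sigma_{2}^{2} \end{pmatrix}, \qquad
\left(R_{\text{time}}\right)_{jk} = \rho^{|j-k|}, \quad j,k = 1,\dots,6
```

Under this structure the covariance between the two eyes is \sigma_{12} at the same time point and \sigma_{12}\rho^{|j-k|} across time points j and k.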

The final thing I wanted to throw in there is that Dunnett's adjustment is really directed at comparisons to a single control group mean, and that is why we tend to use Edwards and Berry's simulate method. If you ever have the Dunnett-Hsu method fail to converge (which may be the case for repeated measures designs), there is a message in the log to consider the adjust=simulate method.

SteveDenham

Dennisky · Posted 03-16-2023 03:19 AM

Thanks a lot !
