Solved: What is the difference between correlation estimates from PROC COPULA and PROC CORR?

hengzh47 · Posted 08-12-2020 09:39 PM

Hi

I am trying to understand the difference between the correlation estimates from these two procedures. The PROC COPULA documentation doesn't say much about how the estimates are computed.

1. I understand that the PROC COPULA correlation matrix uses rank correlation, which should be the same as the Spearman correlation. The results contain both a "Correlation Matrix" and a "Spearman Correlation Matrix". They are close but not exactly the same. Why?

2. When I run PROC CORR on the same data set, the Spearman correlation is quite different from the one PROC COPULA reports. So the same "Spearman correlation" gives different results in different procedures?

I use a SASHELP data set as an example.

Any insight is appreciated.

proc copula data=sashelp.cars;
   var Cylinders EngineSize Length;
   fit normal;
   simulate / ndraws=5000
              seed=1234
              out=work.copula_data;
run;

proc corr data=sashelp.cars noprob nosimple spearman;
   var Cylinders EngineSize Length;
run;

Rick_SAS · Posted 08-13-2020 07:01 AM

The PROC CORR output is for the data. Be sure to use the NOMISS option to drop observations that have missing values.

As stated in the PROC COPULA doc, the correlation matrix "contains the estimates of the model correlation matrix." So these are the maximum likelihood estimates (MLEs) that you get from fitting the normal copula model to the data.

If you simulate a lot of data from the copula model, the Spearman correlation of the simulated data should be close to the Spearman correlation of the model, as shown in the following statements:

ods trace on;

/* standard estimates from data */
proc corr data=sashelp.cars noprob nosimple pearson spearman nomiss;
   var Cylinders EngineSize Length;
   ods exclude VarInformation;
run;

/* MLEs from normal copula model */
proc copula data=sashelp.cars;
   var Cylinders EngineSize Length;
   fit normal;
   simulate / ndraws=100000
              seed=1234
              out=work.copula_data;
   ods exclude FitSummary KendallCorrelation;
run;

/* the Spearman corr of the simulated data should be close to the
   fitted Spearman corr */
proc corr data=work.copula_data noprob nosimple spearman nomiss;
   var Cylinders EngineSize Length;
   ods exclude VarInformation;
run;
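
To make the gap between the two PROC COPULA tables concrete: for a normal (Gaussian) copula with correlation parameter rho, the model-implied Spearman correlation is (6/pi)*arcsin(rho/2), which is slightly smaller in magnitude than rho whenever 0 < |rho| < 1. Assuming the "Spearman Correlation Matrix" table reports this model-implied quantity (an assumption, not something confirmed in this thread), that would explain why the two tables are close but not identical. A minimal sketch of the conversion, using hypothetical rho values and a hypothetical data set name rather than the fitted estimates:

```sas
/* Sketch: Spearman correlation implied by a normal copula.      */
/* The rho values and the data set name are hypothetical; they   */
/* are not the estimates that PROC COPULA produces for the cars  */
/* data.                                                         */
data work.implied_spearman;
   pi = constant('pi');
   do rho = 0.5, 0.7, 0.9;
      /* standard Gaussian-copula identity */
      spearman = (6 / pi) * arsin(rho / 2);
      output;
   end;
   drop pi;
run;

proc print data=work.implied_spearman noobs;
run;
```

To compare against a real fit, replace the hypothetical rho values with the off-diagonal entries of the fitted "Correlation Matrix" table.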

hengzh47 · Posted 08-13-2020 12:05 PM

Hi Rick

Thanks for explaining the PROC COPULA results. It is true that the estimated correlation matrix is close to the correlation calculated from the simulated multivariate variables, but the correlation of the simulated variables is not very close to the correlation calculated from the original data set. I am having trouble understanding the difference.

Another question I am trying to figure out: when simulating in PROC COPULA, which correlation matrix is used? It seems that the Spearman correlation is used. The other "Correlation Matrix" in the results is close to the Spearman one, but how is it calculated?

thanks

Heng

Rick_SAS · Posted 08-13-2020 01:48 PM

> the simulated variable correlation are not quite close to the correlation calculated from original dataset

Yes. The original data is not even close to being multivariate normal. The model does not fit the data.

> when simulating in PROC COPULA, which correlation matrix is used

Spearman

It sounds like you are trying to understand copulas better before you start using them for modeling. The math is not simple. When I looked at copulas a number of years ago, I did not find any simple presentations, so I wrote one for my 2013 book, Simulating Data with SAS (Section 9.5). Maybe some good intros have been written in the intervening years. If you find a good introduction, I'd be interested in learning about it.

For learning about the procedure, there is the PROC COPULA documentation and the 2011 SAS Global Forum (SGF) paper by Chvosta, Erdman, and Little, which introduces PROC COPULA.
