Hi
I am trying to understand the difference between the correlation estimates from these two procedures, PROC COPULA and PROC CORR. The documentation on PROC COPULA doesn't provide much information about how the estimates are computed.
1. I understand that the PROC COPULA correlation matrix uses rank correlation, which should be the same as Spearman correlation. The results contain both a "Correlation Matrix" and a "Spearman Correlation Matrix". They are close, but not exactly the same. Why?
2. When I run PROC CORR on the same dataset, the Spearman correlation is quite different from the one reported by PROC COPULA. So the same "Spearman correlation" returns different results in different procedures?
I use a SASHELP data set (sashelp.cars) as an example.
Any insight is appreciated.
proc copula data=sashelp.cars;
   var Cylinders EngineSize Length;
   fit normal;
   simulate / ndraws=5000
              seed=1234
              out=work.copula_data;
run;

proc corr data=sashelp.cars noprob nosimple spearman;
   var Cylinders EngineSize Length;
run;
The PROC CORR output is for the data. Be sure to use the NOMISS option to drop observations that have missing values.
As stated in the PROC COPULA doc, the correlation matrix "contains the estimates of the model correlation matrix." So these are the MLE estimates that you get from fitting the normal model to the data.
If you simulate a lot of data from the copula model, the Spearman correlation of the simulated data should be close to the Spearman correlation of the model, as shown in the following statements:
ods trace on;

/* standard estimates from data */
proc corr data=sashelp.cars noprob nosimple pearson spearman NOMISS;
   var Cylinders EngineSize Length;
   ods exclude VarInformation;
run;

/* MLE estimates from normal copula model */
proc copula data=sashelp.cars;
   var Cylinders EngineSize Length;
   fit normal;
   simulate / ndraws=100000
              seed=1234
              out=work.copula_data;
   ods exclude FitSummary KendallCorrelation;
run;

/* the Spearman corr of the simulated data should be close to the
   fitted Spearman corr */
proc corr data=work.copula_data noprob nosimple spearman NOMISS;
   var Cylinders EngineSize Length;
   ods exclude VarInformation;
run;
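On why the "Correlation Matrix" and the "Spearman Correlation Matrix" are close but not identical: for a normal (Gaussian) copula with correlation parameter rho, the implied Spearman correlation is (6/pi)*arcsin(rho/2) and the implied Kendall correlation is (2/pi)*arcsin(rho). The following is a minimal sketch of that mapping. The data set name work.rank_corr and the grid of rho values are made up for illustration, and the sketch assumes (it is not confirmed here) that PROC COPULA derives its rank-correlation tables from the fitted parameters in this way.

```sas
/* Illustration only: map a normal-copula correlation parameter (rho)
   to the implied Spearman and Kendall rank correlations.
   work.rank_corr and the rho grid are hypothetical. */
data work.rank_corr;
   pi = constant('pi');
   do i = -3 to 3;                             /* integer loop avoids floating-point drift */
      rho      = 0.3 * i;                      /* rho = -0.9, -0.6, ..., 0.9 */
      spearman = (6 / pi) * arsin(rho / 2);    /* Spearman's rho implied by rho */
      kendall  = (2 / pi) * arsin(rho);        /* Kendall's tau implied by rho  */
      output;
   end;
   drop pi i;
run;

proc print data=work.rank_corr noobs;
run;
```

In the printed table, the spearman column is slightly smaller in magnitude than rho for every nonzero rho, which is consistent with two matrices that are close but not equal.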
Hi Rick
Thanks for the explanation of the PROC COPULA results. It is true that the estimated correlation matrix is close to the correlation calculated from the simulated multivariate variables, but the correlation of the simulated variables is not very close to the correlation calculated from the original dataset. I am having trouble understanding the difference.
Another question I am trying to figure out: when simulating in PROC COPULA, which correlation matrix is used? It seems the Spearman correlation is used. The other "Correlation Matrix" in the results is close to the Spearman matrix, but how is it calculated?
thanks
Heng
> the simulated variable correlation are not quite close to the correlation calculated from original dataset
Yes. The original data is not even close to being multivariate normal. The model does not fit the data.
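A quick way to look at the raw variables before trusting the fit is to count missing values and tabulate the discrete variable. This is only a sketch. The idea that missing values and heavily tied ranks (Cylinders takes only a handful of distinct values) contribute to the gap is an assumption offered for exploration, not something established in this thread.

```sas
/* Count nonmissing and missing values per variable
   (relevant to the NOMISS advice for PROC CORR) */
proc means data=sashelp.cars n nmiss;
   var Cylinders EngineSize Length;
run;

/* Tabulate Cylinders, including missing values, to see how few
   distinct values (and therefore how many tied ranks) it has */
proc freq data=sashelp.cars;
   tables Cylinders / missing;
run;
```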
> when simulating in PROC COPULA, which correlation matrix is used
Spearman
It sounds like you are trying to understand copulas better before you start using them for modeling. The math is not simple. When I looked at copulas a number of years ago, I did not find any simple presentations, so I wrote one for my 2013 book, Simulating Data with SAS (Section 9.5). Maybe some good intros have been written in the intervening years. If you find a good introduction, I'd be interested in learning about it.
For learning about the procedure, there is the PROC COPULA documentation and the 2011 SGF paper by Chvosta, Erdman, and Little, which introduces PROC COPULA.