Luke01
Fluorite | Level 6

Hi community,

 

I'm having trouble calculating the concordance correlation coefficient. How can I do this?

 

I am aware of: https://newonlinecourses.science.psu.edu/stat509/node/161/

 

but it seems to use old procedures that are not relevant for more recent SAS versions.

 

Thanks in advance.

9 REPLIES
Ksharp
Super User

Did you try the four kinds of correlation coefficient in PROC CORR?

 

proc corr data=sashelp.class kendall;
var weight height;
run;
Luke01
Fluorite | Level 6

I am after Lin's concordance correlation coefficient, which is not the same as a correlation in the sense of Pearson's etc.

 

I.e., agreement with the line y=x.

PaigeMiller
Diamond | Level 26

@Luke01 look at the link I gave you in your other thread ... it was meant as a reference for you to refer back to as needed, not as a one time thing.

--
Paige Miller
Luke01
Fluorite | Level 6

Yes, it refers to Kendall's; I am after Lin's.

FreelanceReinh
Jade | Level 19

Hi @Luke01,

 

You can get all the terms used in the formula from PROC CORR: just use the COV and OUTP= options. The rest is a simple calculation, e.g., in a DATA step.

 

Example (assuming a dataset HAVE with numeric variables X and Y with only non-missing values):

proc corr data=have cov outp=stats noprint;
var x y;
run;

data want(keep=rc);
do until(last);
  set stats end=last;
  sxx+(_type_='COV')*(upcase(_name_)='X')*x;
  syy+(_type_='COV')*(upcase(_name_)='Y')*y;
  sxy+(_type_='COV')*(upcase(_name_)='X')*y;
  mx +(_type_='MEAN')*x;
  my +(_type_='MEAN')*y;
  n  +(_type_='N')*x;
end;
rc=2*sxy/(sxx+syy+(mx-my)**2);
run;

RC is the concordance correlation coefficient as per the link you provided.
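
For reference, here is a sketch of the expression the DATA step evaluates. The symbols are notation chosen here to match the variable names in the code: s_xx, s_yy and s_xy are the variance and covariance values read from the COV rows, and x-bar and y-bar are the means read from the MEAN row.

```latex
r_c = \frac{2\, s_{xy}}{s_{xx} + s_{yy} + (\bar{x} - \bar{y})^2}
```

Note that PROC CORR computes variances and covariances with divisor n-1 by default (VARDEF=DF). If the definition you are following uses divisor n instead, specifying VARDEF=N in the PROC CORR statement should give the corresponding values.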

Luke01
Fluorite | Level 6

Sorry for my ignorance, I'm new to SAS and coding.

 

But what do sxx, my, etc. refer to?

 

Do I need to substitute my x and y variables in?

 

Same for _name_

 

I ask because I am getting 0 for rc, which isn't right. Below is an example of what I did.

 

proc corr data=HAVE cov outp=stats noprint;
var x_variable y_variable;
run;

data want(keep=rc);
do until(last);
  set stats end=last;
  sxx+(_type_='COV')*(upcase(_name_)='X')*x_variable;
  syy+(_type_='COV')*(upcase(_name_)='Y')*y_variable;
  sxy+(_type_='COV')*(upcase(_name_)='X')*y_variable;
  mx +(_type_='MEAN')*x_variable;
  my +(_type_='MEAN')*y_variable;
  n  +(_type_='N')*x_variable;
end;
rc=2*sxy/(sxx+syy+(mx-my)*2);
run;

 

Thank you.

FreelanceReinh
Jade | Level 19

@Luke01 wrote:

But what do sxx, my, etc. refer to?


These are just arbitrary variable names. I used names reflecting the terms in the formula (e.g. sxx for S_XX). These variables are not kept in the output dataset (see the KEEP= option in the DATA statement -- but you may change that if you like), so their names don't really matter.


So I need to substitute my x and y variables in?


You need to replace the dataset name have in the PROC CORR step with the name of your dataset. If the variables in your dataset for which you want to compute the concordance correlation coefficient are not named x and y, you would ideally use a RENAME= dataset option in the PROC CORR step. So, using the example program (see the link on the web page you linked to in your initial post) the PROC CORR step would read:

proc corr data=dice_baseline(rename=(cort_auc1=x cort_auc2=y)) cov outp=stats noprint;
var x y;
run;

(where stats is an arbitrary dataset name -- in case of a name conflict just use a different name).

 

Alternatively, let PROC CORR work with the original names and then replace "x" and "y" by these names in the DATA step in nine places, not only six as you did in your code: The character constants 'X' and 'Y' must also be replaced by 'X_VARIABLE' and 'Y_VARIABLE' (upper case is mandatory), respectively.
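
A minimal sketch of that alternative, assuming your dataset is still called HAVE and the variables are named x_variable and y_variable as in your post. Note also that the last assignment needs ** (exponentiation), not * as in your code. The counter n is omitted here because the formula does not use it.

```sas
/* PROC CORR works with the original variable names */
proc corr data=have cov outp=stats noprint;
  var x_variable y_variable;
run;

data want(keep=rc);
  do until(last);
    set stats end=last;
    /* The character constants must match the (upper-cased) variable names */
    sxx+(_type_='COV')*(upcase(_name_)='X_VARIABLE')*x_variable;
    syy+(_type_='COV')*(upcase(_name_)='Y_VARIABLE')*y_variable;
    sxy+(_type_='COV')*(upcase(_name_)='X_VARIABLE')*y_variable;
    mx +(_type_='MEAN')*x_variable;
    my +(_type_='MEAN')*y_variable;
  end;
  /* ** is exponentiation: squared difference of the means */
  rc=2*sxy/(sxx+syy+(mx-my)**2);
run;
```

With 'X' and 'Y' left unchanged, no COV row matches, so sxx, syy and sxy all stay 0, which would explain the result rc=0.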

 

The DATA step reads the output dataset stats produced by PROC CORR, computes the concordance correlation coefficient and stores it in a numeric variable named rc in a dataset want containing only one record. (Both rc and want are arbitrary names.) 


Same for _name_


The output dataset from PROC CORR contains character variables _type_ and _name_ (these are default names) which contain names of statistics and variables, respectively, and are used in the DATA step. I wrote the DATA step looking at PROC PRINT output of dataset stats:

proc print data=stats;
run;

 

tka726
Obsidian | Level 7

This worked perfectly for me, thank you!! Can the confidence interval for the CCC be obtained from this output as well?

 

 

FreelanceReinh
Jade | Level 19

Hello @tka726,

 

Yes, in principle this is possible (e.g., the Pearson correlation coefficient from the PROC CORR output dataset stats would be involved in the calculation; also the TANH and ARTANH functions could be applied to simplify some terms). However, I found a couple of discrepancies between the formula used in the IML code from the psu.edu website linked by @Luke01 and a formula I saw elsewhere -- whereas the formulas for the point estimate were equivalent. Therefore, I'm hesitant to publish SAS code for the confidence interval, unless you provide a (link to a) reliable formula for it in mathematical notation like the formula for the point estimate on the psu.edu website.
