Rcore
Calcite | Level 5

I lose a lot of cases to missing values in this analysis!

 

I want to analyze the correlations among my biomarkers, varA varB varC varD, controlling for varE.  My code is:

proc corr;
   var varA varB varC varD;
   partial varE;
run;

 

My data set has N=1500 cases, each with at least one biomarker.  However, varA is missing for 300 cases, varB for 500 cases, varC for 150 cases, and varE for 500 cases.  Some of the missing cases overlap and others do not.  When I use the above code, only N=50 cases are used.

 

How can I calculate, for example, a correlation between varA and varB controlling for varE using all available cases with varA, varB and varE data?

I actually have 50 biomarker variables in the correlation analysis.  That is too many pairwise analyses (1225 pairs) to type by hand!

 

Thank you in advance for your help!

 

Sincerely,

Rcore

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

Here's the main idea: Run 50 single-variable regressions to generate the residuals when each biomarker is regressed on VarE.

Then compute the correlation of the residuals.  Here's some code for the case of three variables, using Weight as the PARTIAL variable:

/* run 50 linear regressions and save residuals */
ods graphics off;
proc reg data=Sashelp.Heart noprint;
   model AgeAtStart = weight;
   output out=Out1(keep=R rename=(R=AgeAtStart)) r=R;
quit;
/* ...etc...run 47 other analyses */
proc reg data=Sashelp.Heart noprint;
   model Smoking = weight;
   output out=Out2(keep=R rename=(R=Smoking)) r=R;
quit;
proc reg data=Sashelp.Heart noprint;
   model Cholesterol = weight;
   output out=Out3(keep=R rename=(R=Cholesterol)) r=R;
quit;

/* accumulate residuals */
data Residuals;
   merge Out1-Out3;
run;

/* compute the pairwise correlations of the residuals */
proc corr data=Residuals nosimple noprob; /* pairwise missing */
run;

 

The hard part is to do those 50 one-variable regressions.  Fifty is not too many, so you can (1) cut-and-paste by hand, or (2) write a SAS macro.
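
For option (2), a minimal macro sketch might look like the following. It only wraps the three-variable example above in a %DO loop. The macro name ResidCorr, its parameters (data=, vars=, partial=), and the _Out1, _Out2, ... data set names are hypothetical choices for illustration, not part of any SAS product.

```sas
/* Sketch only: the macro name and parameters are hypothetical.       */
/* Regress each variable in &vars on &partial, save the residuals,    */
/* then correlate the residuals with pairwise deletion.               */
%macro ResidCorr(data=, vars=, partial=);
   %local i v n;
   %let n = %sysfunc(countw(&vars, %str( )));   /* number of variables */

   %do i = 1 %to &n;
      %let v = %scan(&vars, &i, %str( ));
      /* a separate PROC REG call per variable, as in the example above */
      proc reg data=&data noprint;
         model &v = &partial;
         output out=_Out&i(keep=R rename=(R=&v)) r=R;
      quit;
   %end;

   /* one-to-one merge: each output data set has one row per input row */
   data Residuals;
      merge _Out1-_Out&n;
   run;

   /* PROC CORR excludes missing values pairwise by default */
   proc corr data=Residuals nosimple noprob;
   run;
%mend ResidCorr;

/* same three variables as the hand-written example */
%ResidCorr(data=Sashelp.Heart, vars=AgeAtStart Smoking Cholesterol, partial=Weight);
```

For the original question, the call would list the 50 biomarker names in vars= and use partial=varE. The NOPROB option is kept on purpose: p-values computed from the residuals would not adjust the degrees of freedom for the partial variable.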

 

There is a third option: (3) convert the data from wide to long form, replicating the VarE variable for each biomarker, and use a BY-group approach to compute all 50 regressions with a single call to PROC REG.  For an example of converting wide to long and replicating the VarE variable, see the article "Plotting multiple time series."  Then you'd need to transpose the residuals back to wide form to use PROC CORR.
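
A rough sketch of option (3), again using Sashelp.Heart with Weight as the PARTIAL variable. The names ObsID, VarName, Value, Long, and LongResid are assumptions introduced for this illustration. ObsID is added so that the residuals can be matched back to their original rows after transposing.

```sas
/* Sketch only: ObsID, VarName, Value, Long, LongResid are hypothetical names */

/* wide to long: one row per (observation, variable); Weight is replicated */
data Long;
   set Sashelp.Heart(keep=AgeAtStart Smoking Cholesterol Weight);
   length VarName $32;
   array y[*] AgeAtStart Smoking Cholesterol;
   ObsID = _N_;                      /* row identifier for the transpose */
   do j = 1 to dim(y);
      VarName = vname(y[j]);         /* name of the response variable */
      Value   = y[j];
      output;
   end;
   keep ObsID VarName Value Weight;
run;

proc sort data=Long;
   by VarName ObsID;
run;

/* one regression per BY group, all in a single PROC REG call */
proc reg data=Long noprint;
   by VarName;
   model Value = Weight;
   output out=LongResid(keep=ObsID VarName R) r=R;
quit;

/* long back to wide: one column of residuals per original variable */
proc sort data=LongResid;
   by ObsID VarName;
run;

proc transpose data=LongResid out=Residuals;
   by ObsID;
   id VarName;
   var R;
run;

/* the VAR statement keeps ObsID out of the correlation matrix */
proc corr data=Residuals nosimple noprob;
   var AgeAtStart Smoking Cholesterol;
run;
```

With 50 biomarkers, only the ARRAY statement, the KEEP= list, and the VAR statement in PROC CORR would need the longer variable list.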

Rcore
Calcite | Level 5

This is really helpful and worked.  Thank you very much!!
