PROC CORR: "partial" statement in missing data

Occasional Contributor
Posts: 8

I lose a lot of cases to missing values in this analysis!

 

I want to analyze the correlations among my biomarkers, varA varB varC varD, controlling for varE.  My code is:

proc corr;
   var varA varB varC varD;
   partial varE;
run;

 

My data set has N=1500 cases, each with at least one biomarker measured.  However, varA is missing for 300 cases, varB for 500 cases, varC for 150 cases, and varE for 500 cases.  Some of the missing cases overlap, but others do not.  When I run the code above, only N=50 cases are used.

 

How can I calculate, for example, a correlation between varA and varB controlling for varE using all available cases with varA, varB and varE data?

I actually have 50 biomarker variables in the correlation analysis.  It is too much to type all pair-wise analyses (1225 pairs) by hand!

 

Thank you in advance for your help!

 

Sincerely,

Rcore


Accepted Solutions
Solution
‎10-01-2015 04:26 PM
SAS Super FREQ
Posts: 3,637

Re: PROC CORR: "partial" statement in missing data

Here's the main idea: Run 50 single-variable regressions to generate the residuals when each biomarker is regressed on VarE.

Then compute the correlation of the residuals.  Here's some code for the case of three variables, using Weight as the PARTIAL variable:

/* Regress each variable on the PARTIAL variable (Weight) and save the residuals. */
/* Three regressions are shown here; the full problem needs 50.                   */
ods graphics off;
proc reg data=Sashelp.Heart noprint;
   model AgeAtStart = Weight;
   output out=Out1(keep=R rename=(R=AgeAtStart)) r=R;
quit;
proc reg data=Sashelp.Heart noprint;
   model Smoking = Weight;
   output out=Out2(keep=R rename=(R=Smoking)) r=R;
quit;
proc reg data=Sashelp.Heart noprint;
   model Cholesterol = Weight;
   output out=Out3(keep=R rename=(R=Cholesterol)) r=R;
quit;
/* ...etc...run 47 other analyses for the remaining variables */

/* accumulate residuals: one-to-one merge by row (no BY statement) */
data Residuals;
   merge Out1-Out3;
run;

/* compute the pairwise correlations of the residuals */
proc corr data=Residuals nosimple noprob; /* default is pairwise deletion of missing values */
run;
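
For comparison, here is a small sketch of the direct PARTIAL analysis on the same three example variables. With a PARTIAL statement, PROC CORR keeps only observations that are complete on every VAR and PARTIAL variable, which is why N drops so sharply in the original question. The residual approach fits each regression on whatever cases have that variable and Weight, so I would expect its estimates to be close to, but not identical to, the PARTIAL output when values are missing.

```sas
/* Direct partial correlation: listwise deletion across all listed variables */
proc corr data=Sashelp.Heart nosimple noprob;
   var AgeAtStart Smoking Cholesterol;
   partial Weight;
run;
```

Comparing the N reported here with the pairwise N values from the residual analysis shows how many cases each method uses.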

 

The hard part is to do those 50 one-variable regressions.  Fifty is not too many, so you can (1) cut-and-paste by hand, or (2) write a SAS macro.
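
A minimal sketch of the macro route in option (2) might look like the following. The macro name PartialResid, its parameters, and the _Out data set prefix are all hypothetical names I made up for illustration; adapt them to your data. It assumes every variable in the list is numeric and that none of them is named R.

```sas
/* Sketch only: PartialResid and its parameters are hypothetical names */
%macro PartialResid(data=, vars=, partial=, out=Residuals);
   %local i v n;
   %let n = %sysfunc(countw(&vars, %str( )));   /* number of variables in the list */
   %do i = 1 %to &n;
      %let v = %scan(&vars, &i, %str( ));
      /* regress the i-th variable on the PARTIAL variable and keep the residual */
      proc reg data=&data noprint;
         model &v = &partial;
         output out=_Out&i(keep=R rename=(R=&v)) r=R;
      quit;
   %end;
   /* one-to-one merge by row, as in the hand-written version */
   data &out;
      merge _Out1-_Out&n;
   run;
%mend PartialResid;

%PartialResid(data=Sashelp.Heart, vars=AgeAtStart Smoking Cholesterol, partial=Weight);

proc corr data=Residuals nosimple noprob;
run;
```

For the 50 biomarkers you would pass all 50 names in vars= and varE in partial=.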

 

There is a third option: (3) convert the data from wide to long form, replicating the VarE variable for each biomarker, and use a BY-group approach to compute all 50 regressions with a single call to PROC REG.  For an example of converting wide to long and replicating the VarE variable, see the article "Plotting multiple time series."  You would then need to transpose the residuals back to wide form to use PROC CORR.
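
Here is a rough sketch of how option (3) could be put together, again on the Sashelp.Heart example. The data set names Long and LongResid and the variables _obs, VarName, Value, and Resid are hypothetical names chosen for this sketch, not anything from the linked article.

```sas
/* Wide to long: one row per (observation, variable), with Weight replicated */
data Long;
   set Sashelp.Heart(keep=AgeAtStart Smoking Cholesterol Weight);
   length VarName $32;
   array y[*] AgeAtStart Smoking Cholesterol;
   _obs = _N_;                      /* row ID, needed to transpose back later */
   do i = 1 to dim(y);
      VarName = vname(y[i]);
      Value   = y[i];
      output;
   end;
   keep _obs VarName Value Weight;
run;

proc sort data=Long;
   by VarName _obs;
run;

/* One regression per BY group, all in a single PROC REG call */
proc reg data=Long noprint;
   by VarName;
   model Value = Weight;
   output out=LongResid r=Resid;
quit;

/* Long back to wide: one column of residuals per original variable */
proc sort data=LongResid;
   by _obs VarName;
run;

proc transpose data=LongResid out=Residuals;
   by _obs;
   id VarName;
   var Resid;
run;

proc corr data=Residuals nosimple noprob;
   var AgeAtStart Smoking Cholesterol;   /* excludes the _obs ID column */
run;
```

The VAR statement in the last step matters because the transposed data set also contains the numeric _obs column, which should not enter the correlation matrix.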



All Replies
Occasional Contributor
Posts: 8

Re: PROC CORR: "partial" statement in missing data

This is really helpful and worked.  Thank you very much!!
