PROC CORR: "partial" statement in missing data
09-08-2015 01:50 PM

I lose a lot of cases with missing values in this analysis!

I want to analyze the correlations among my biomarkers, varA varB varC varD, controlling for varE. My code is:

```
proc corr;
   var varA varB varC varD;
   partial varE;
run;
```

I have N=1500 observations in my data set, and each observation has at least one biomarker value. However, varA has 300 missing values, varB has 500, varC has 150, and varE has 500. Some observations are missing on several of these variables, while others are missing on only one. When I run the code above, only N=50 observations are used.

How can I calculate, for example, the correlation between varA and varB controlling for varE, using __all available cases__ that have varA, varB, and varE data?

I actually have 50 biomarker variables in the correlation analysis, so typing all of the pairwise analyses (1225 pairs) by hand is too much!

Thank you in advance for your help!

Sincerely,

Rcore

Accepted Solutions

Solution (accepted 10-01-2015 04:26 PM)

09-08-2015 03:25 PM

Here's the main idea: Run 50 single-variable regressions to generate the residuals when each biomarker is regressed on VarE.

Then compute the correlation of the residuals. Here's some code for the case of three variables, using Weight as the PARTIAL variable:

```
/* run 50 linear regressions and save residuals */
ods graphics off;
proc reg data=Sashelp.Heart noprint;
model AgeAtStart = weight;
output out=Out1(keep=R rename=(R=AgeAtStart)) r=R;
quit;
/* ...etc...run 47 other analyses */
proc reg data=Sashelp.Heart noprint;
model Smoking = weight;
output out=Out2(keep=R rename=(R=Smoking)) r=R;
quit;
proc reg data=Sashelp.Heart noprint;
model Cholesterol = weight;
output out=Out3(keep=R rename=(R=Cholesterol)) r=R;
quit;
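/* accumulate residuals: a one-to-one merge (no BY statement) lines up the rows,
   because each OUT= data set has one row per observation in Sashelp.Heart */
data Residuals;
merge Out1-Out3;
run;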
/* compute the pairwise correlations of the residuals */
proc corr data=Residuals nosimple noprob; /* pairwise missing */
run;
```
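Why correlating residuals gives a partial correlation: this is a standard identity. If e_A and e_B are the residuals from ordinary least-squares regressions (with intercept) of A on E and of B on E, fitted on the same observations, then their correlation equals the partial correlation of A and B given E:

```latex
r_{AB\cdot E} \;=\; \operatorname{corr}(e_A, e_B)
\;=\; \frac{r_{AB} - r_{AE}\, r_{BE}}{\sqrt{\left(1 - r_{AE}^{2}\right)\left(1 - r_{BE}^{2}\right)}}
```

With pairwise deletion, as in the code above, each regression and each pairwise correlation may use a different subset of observations, so the result should be read as an approximation to the complete-case partial correlation rather than an exact match with PROC CORR's PARTIAL output. The p-values PROC CORR would report for the residuals also do not account for the partialled-out variable.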

The hard part is running those 50 one-variable regressions. Fifty is not too many, so you can (1) cut and paste by hand, or (2) write a SAS macro.
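A minimal sketch of option (2), assuming the variables are listed individually by name (separated by blanks). The macro name %PartialResid, its parameters, and the temporary data sets _Resid1, _Resid2, ... are hypothetical names, not part of the original answer. It runs the same PROC REG step as the code above once per variable and then merges the residuals.

```sas
/* Hypothetical macro: for each variable in VARS=, regress it on PARTIALVAR=,
   keep the residuals under the original variable name, and merge them. */
%macro PartialResid(data=, vars=, partialvar=, out=Residuals);
   %local i n v;
   %let n = %sysfunc(countw(&vars, %str( )));   /* number of variables */
   %do i = 1 %to &n;
      %let v = %scan(&vars, &i, %str( ));       /* i-th variable name */
      proc reg data=&data noprint;
         model &v = &partialvar;
         output out=_Resid&i(keep=R rename=(R=&v)) r=R;
      quit;
   %end;
   /* one-to-one merge: each OUT= data set has one row per input observation */
   data &out;
      merge _Resid1-_Resid&n;
   run;
%mend PartialResid;

ods graphics off;
%PartialResid(data=Sashelp.Heart, vars=AgeAtStart Smoking Cholesterol,
              partialvar=Weight)

/* pairwise-missing correlations of the residuals */
proc corr data=Residuals nosimple noprob;
run;
```

For the original question, you would put your own data set name in DATA=, all 50 biomarker names in VARS=, and varE in PARTIALVAR=.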

There is also a third option: (3) convert the data from wide to long form, replicate the VarE variable for each biomarker, and use a BY-group approach to compute all 50 regressions with a single call to PROC REG. For an example of converting wide to long and replicating the VarE variable, see the article "Plotting multiple time series." You would then need to transpose the residuals back to wide form to use PROC CORR.
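A rough sketch of option (3) on the same Sashelp.Heart example. The data set and variable names Long, LongResid, obsID, VarName, and Value are illustrative assumptions, not from the original answer; obsID records the original row so the residuals can be transposed back to wide form.

```sas
/* wide to long: one row per (observation, biomarker), Weight copied to each row */
data Long;
   set Sashelp.Heart(keep=AgeAtStart Smoking Cholesterol Weight);
   length VarName $32;
   obsID = _N_;                      /* remember the original row */
   array y[*] AgeAtStart Smoking Cholesterol;
   do j = 1 to dim(y);
      VarName = vname(y[j]);         /* which biomarker this row holds */
      Value   = y[j];
      output;
   end;
   keep obsID VarName Value Weight;
run;

proc sort data=Long;
   by VarName obsID;
run;

/* a single PROC REG call fits one regression per BY group */
proc reg data=Long noprint;
   by VarName;
   model Value = Weight;
   output out=LongResid r=R;
quit;

/* long to wide: one column of residuals per biomarker */
proc sort data=LongResid;
   by obsID VarName;
run;

proc transpose data=LongResid out=Residuals(drop=_NAME_);
   by obsID;
   id VarName;
   var R;
run;

/* pairwise-missing correlations of the residuals */
proc corr data=Residuals(drop=obsID) nosimple noprob;
run;
```

For 50 biomarkers, only the KEEP= list and the ARRAY statement need to change; the PROC REG step stays the same.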

All Replies

10-01-2015 04:27 PM

This is really helpful and worked. Thank you very much!!