analyst_work
Obsidian | Level 7

Hi, 

I am trying to understand whether there is any difference between pairwise and listwise deletion (using the NOMISS option) in PROC CORR if my data set includes only two variables (A and B).

Variable A - 10 non-missing values

Variable B - 12 non-missing values

Will the correlation coefficient be calculated only based on 10 pairs of non-missing values? 

When I plot these two variables on a scatter plot, I only see ten data points. Is there a way to include all 12 data points?

 

Thank you. 

ACCEPTED SOLUTION

MichaelL_SAS
SAS Employee

If you're using PROC CORR to compute just one correlation coefficient, the NOMISS option will have no effect on that coefficient.

 

As described in the Missing Values section of the PROC CORR documentation, by default the procedure uses pairwise deletion. So when computing the correlation between A and B, it will use all observations where *both* A and B are nonmissing. When the NOMISS option is specified, listwise deletion is used: any observation with a missing value for any analysis variable is excluded. When you have just two analysis variables, listwise and pairwise deletion are the same, since that pair of variables is the whole list of analysis variables.
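A minimal sketch of the two-variable case, assuming a made-up data set named `two_vars` (the name and the values are illustrative, not from the original post) in which B has 12 nonmissing values and A has 10. Both steps should base the correlation on the same 10 complete pairs.

```sas
/* Hypothetical data: B has 12 nonmissing values, A has 10 */
data two_vars;
   input A B;
   datalines;
1 2.1
2 3.9
3 6.2
4 8.1
5 9.8
6 12.3
7 13.9
8 16.2
9 18.1
10 19.7
. 22.0
. 24.1
;
run;

/* Default: pairwise deletion */
proc corr data=two_vars;
   var A B;
   title 'Pairwise (default)';
run;
title;

/* NOMISS: listwise deletion */
proc corr data=two_vars nomiss;
   var A B;
   title 'Listwise (NOMISS)';
run;
title;
```

The simple statistics for B can differ between the two runs, because by default they are computed from all 12 nonmissing values of B, while with NOMISS they come from the 10 complete observations. The correlation between A and B should be identical in both.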


4 REPLIES
ballardw
Super User

To make a graph of any type you need coordinates. In a two-dimensional plot, that means both an X and a Y value.

Where do you expect the "missing" coordinate to be plotted?

 

When you use NOMISS with PROC CORR, any observation that has a missing value for any analysis variable (all numeric variables by default, or only those listed on the VAR/WITH statements) is excluded.

Here is an example to demonstrate the difference.

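/* Copy sashelp.class and set one value of age to missing */
data example;
   set sashelp.class;
   if name='Alice' then age=.;
run;

/* Default: pairwise deletion */
proc corr data=example;
   title 'Default';
run; title;

/* NOMISS: listwise deletion */
proc corr data=example nomiss;
   title 'With NOMISS';
run; title;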

You will see that a correlation is calculated from every observation where both variables in the pair are nonmissing. So by default some correlations use 18 observations (those involving age) and some use 19. In the second run, with the NOMISS option, only the 18 records that have no missing values for any of age, height, or weight (the numeric variables in the sashelp.class data set) are used.

 

Regardless of the option, any observation in which either variable of a pair is missing is not used in that pair's correlation.

 

Reeza
Super User
Set the missing values to 0 to graph them.
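One way this suggestion might be sketched, assuming a small made-up data set `have` with variables A and B (hypothetical names and values): build a plotting copy in which a missing A is replaced by 0, and flag those rows so they are visibly distinct from observed values.

```sas
/* Hypothetical data: A is missing on the last two rows */
data have;
   input A B;
   datalines;
1 2.1
2 3.9
3 6.2
. 8.0
. 9.5
;
run;

/* Plotting copy: substitute 0 for a missing A and flag those rows */
data plot_vars;
   set have;
   length A_status $12;
   if missing(A) then do;
      A_plot = 0;
      A_status = 'A missing';
   end;
   else do;
      A_plot = A;
      A_status = 'A observed';
   end;
run;

/* GROUP= draws the substituted points with a different marker style */
proc sgplot data=plot_vars;
   scatter x=A_plot y=B / group=A_status;
   xaxis label='A (missing shown at 0)';
run;
```

The substituted zeros are for display only. Running PROC CORR on `A_plot` instead of `A` would treat them as real values and change the correlation.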

Rick_SAS
SAS Super FREQ

For three or more variables, you can construct examples where listwise and pairwise deletion give different answers. For an example, and for an explanation of why SAS regression and multivariate procedures use listwise deletion, see "Missing values and pairwise correlations: A cautionary example."

