Hi,
I am trying to understand whether there is any difference between pairwise and listwise deletion (using the NOMISS option) in PROC CORR if my data set includes only two variables (A and B).
Variable A - 10 non-missing values
Variable B - 12 non-missing values
Will the correlation coefficient be calculated only based on 10 pairs of non-missing values?
When I plot these two variables on a scatter plot, I only see ten data points. Is there a way to include all 12 data points?
Thank you.
If you're using PROC CORR to compute just one correlation coefficient, the NOMISS option will have no effect.
As described in the Missing Values section of the PROC CORR documentation, by default the procedure uses pairwise deletion. So when computing the correlation between A and B, it will use all observations where *both* A and B are nonmissing. When the NOMISS option is specified and listwise deletion is used, any observation with a missing value for any analysis variable is excluded. When you have just two analysis variables, listwise and pairwise deletion are the same, since that pair of variables is the whole list of analysis variables.
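To check this against the situation in the question, here is a minimal sketch. The data set name two_vars and all of the values are made up for illustration; A has 10 nonmissing values and B has 12.

```sas
/* Hypothetical data: 12 observations, B is complete, A is missing in two rows */
data two_vars;
   input A B;
   datalines;
. 3.1
2 4.0
3 5.9
4 8.2
5 9.8
6 12.1
. 13.0
8 16.3
9 17.7
10 20.2
11 22.1
12 23.8
;
run;

/* Default: pairwise deletion */
proc corr data=two_vars;
   var A B;
run;

/* NOMISS: listwise deletion */
proc corr data=two_vars nomiss;
   var A B;
run;
```

Both runs should report the same correlation between A and B, based on the 10 observations where both are nonmissing. The visible difference should be in the simple statistics: the default run summarizes B over all 12 values, while the NOMISS run summarizes both variables over the 10 complete observations only.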
To make a graph of any type, you need coordinates. In a two-dimensional plot, that means an X and a Y value.
Where do you expect the "missing" coordinate to be plotted?
When you use NOMISS with PROC CORR, any observation that has a missing value for any analysis variable (all numeric variables by default, or only those listed on the VAR/WITH statements) is excluded.
Here is an example to demonstrate the difference.
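data example;
   set sashelp.class;
   /* Set one age to missing so the two deletion methods differ */
   if name='Alice' then age=.;
run;

/* Default: pairwise deletion */
proc corr data=example;
   title 'Default';
run;
title;

/* NOMISS: listwise deletion */
proc corr data=example nomiss;
   title 'With NOMISS';
run;
title;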
In the default output, each correlation is calculated from every observation where both variables of the pair are nonmissing. So the correlations involving age use 18 observations, and the correlation between height and weight uses all 19. In the second run, with the NOMISS option, only the 18 records that have no missing values for any of age, height, or weight (the numeric variables in the sashelp.class data set) are used.
Regardless of the option, an observation in which either variable of a pair is missing is never used in that pair's correlation.
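To see where the 18 and 19 come from, one option is to count the nonmissing and missing values per variable. This is only a sketch and assumes the example data set created above already exists.

```sas
/* Count nonmissing (N) and missing (NMISS) values for each analysis variable */
proc means data=example n nmiss;
   var age height weight;
run;
```

Age should show one missing value, while height and weight should show none, which is why only the pairs that involve age drop to 18 observations under pairwise deletion.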
For three or more variables, you can construct examples where listwise and pairwise deletion give different answers. For an example, and for an explanation of why SAS regression and multivariate procedures use listwise deletion, see "Missing values and pairwise correlations: A cautionary example."