I am trying to do a correlation matrix between the missingness of different variables. For instance, how often/what's the correlation that gender and ethnicity are missing together. How would I go about this? I know how to do a correlation matrix between variables, but a little stuck on how to compare the missingness of each variable. Thanks!
I think you are looking at association not correlation. Correlation is a specific calculation.
One way may be to look at distribution such as in proc freq. Create a format for non-missing / missing.
An example:
proc format ; value nm . = 'Missing' other='Not missing' ; /*if the variable is charcter*/ value $nm ' '= 'Missing' other='Not missing' ; data example; input age sex $; datalines; 25 f 21 m . f . f 25 . 30 . 35 . ; proc freq data=example; tables age*sex / missing; format age nm. sex $nm. ; run;
The above is the examine the data. You could do a chi-square test for actual difference of distribution, some difference or odds ratios and relative risks as well.
Or don't format the variables if they have a small number of values and just use the MISSING option in the Tables statement so the missing values are considered.
I think you are looking at association not correlation. Correlation is a specific calculation.
One way may be to look at distribution such as in proc freq. Create a format for non-missing / missing.
An example:
proc format ; value nm . = 'Missing' other='Not missing' ; /*if the variable is charcter*/ value $nm ' '= 'Missing' other='Not missing' ; data example; input age sex $; datalines; 25 f 21 m . f . f 25 . 30 . 35 . ; proc freq data=example; tables age*sex / missing; format age nm. sex $nm. ; run;
The above is the examine the data. You could do a chi-square test for actual difference of distribution, some difference or odds ratios and relative risks as well.
Or don't format the variables if they have a small number of values and just use the MISSING option in the Tables statement so the missing values are considered.
ods select MissPattern;
proc mi data=Sashelp.Heart nimpute=0;
var AgeAtStart Height Weight Diastolic
Systolic MRW Smoking Cholesterol;
run;
https://blogs.sas.com/content/iml/2016/04/18/patterns-of-missing-data-in-sas.html
This is a simple approach.
Hello @brownster,
You can also use PROC CORR to compute a correlation matrix from indicator variables defined as 1 if the original variable is missing, else 0, because the Pearson correlation coefficient of such variables equals the phi coefficient (a measure of association that you would normally compute with PROC FREQ).
Example using SASHELP.HEART:
/* Create table with information about missing levels of variables */
ods output nlevels=nlev;
proc freq data=sashelp.heart nlevels;
tables _all_ / noprint;
run;
/* Create indicator variables (1 if missing, else 0) for each variable with one or more missings */
proc sql noprint;
select cat('missing(',trim(tablevar),') as ',tablevar), tablevar
into :indvars separated by ', ', :indvarlist separated by ' '
from nlev
where nmisslevels;
create table want as
select &indvars
from sashelp.heart;
quit;
/* Compute phi coefficients (=Pearson correl. coeff.) for all pairs of indicator variables */
proc corr data=want;
var &indvarlist;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.