Hi All
Welcome back to another season of SAS Juletip 😉
This year we will have Juletips every working day in December, and even some days with two Juletip.
All Juletip will be tagged, and you can find all the Juletip from previous years in the community:
2014 |
2015 |
2016 |
2017 |
2018 |
Please comment, like and share the SAS juletip
----------------------------------------------------------------------------
Juletip #1 - Finding Missing data
Sometimes when you work with data you need to get a quick&dirty overview of the data quality - i.e. in regards to missing values.
Today's tip introduces 4 simple ways to get a fast overview and with other small tips introduced along the way.
1) Quality overview of your numeric variables using PROC MEANS;
PROC MEANS data=Sashelp.Heart nolabels NMISS N stackodsoutput;
var _NUMERIC_; * Being lazy not specifing the individual variables *;
ods output Summary = MissingValues;
run;
Notice the output table being made with ods and not "output out=" syntax. The option stackodsoutput controls how the output looks
You could then create a list of problem variables with missing values for later use - if needed
proc sql noprint;
select Variable into :MissingVarList separated by ' '
from MissingValues
where NMiss > 0;
quit;
%put &=MissingVarList;
2) Using Data step
The missing() function has been around for many SAS version, but not many are using its siblings nmiss & cmiss
nmiss() count numeric missing
cmiss() count not Char Missing, but Count Missing of both numeric and char variables - making is more useful
When adding all available variables to arrays, we can refer to them automatically without specifying each of them (lazy approach)
data _NULL_;
set sashelp.heart;
array char[*} $ _CHAR_;
array num[*} _NUMERIC_;
if cmiss(of char[*], of num[*]) then put 'Missing:' _n_ char[*]=;
run;
Notice the summarization of all variables in the array, and the put of all variables in the log. You could of course output the same to a dataset. Looping etc.
3) Outputting the variable name with missing values
This example loop through the variables and only output the variable names (vname) that contain missing values for each obs
data Missing(keep=obsno name);
set Sashelp.Heart nobs=nobs;
array char[*} $ _CHAR_;
array num[*} _NUMERIC_;
obsno=_n_;
length name $32;
do i=1 to dim(char);
if cmiss(char[i]) then
do;
name = vname(char{i});
output;
end;
end;
run;
4) Using proc freq with formats
An alternative solution using formats and proc freq
First setup a dummy format to distinguish between missings and not missing.
proc format;
value missing .='Missing' other = 'Has Value';
value $missing ' '='Missing' other = 'Has Value';
run;
Here we aggregate all values to two groups based on the format
ods output OneWayFreqs=freqs;
*ods trace on; * Use this to identify output tabels in ODS*;
proc freq data=sashelp.heart;
tables _all_/missing;
format _character_ $missing. _numeric_ missing.;
run;
*ods trace off;
ods output close;
This tip also outputs the result using ODS output in a different way than in example 1.
Question: These small snippets are just examples, How would you get this overview of missing values in your data? add a comment below.
I like the MI Procedure for a nice overview of missing value patterns in numeric variables with the displaypattern=nomeans option (requires SAS 9.4m5)
proc mi data=Sashelp.Heart nimpute=0 displaypattern=nomeans;
run;
Looking forward to a lot of SAS juetips 🙂
Here's the marriage of the two posts above that displays a pattern of missingness similar to PROC MI for both numeric and character vars:
proc format;
value missing .='M' other = ' ';
value $missing ' '='M' other = ' ';
run;
proc sql noprint;
select Name into :Vars separated by '*'
from dictionary.columns
where libname='SASHELP' AND memname='HEART';
quit;
proc freq data=sashelp.heart;
tables &Vars / list missing;
format _character_ $missing. _numeric_ missing.;
run;
Sorry to pollute the interwebs but I liked this so much that I wrote a macro for myself.
1. It counts the number of missing vales for all variables in a dataset--both numeric and character
2. Prints a pattern of missingness for all variables that have at least one missing value starting on the left with the most-missing variable and displaying variables to the right with decreasing amount of missingness.
Maybe some of you will find it useful:
%macro Missingness(LibName,MemName);
proc format;
value missing .='M' other = ' ';
value $missing ' '='M' other = ' ';
run;
proc sql noprint;
select cat('sum(cmiss(',strip(Name),'))',' as ',strip(Name)) into :CVars separated by ','
from dictionary.columns
where libname=upcase("&LibName") AND memname=upcase("&MemName")
AND Type='char'
;
select cat('nmiss(',strip(Name),')',' as ',strip(Name)) into :NVars separated by ','
from dictionary.columns
where libname=upcase("&LibName") AND memname=upcase("&MemName")
AND Type='num'
;
create table missingness as
select &CVars, &NVars
from SASHelp.Heart;
quit;
proc transpose data=missingness
out=MissingT(rename=(_Name_=Variable Col1=NMiss));
format NMiss comma.;
run;
proc sql;
select * from MissingT order by NMiss desc
;
reset noprint
;
select Variable into :Vars separated by '*'
from MissingT
where NMiss GT 0
order by NMiss desc
;
drop table Missingness
;
quit;
proc freq data=&LibName..&MemName;
tables &Vars / list missing;
format _character_ $missing. _numeric_ missing.;
run;
%mend Missingness;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.