BookmarkSubscribeRSS Feed
GertNissen
Barite | Level 11

Hi All

 

Welcome back to another season of SAS Juletip 😉

This year we will have Juletips every working day in December, and even some days with two Juletip.

 

All Juletip will be tagged, and you can find all the Juletip from previous years in the community:

2014

2015

2016

2017

2018

38

17

17

16

(new every day)

Please comment, like and share the SAS juletip

 

----------------------------------------------------------------------------

Juletip #1 - Finding Missing data

 

Sometimes when you work with data you need to get a quick&dirty overview of the data quality - i.e. in regards to missing values.

 

Today's tip introduces 4 simple ways to get a fast overview and with other small tips introduced along the way.

1) Quality overview of your numeric variables using PROC MEANS;

PROC MEANS data=Sashelp.Heart nolabels NMISS N stackodsoutput;
  var _NUMERIC_; * Being lazy not specifing the individual variables *;
  ods output Summary = MissingValues;
run;

Notice the output table being made with ods and not "output out=" syntax. The option stackodsoutput controls how the output looks

 

You could then create a list of problem variables with missing values for later use - if needed 

proc sql noprint;
select Variable into :MissingVarList separated by ' '
from MissingValues
where NMiss > 0;
quit;
%put &=MissingVarList;

  

2) Using Data step
The missing() function has been around for many SAS version, but not many are using its siblings nmiss & cmiss
nmiss() count numeric missing
cmiss() count not Char Missing, but Count Missing of both numeric and char variables - making is more useful

When adding all available variables to arrays, we can refer to them automatically without specifying each of them (lazy approach)

 

data _NULL_;
  set sashelp.heart;
  array char[*} $ _CHAR_;
  array num[*} _NUMERIC_;
  if cmiss(of char[*], of num[*]) then put 'Missing:' _n_ char[*]=;
run;

Notice the summarization of all variables in the array, and the put of all variables in the log. You could of course output the same to a dataset. Looping etc.

  

3) Outputting the variable name with missing values

This example loop through the variables and only output the variable names (vname) that contain missing values for each obs 

data Missing(keep=obsno name);
  set Sashelp.Heart nobs=nobs;
  array char[*} $ _CHAR_;
  array num[*} _NUMERIC_;
  obsno=_n_;
  length name $32;             
   do i=1 to dim(char);
      if cmiss(char[i]) then 
        do;
          name = vname(char{i});
          output;
        end; 
   end;
run;

  

4) Using proc freq with formats
An alternative solution using formats and proc freq

 

First setup a dummy format to distinguish between missings and not missing.

proc format;
    value missing    .='Missing' other = 'Has Value';
    value $missing ' '='Missing' other = 'Has Value';
run;

Here we aggregate all values to two groups based on the format

ods output OneWayFreqs=freqs;
*ods trace on; * Use this to identify output tabels in ODS*; 
 
proc freq data=sashelp.heart;
    tables _all_/missing;
    format _character_ $missing. _numeric_ missing.; 
run;
*ods trace off;
ods output close;

This tip also outputs the result using ODS output in a different way than in example 1.

 

Question: These small snippets are just examples, How would you get this overview of missing values in your data? add a comment below.

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

I like the MI Procedure for a nice overview of missing value patterns in numeric variables with the displaypattern=nomeans option (requires SAS 9.4m5)

 

proc mi data=Sashelp.Heart nimpute=0 displaypattern=nomeans;
run;

Looking forward to a lot of SAS juetips 🙂

Haris
Lapis Lazuli | Level 10

Here's the marriage of the two posts above that displays a pattern of missingness similar to PROC MI for both numeric and character vars:

 

proc format;
    value missing    .='M' other = ' ';
    value $missing ' '='M' other = ' ';
run;
proc sql noprint;
	select Name into :Vars separated by '*'
	from dictionary.columns
	where libname='SASHELP' AND memname='HEART';
quit;
proc freq data=sashelp.heart;
    tables &Vars / list missing;
    format _character_ $missing. _numeric_ missing.; 
run;
Haris
Lapis Lazuli | Level 10

Sorry to pollute the interwebs but I liked this so much that I wrote a macro for myself. 

1. It counts the number of missing vales for all variables in a dataset--both numeric and character

2. Prints a pattern of missingness for all variables that have at least one missing value starting on the left with the most-missing variable and displaying variables to the right with decreasing amount of missingness.

 

Maybe some of you will find it useful:

%macro Missingness(LibName,MemName);
proc format;
    value missing    .='M' other = ' ';
    value $missing ' '='M' other = ' ';
run;

proc sql noprint;
	select cat('sum(cmiss(',strip(Name),'))',' as ',strip(Name)) into :CVars separated by ','
	from dictionary.columns
	where libname=upcase("&LibName") AND memname=upcase("&MemName")
		AND Type='char'
;
	select cat('nmiss(',strip(Name),')',' as ',strip(Name)) into :NVars separated by ','
	from dictionary.columns
	where libname=upcase("&LibName") AND memname=upcase("&MemName")
		AND Type='num'
;
	create table missingness as
	select &CVars, &NVars
	from SASHelp.Heart;
quit;

proc transpose data=missingness
				out=MissingT(rename=(_Name_=Variable Col1=NMiss));
	format NMiss comma.;
run;

proc sql;
	select * from MissingT order by NMiss desc
;
	reset noprint
;
	select Variable into :Vars separated by '*'
	from MissingT
	where NMiss GT 0
	order by NMiss desc
;
    drop table Missingness
;
quit; proc freq data=&LibName..&MemName; tables &Vars / list missing; format _character_ $missing. _numeric_ missing.; run; %mend Missingness;