DATA Step, Macro, Functions and more

Report on missing values of each coloumn

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

Report on missing values of each coloumn

Hi all,

I have a data set more than 100 fields ,where I have to check the missing values of each field  an the percentage of missing .

So i have to write a code where multiple data sets can be executed and should give  the reports of each field of all data set in one final output .

your help is much appreciated

Thanks


Accepted Solutions
Solution
‎10-07-2015 03:37 PM
Valued Guide
Posts: 765

Re: Report on missing values of each coloumn

[ Edited ]

Hi.  Take a look at (fixed the indenting in the data so you can actually see what's going on) ...

 

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step

http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf

 

It explains why this SAS code works.  All you have to do for any data set is change the name of the data set used in PROC FREQ.  No other changes are required.  The code produced the attached report when I used the SASHELP.HEART.  All you have to do is cut/paste/change data set name/run.

 

proc format;
value nm . = '0' other = '1';
value $ch ' ' = '0' other = '1';
run;

 

ods listing close;
ods output onewayfreqs=tables;

proc freq data=sashelp.heart;
   tables _all_ / missing;
   format _numeric_ nm.
   _character_ $ch.;
run;

ods output close;
ods listing;

 

data report;
length var $32;
do until (last.table);
  set tables;
  by table notsorted;
  array names(*) f_: ;
  select (names(_n_));
    when ('0') do;
    miss = frequency;
    p_miss = percent;
  end;
  when ('1') do;
    ok = frequency;
    p_ok = percent;
  end;
  end;
end;
miss = coalesce(miss,0);
ok = coalesce(ok,0);
p_miss = coalesce(p_miss,0);
p_ok = coalesce(p_ok,0);
var = scan(table,-1);
keep var miss ok p_: ;
format miss ok comma7. p_: 5.1;
label
miss = 'N_MISSING'
ok = 'N_OK'
p_miss = '%_MISSING'
p_ok = '%_OK'
var = 'VARIABLE'
;
run;

 


report.png

View solution in original post


All Replies
Trusted Advisor
Posts: 1,610

Re: Report on missing values of each coloumn

For one dataset, you would use PROC MEANS with the NMISS option in the OUTPUT statement.

 

For multiple datasets, you could use a macro that loops through all of them, each using PROC MEANS.

Super User
Posts: 5,079

Re: Report on missing values of each coloumn

Here's an approach:

 

proc format;

value present low-high='Present' other='Missing';

value $present ' '='Missing' other='Present';

run;

 

proc freq data=mydata;

   tables _character_ _numeric_ / missing;

   format _character_ $present.;

   format _numeric_ present.;

run;

 

You will get an error message if you encounter a data set that contains all numeric variables (or all character variables).  A mix is required.

 

Good luck.

Solution
‎10-07-2015 03:37 PM
Valued Guide
Posts: 765

Re: Report on missing values of each coloumn

[ Edited ]

Hi.  Take a look at (fixed the indenting in the data so you can actually see what's going on) ...

 

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step

http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf

 

It explains why this SAS code works.  All you have to do for any data set is change the name of the data set used in PROC FREQ.  No other changes are required.  The code produced the attached report when I used the SASHELP.HEART.  All you have to do is cut/paste/change data set name/run.

 

proc format;
value nm . = '0' other = '1';
value $ch ' ' = '0' other = '1';
run;

 

ods listing close;
ods output onewayfreqs=tables;

proc freq data=sashelp.heart;
   tables _all_ / missing;
   format _numeric_ nm.
   _character_ $ch.;
run;

ods output close;
ods listing;

 

data report;
length var $32;
do until (last.table);
  set tables;
  by table notsorted;
  array names(*) f_: ;
  select (names(_n_));
    when ('0') do;
    miss = frequency;
    p_miss = percent;
  end;
  when ('1') do;
    ok = frequency;
    p_ok = percent;
  end;
  end;
end;
miss = coalesce(miss,0);
ok = coalesce(ok,0);
p_miss = coalesce(p_miss,0);
p_ok = coalesce(p_ok,0);
var = scan(table,-1);
keep var miss ok p_: ;
format miss ok comma7. p_: 5.1;
label
miss = 'N_MISSING'
ok = 'N_OK'
p_miss = '%_MISSING'
p_ok = '%_OK'
var = 'VARIABLE'
;
run;

 


report.png
SAS Employee
Posts: 24

Re: Report on missing values of each coloumn

Here is code to get you started. It will create a data set called missing_vars that has the percent and variable names of the data set test. You will need to macro-tize this so it will run through all the data sets you need to include in your report.

 

data test;
input x1 x2 x3 x4 y $;
cards;
1 . 1 1 a
13 2 . 5 b
11 . 1 . c
2 3 4 5 d
3 4 5 6 e
. 8 9 . f
5 6 7 8 g
8 9 8 9 h
3 5 8 . i
6 8 7 9 j
;
options mprint symbolgen mlogic;
data _null_;
if 0 then set test;
array num(*) _numeric_;
array chars(*) _character_;
call symput('nums',trim(left(put(dim(num),8.))));
call symput('chars',trim(left(put(dim(chars),8.))));
stop;
run;
data missing_vars;
set test end=end nobs=obsnum;
array num(*) _numeric_;
array chars(*) $ _character_;
array misscountn (&nums.) (&nums.*0);
array misscountc (&chars.) (&chars.*0);

do i=1 to dim(num);
if num(i) = . then misscountn(i) + 1;
end;
do i=1 to dim(chars);
if chars(i) = '' then misscountc(i) + 1;
end;

if end=1 then do;
do i= 1 to dim(num);
percentmiss= misscountn(i) / obsnum ;
var=(vname(num(i)));
output;
end;
do i= 1 to dim(chars);
percentmiss= misscountc(i) / obsnum ;
var=(vname(chars(i)));
output;
end;
end;
keep percentmiss var;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 259 views
  • 0 likes
  • 5 in conversation