Hi all,
I have a data set more than 100 fields ,where I have to check the missing values of each field an the percentage of missing .
So i have to write a code where multiple data sets can be executed and should give the reports of each field of all data set in one final output .
your help is much appreciated
Thanks
Hi. Take a look at (fixed the indenting in the data so you can actually see what's going on) ...
An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step
http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf
It explains why this SAS code works. All you have to do for any data set is change the name of the data set used in PROC FREQ. No other changes are required. The code produced the attached report when I used the SASHELP.HEART. All you have to do is cut/paste/change data set name/run.
proc format;
value nm . = '0' other = '1';
value $ch ' ' = '0' other = '1';
run;
ods listing close;
ods output onewayfreqs=tables;
proc freq data=sashelp.heart;
tables _all_ / missing;
format _numeric_ nm.
_character_ $ch.;
run;
ods output close;
ods listing;
data report;
length var $32;
do until (last.table);
set tables;
by table notsorted;
array names(*) f_: ;
select (names(_n_));
when ('0') do;
miss = frequency;
p_miss = percent;
end;
when ('1') do;
ok = frequency;
p_ok = percent;
end;
end;
end;
miss = coalesce(miss,0);
ok = coalesce(ok,0);
p_miss = coalesce(p_miss,0);
p_ok = coalesce(p_ok,0);
var = scan(table,-1);
keep var miss ok p_: ;
format miss ok comma7. p_: 5.1;
label
miss = 'N_MISSING'
ok = 'N_OK'
p_miss = '%_MISSING'
p_ok = '%_OK'
var = 'VARIABLE'
;
run;
For one dataset, you would use PROC MEANS with the NMISS option in the OUTPUT statement.
For multiple datasets, you could use a macro that loops through all of them, each using PROC MEANS.
Here's an approach:
proc format;
value present low-high='Present' other='Missing';
value $present ' '='Missing' other='Present';
run;
proc freq data=mydata;
tables _character_ _numeric_ / missing;
format _character_ $present.;
format _numeric_ present.;
run;
You will get an error message if you encounter a data set that contains all numeric variables (or all character variables). A mix is required.
Good luck.
Hi. Take a look at (fixed the indenting in the data so you can actually see what's going on) ...
An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step
http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf
It explains why this SAS code works. All you have to do for any data set is change the name of the data set used in PROC FREQ. No other changes are required. The code produced the attached report when I used the SASHELP.HEART. All you have to do is cut/paste/change data set name/run.
proc format;
value nm . = '0' other = '1';
value $ch ' ' = '0' other = '1';
run;
ods listing close;
ods output onewayfreqs=tables;
proc freq data=sashelp.heart;
tables _all_ / missing;
format _numeric_ nm.
_character_ $ch.;
run;
ods output close;
ods listing;
data report;
length var $32;
do until (last.table);
set tables;
by table notsorted;
array names(*) f_: ;
select (names(_n_));
when ('0') do;
miss = frequency;
p_miss = percent;
end;
when ('1') do;
ok = frequency;
p_ok = percent;
end;
end;
end;
miss = coalesce(miss,0);
ok = coalesce(ok,0);
p_miss = coalesce(p_miss,0);
p_ok = coalesce(p_ok,0);
var = scan(table,-1);
keep var miss ok p_: ;
format miss ok comma7. p_: 5.1;
label
miss = 'N_MISSING'
ok = 'N_OK'
p_miss = '%_MISSING'
p_ok = '%_OK'
var = 'VARIABLE'
;
run;
Here is code to get you started. It will create a data set called missing_vars that has the percent and variable names of the data set test. You will need to macro-tize this so it will run through all the data sets you need to include in your report.
data test;
input x1 x2 x3 x4 y $;
cards;
1 . 1 1 a
13 2 . 5 b
11 . 1 . c
2 3 4 5 d
3 4 5 6 e
. 8 9 . f
5 6 7 8 g
8 9 8 9 h
3 5 8 . i
6 8 7 9 j
;
options mprint symbolgen mlogic;
data _null_;
if 0 then set test;
array num(*) _numeric_;
array chars(*) _character_;
call symput('nums',trim(left(put(dim(num),8.))));
call symput('chars',trim(left(put(dim(chars),8.))));
stop;
run;
data missing_vars;
set test end=end nobs=obsnum;
array num(*) _numeric_;
array chars(*) $ _character_;
array misscountn (&nums.) (&nums.*0);
array misscountc (&chars.) (&chars.*0);
do i=1 to dim(num);
if num(i) = . then misscountn(i) + 1;
end;
do i=1 to dim(chars);
if chars(i) = '' then misscountc(i) + 1;
end;
if end=1 then do;
do i= 1 to dim(num);
percentmiss= misscountn(i) / obsnum ;
var=(vname(num(i)));
output;
end;
do i= 1 to dim(chars);
percentmiss= misscountc(i) / obsnum ;
var=(vname(chars(i)));
output;
end;
end;
keep percentmiss var;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.