HI! I have a table with 100+ variables and I would like to know the easiest/ fastest way to check for missing/nonmissing values without having to run frequency count for each varaible in separate tables. Preferably, I would like all variables in on column with missing and nonmissing values in columns. For example:
Variable Missing Non_Missing
RAWREG 2 8
PVPRISM 0 10
DAYS 1 9
OVRIDE3 10 0
Again, I have 100+ variables, so can I use some kind of Select (*) function w/o having to type out all variables?
Here is a sample data set:
Obs | RAWREG | PVPRISM | OVRIDE3 | DAYS |
---|---|---|---|---|
1 | 01 | 0939 | 408 | |
2 | 0943 | 379 | ||
3 | 09 | 0029 | 170 | |
4 | 11 | 0948 | ||
5 | 0948 | 59 | ||
6 | 04 | 6108 | 57 | |
7 | 06 | 0993 | 59 | |
8 | 06 | 6111 | 59 | |
9 | 06 | 0993 | 220 | |
10 | 04 | 0901 | 59 |
The NLEVELS of Proc FREQ does not exactly do what you need. Had another go using the Hash object in the DATA Step.
Have a look
data have;
infile cards dlm="," dsd;
input
RAWREG
PVPRISM $
OVRIDE3 $
DAYS
;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
,,3,
;
data _null_;
set have end=last;
array _xchar{*} _character_;
array _xnum{*} _numeric_;
if _n_ = 1 then do;
length
_colName $ 32
_nMiss 8
_n 8
;
declare hash hmiss( ordered: "Y");
hmiss.defineKey("_colName");
hmiss.defineData("_colName", "_n", "_nMiss");
hmiss.defineDone();
end;
do i = 1 to dim(_xchar);
_nMiss=0;
_n=0;
_colName = vname( _xchar{i} );
_rc = hmiss.find();
_nMiss + (missing(_xchar{i}) = 1);
_n + (missing(_xchar{i}) = 0);
if _rc = 0 then do;
hmiss.replace();
end;
else do;
hmiss.add();
end;
end;
do i = 1 to dim(_xnum);
_nMiss=0;
_n=0;
_colName = vname( _xnum{i} );
_rc = hmiss.find();
_nMiss + (missing(_xnum{i}) = 1);
_n + (missing(_xnum{i}) = 0);
if _rc = 0 then do;
hmiss.replace();
end;
else do;
hmiss.add();
end;
end;
if last = 1 then do;
hmiss.output(dataset: "hashmiss");
end;
run;
proc print data=hashmiss;
run;
proc print data=have;
run;
Bruno
UNTESTED CODE
proc summary data=have;
var rawreg--days;
output out=stats n= nmiss=/autoname;
run;
It seems I can't say this often enough: don't have tables with hundreds of variables. It just causes headache.
Change your data model (e.g. by transposing your data), then it's much easier to analyze it.
This means that you created your data set with all character variables, no numeric variables. You have two basic choices at this point. Either change your data, so that you actually have numeric variables. Or, write a more complex program that will count missings and nonmissings for character variables. As you make your choice, you might want to consider how you plan to use the data in the future ... will you actually need numeric variables in your data set.
At this point, first you make your choice. Then the programming can begin.
It means that your variables are character strings, not numbers as they appear. You should fix that first.
Wouldn't this be sufficient to get a quick overview?
proc format;
value miss
._-.z = 'Missing'
other = 'Non_Missing';
value $miss
' ' = 'Missing'
other = 'Non_Missing';
run;
proc freq data=have;
format _character_ $miss.
_numeric_ miss.;
tables _all_ / missing;
run;
See the ideas, examples, and links in the article "Count the number of missing values in for each variable"
Proc FREQ has the NLEVELS option that is helpfull for this, see code sample below.
data have;
infile cards dlm="," dsd;
input
RAWREG $
PVPRISM $
OVRIDE3 $
DAYS $
;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
;
proc freq data=have nlevels;
table _all_ / noprint;
run;
It will create this output:
Number of Variable Levels | |||
---|---|---|---|
Variable | Levels | Missing Levels | Nonmissing Levels |
RAWREG | 6 | 1 | 5 |
PVPRISM | 8 | 0 | 8 |
OVRIDE3 | 1 | 1 | 0 |
DAYS | 7 | 1 | 6 |
Please clarify: Are you interested in only NUMERIC variables, or do you want the tabulation for numeric and character variables?
Preferably bother character and numeric. But I know that code can get a little tricky. I can do it separate, counting character using:
proc format;
value $missfmt ' '= 'Missing' other='Not Missing';
run;
proc freq data= have;
format _CHAR_ $missfmt.;
tables _char_ / missing missprint nocum nopercent;
run;
***But is there a way to for the character sas variables to be alphabetical in the results viewer?
Here is a solution, the _all_ adds some noise but it'll work.:
data have;
infile cards dsd;
input Obs RAWREG$ PVPRISM$ OVRIDE3 DAYS;
cards;
1,01,0939,,408
2,,0943,,379
3,09,0029,,170
4,11,0948,,
5,,0948,,59
6,04,6108,,57
7,06,0993,,59
8,06,6111,,59
9,06,0993,,220
10,04,0901,,59
;
proc sql;
create table get_max as
select *,max(obs) as max_obs
from have;
proc transpose data=get_max out=tran;by max_obs;var _all_;
data want (keep=Variable missing non_missing);
set tran(rename=(_name_ = Variable));
Missing=nmiss(of col: );
Non_Missing=max_obs - missing;
if variable in ('Obs','max_obs') then delete;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.