Is there a simple way to create a list of variables that contain nulls? I found a way to do it if I manually refer to the column by name.
E.g.
select count(*)
from my table
where employee_category is null
group by employee_category
having count(*) > 0) as employee_category_nulls
Returns number of nulls in column.
Even better is to only return a list of columns that contains nulls and the number of nulls.
ods select none;
ods output nlevels=want;
proc freq data=sashelp.heart nlevels;
table _all_;
run;
ods select all;
proc print data=want(where=(NMissLevels ne 0));run;
I believe you have to check each variable on all observations.
That can be done in one step using two arrays, as in next code (not tested):
data _NULL_;
set have end=eof;
array vx {*} _numeric_; /* if need create separate array for _charcter_ */
array nx $ n1-n100; /* up to 100 variables. adapt to need */
length list $1000; /* adapt length to contain full list of var names */
/* fulfill variable names */
do i=1 to dim(vx);
nx(i) = vname(vx(i));
end;
/* check each numeric var in an observation */
do i=1 to dim(vx);
if vx(i) = . and findw(list,nx(i)) = 0
then catx(' ',trim(list), nx(i));
end;
if eof then put list=;
run;
Currently debugging this to see if it works.
If there was a way to do the below and only include instances where the column is character and has missing:
PROC FORMAT;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
RUN;
PROC FREQ DATA=CCW.INTGRD_INPA;
FORMAT _CHAR_ $missfmt.;
TABLES _CHAR_ / missing missprint nocum nopercent;
FORMAT _NUMERIC_ missfmt.;
TABLES _NUMERIC_ / missing missprint nocum nopercent;
;
RUN;
Here is the code adapted to check CHAR variables: (the formats not needed)
data _NULL_;
set have end=eof;
array vx $ _charcter_;
array nx $ n1-n100; /* up to 100 variables. adapt to need */
length list $1000; /* adapt length to contain full list of var names */
/* fulfill variable names */
do i=1 to dim(vx);
nx(i) = vname(vx(i));
end;
/* check each numeric var in an observation */
do i=1 to dim(vx);
if missing(vx(i)) and findw(list,nx(i)) = 0
then catx(' ',trim(list), nx(i));
end;
if eof then put list=;
run;
When I run this:
then catx(' ',trim(list), nx(i));
_
22
76
ERROR: Undeclared array referenced: catx.
Sorry, line should be:
list = catx(' ',trim(list), nx(i));
The output is list=_charcte
Change array definitions (tested):
array vx {*} $ _character_;
array nx {*} $ n1-n100;
ods select none;
ods output nlevels=want;
proc freq data=sashelp.heart nlevels;
table _all_;
run;
ods select all;
proc print data=want(where=(NMissLevels ne 0));run;
Ksharp, thank you, your code is extremely helpful for profiling. I found that adding table _CHARACTER_ is useful as there is not as much purpose to mass checking for null numbers.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.