Hi All,
The program:
proc tabulate data=sashelp.class f=comma6.0;
title ;
var _numeric_;
table _numeric_,
nmiss n/ box='Variable';
run;
can be used to count non-missing values for numerical variable, I also want count for character variable, could you help me modify this code to achive my goal?
Thanks a lot!
Best wishes,
Do you want to take another way to get what you want? This way may help you to get more information of the observations.
Be brave to do some programming with arrays.
data want;
set sashelp.class end = eof;
N + 1;
array num[*] _numeric_;
array char[*] _character_;
array n_count[100] _temporary_ (100*0); * Assuming data set has no more than 100 OBS;
array c_count[100] _temporary_ (100*0);
do i = 1 to dim(num);
if missing(num[i]) then n_count[i] + 1;
end;
do i = 1 to dim(char);
if missing(char[i]) then c_count[i] + 1;
end;
if eof then do;
do i = 1 to dim(num);
Variable = vname(num[i]);
NMiss = n_count[i];
output;
end;
do i = 1 to dim(char);
Variable = vname(char[i]);
NMiss = c_count[i];
output;
end;
end;
keep Variable N NMiss;
run;
Character variables in Proc Tabulate may only be CLASS variables. Because of the default that any record with a missing value for a class variable is excluded you would need to add the missing option. The only statistics you may request for character variables in Proc Tabulate are N and the various N-related such a Pctn, ColPctN, RowPctn, PagePctn and RepPctn.
The following code adds a separate table to your output that has each value of the character var and its count.
proc tabulate data=sashelp.class f=comma6.0;
title ;
var _numeric_;
table _numeric_,
nmiss n/ box='Variable';
class _character_ / missing;
table _character_,n;
run;
If you want a simplier count you might want to go through your data and add a numeric flag for when the character variable isn't missing and count (or sum) that. Though there isn't going be a short easy way to reference the character variable counted.
Do you want to take another way to get what you want? This way may help you to get more information of the observations.
Be brave to do some programming with arrays.
data want;
set sashelp.class end = eof;
N + 1;
array num[*] _numeric_;
array char[*] _character_;
array n_count[100] _temporary_ (100*0); * Assuming data set has no more than 100 OBS;
array c_count[100] _temporary_ (100*0);
do i = 1 to dim(num);
if missing(num[i]) then n_count[i] + 1;
end;
do i = 1 to dim(char);
if missing(char[i]) then c_count[i] + 1;
end;
if eof then do;
do i = 1 to dim(num);
Variable = vname(num[i]);
NMiss = n_count[i];
output;
end;
do i = 1 to dim(char);
Variable = vname(char[i]);
NMiss = c_count[i];
output;
end;
end;
keep Variable N NMiss;
run;
You can use PROC SQL to get the count of missing values for character or numeric variables. A coworker asked how to find all the variables in a data set that are missing for all observations. The code example below can answer that question.
options mprint;
/* Using SASHELP data sets while testing macro.
Macro should list variables that are
completely missing in data set, so need to
create some data sets with such variables. */
data baseball;
set sashelp.baseball;
suzanne_favorite_team='';
cleveland_indians_rank=.;
run;
data class;
set sashelp.class;
teacher='';
grade_level=.;
run;
%macro missing_summary(lib=, dsn=);
/* use dummy macro so that syntax highlighting shows up within missing macro */
%macro dummy();
%mend dummy;
/* get data set variable names */
proc sql;
create table vars_to_summarize as
select name,
label
from sashelp.vcolumn
where libname="%upcase(&lib)" and
memname="%upcase(&dsn)";
quit;
/* count number of variables to process */
proc sql noprint;
select count(*) into
:var_counter
from vars_to_summarize;
quit;
/* do PROC SQL one variable at a time,
append details for variable to summary
report, then delete the variable's data
set so that work library doesn't get
cluttered. */
%do i=1 %to &var_counter;
data _null_;
obsnum=&i;
set vars_to_summarize point=obsnum;
if _error_ then abort;
call symputx('field',name);
call symputx('description',label);
stop;
run;
proc sql;
create table &field._report as
select count(*) as n,
count(&field) as not_missing,
nmiss(&field) as missing
from &lib..&dsn;
quit;
data &field._report;
length variable $ 32 dataset $ 41 label $ 256;
set &field._report;
variable="&field";
dataset="&lib..&dsn";
label="&description";
run;
proc append base=summary data=&field._report;
run;
proc datasets nolist;
delete &field._report;
quit;
%end;
proc datasets nolist;
delete vars_to_summarize;
quit;
%mend missing_summary;
%missing_summary(lib=work, dsn=class);
%missing_summary(lib=work, dsn=baseball);
proc print data=summary noobs;
by dataset notsorted;
* uncomment statement below if you only want to see the variables missing for ALL obs ;
* where n=missing;
var dataset variable label n not_missing missing;
title 'Missing and not-missing counts by variable';
run;
* clean up;
proc datasets nolist;
delete summary;
quit;
References:
Top 10 Most Powerful Functions for PROC SQL
Dictionary Tables and Views: Essential Tools for Serious Applications
@Xiaoningdemao wrote:
This is perfect! I got exactly what I want!! Just to double check with you, I just need to change all the "100" in this code to the number of obs in my data set, right? @KachiM
Ideally the 100 would be replaced with the number of numeric variables for the numeric array and the number of character variables for the character array.
Here is an example code snippet to get the counts of char and num variables.:
proc sql;
select type, count(*) as n
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
group by type;
quit;
For character variables add a variable to count or sum depending on how you are summarizing.
data want;
set have;
GenderNotMiss= (not missing(gender)); /* creates a 1 for not missing, 0 for missing values of gender*/
run;
Then in proc tabulate request SUM of GenderNotMiss;
or: if not missing(gender0 then GenderNotMiss=1;
and you could use the NMISS in tabulate along with the other numerics.
I would suggest using a good label with these indicator variables.
'100' in my code represents total number of VARIABLES in the data set and NOT the number of observations as I wrote on the comments. The array will work even with 10 Million Variables. It just provides memory space to hold that many variables but in reality it will much less and the counting is done for as many NUMERIC and CHARACTER Variables.
If you know the numbers of numeric variables (say 15) and character variables (say 20) before running the data step, you can size the arrays as:
array n_count[15] _temporary_ (15*0); * Assuming data set has no more than 100 OBS;
array c_count[20] _temporary_ (20*0);
Hope this helps you.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.