I am trying to create a macro that calculates number of levels (distinct categories) for all the character variables in a dataset. Currently, i am using PROC SQL to calculate distinct categories. If i have 100 variables, i have to run PROC SQL 100 times in a loop. Is there a better way to do it? Is "stackods" kind of option available in PROC SQL so that i can reshape my dataset easily?
Sample Code -
proc sql;
create table abc as
select count(distinct(sex)) as sex_levels
from sashelp.class;
quit;
Sample Output -
Variable | N_Levels |
Sex | 2 |
Age | 6 |
It's just the sample code and output for 2 variables. I need to do it for all the character variables in a dataset.
Note : All my categorical variables are stored as character variables (string) in my dataset.
Thanks in anticipation!
PROC FREQ NLEVELS ....
I am with Tom: proc.freq.nlevels provides the answer, ... for the variables in the data-set.
I have written several papers on extrapolating from n-levels.
the information you are seeking is in contents output data set which has a variable type -- in (1:n, 2:c)
you have to sort both out-nlevels and out-contents by name in order to merge them.
check this SASware ballot item.
%let data = sashelp.class;
%let data = sashelp.heart;
proc freq data = &data
(keep = _character_)
nlevels;
ods exclude onewayfreqs;
ods output
nlevels = out_n_levels
(keep = tablevar nlevels
rename = (tablevar = name
nlevels = n_levels));
*to match out_contents;
proc print;
How would you tackle a data set with more than 70k levels for at least one variable? (proc freq always yields OoM error.)
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.