I am trying to create a macro that calculates number of levels (distinct categories) for all the character variables in a dataset. Currently, i am using PROC SQL to calculate distinct categories. If i have 100 variables, i have to run PROC SQL 100 times in a loop. Is there a better way to do it? Is "stackods" kind of option available in PROC SQL so that i can reshape my dataset easily?
Sample Code -
proc sql;
create table abc as
select count(distinct(sex)) as sex_levels
from sashelp.class;
quit;
Sample Output -
Variable | N_Levels |
Sex | 2 |
Age | 6 |
It's just the sample code and output for 2 variables. I need to do it for all the character variables in a dataset.
Note : All my categorical variables are stored as character variables (string) in my dataset.
Thanks in anticipation!
PROC FREQ NLEVELS ....
I am with Tom: proc.freq.nlevels provides the answer, ... for the variables in the data-set.
I have written several papers on extrapolating from n-levels.
the information you are seeking is in contents output data set which has a variable type -- in (1:n, 2:c)
you have to sort both out-nlevels and out-contents by name in order to merge them.
check this SASware ballot item.
%let data = sashelp.class;
%let data = sashelp.heart;
proc freq data = &data
(keep = _character_)
nlevels;
ods exclude onewayfreqs;
ods output
nlevels = out_n_levels
(keep = tablevar nlevels
rename = (tablevar = name
nlevels = n_levels));
*to match out_contents;
proc print;
How would you tackle a data set with more than 70k levels for at least one variable? (proc freq always yields OoM error.)
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.