I am trying to create a macro that calculates number of levels (distinct categories) for all the character variables in a dataset. Currently, i am using PROC SQL to calculate distinct categories. If i have 100 variables, i have to run PROC SQL 100 times in a loop. Is there a better way to do it? Is "stackods" kind of option available in PROC SQL so that i can reshape my dataset easily?
Sample Code -
proc sql;
create table abc as
select count(distinct(sex)) as sex_levels
from sashelp.class;
quit;
Sample Output -
Variable | N_Levels |
Sex | 2 |
Age | 6 |
It's just the sample code and output for 2 variables. I need to do it for all the character variables in a dataset.
Note : All my categorical variables are stored as character variables (string) in my dataset.
Thanks in anticipation!
PROC FREQ NLEVELS ....
I am with Tom: proc.freq.nlevels provides the answer, ... for the variables in the data-set.
I have written several papers on extrapolating from n-levels.
the information you are seeking is in contents output data set which has a variable type -- in (1:n, 2:c)
you have to sort both out-nlevels and out-contents by name in order to merge them.
check this SASware ballot item.
%let data = sashelp.class;
%let data = sashelp.heart;
proc freq data = &data
(keep = _character_)
nlevels;
ods exclude onewayfreqs;
ods output
nlevels = out_n_levels
(keep = tablevar nlevels
rename = (tablevar = name
nlevels = n_levels));
*to match out_contents;
proc print;
How would you tackle a data set with more than 70k levels for at least one variable? (proc freq always yields OoM error.)
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.