I have a dataset with 500 variables. I want to run PROC FREQ on the character variables and PROC MEANS on the numeric variables. SAS handles the _character_ and _numeric_ variable lists well; however, the character variables in my dataset are stored as numeric, because each one has only a few levels that are saved as numbers (e.g., 1, 2, 3, 4), and I attach a format to show the text description in the output. Usually, I separate these variables manually. Can you suggest code that would help me distinguish the character variables from the numeric variables? Thanks
You could just define those variables as character when you create them. So instead of defining those variables with values of 1 and 2 as numeric, define them as character with values of '1' and '2', and create character formats instead of numeric formats.
What form does your original data come in? If it is a text file (like a CSV file), then just adjust the data step that reads it so that it defines them as character.
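As a minimal sketch of that idea, assuming a comma-delimited file with a header row: the file name survey.csv, the variables id, smoker and age, and the format $yesno are all made up for illustration and are not from your data.

```sas
/* Hypothetical example: survey.csv, id, smoker, age and $yesno are illustrative names */
proc format;
  /* Character format: the coded values are quoted strings */
  value $yesno '1' = 'Yes'
               '2' = 'No';
run;

data have;
  infile 'survey.csv' dsd firstobs=2 truncover;
  /* Declaring smoker with a $ length makes INPUT read it as character */
  length id 8 smoker $1 age 8;
  input id smoker age;
  format smoker $yesno.;
run;
```

With the variables defined this way, the _character_ and _numeric_ variable lists should split the dataset the way you want without any manual sorting.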
Sounds like you want to distinguish the categorical variables from the continuous (or at least multi-level) variables, whether or not the variables are defined as character. The NLEVELS option of PROC FREQ is very useful for this: it reports the number of distinct values of each variable.
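/* Save the "Number of Variable Levels" table as dataset NLEVELS */
ods output nlevels=nlevels;
proc freq nlevels data=have ;
  /* NOPRINT suppresses the individual frequency tables */
  tables _all_ / noprint;
run;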
You could then set an upper limit on the number of levels, below which you consider a variable to be categorical (fewer than 5 levels in this example).
/* Collect the names of the low-cardinality variables into macro variable VARLIST */
proc sql noprint;
  select nliteral(TableVar) into :varlist separated by ' '
    from nlevels
    where nlevels < 5
  ;
quit;
You can then use that list of variables in your other code. For example, you might use it to choose which variables to generate PROC FREQ tables for, or to exclude those variables from a PROC MEANS analysis of the numeric variables.
proc freq data=have ;
  tables &varlist;
run;
proc means data=have(drop=&varlist);
run;
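Since the question says the coded variables have formats attached, another possibility is to pick out the numeric variables that carry a format. This is only a sketch under assumptions: it assumes the dataset is WORK.HAVE, and it would also pick up numeric variables with date or other display formats, so the resulting list may need review. The macro variable name fmtvars is made up for illustration.

```sas
/* Assumption: the dataset is WORK.HAVE; FMTVARS is an illustrative macro variable name */
proc sql noprint;
  /* LIBNAME and MEMNAME are stored in uppercase in DICTIONARY.COLUMNS */
  select nliteral(name) into :fmtvars separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'HAVE'
      and type = 'num'
      and format ne ' '
  ;
quit;

/* Show the list so it can be checked before use */
%put &=fmtvars;
```

The list in &fmtvars could then be used in the same way as &varlist in the PROC FREQ and PROC MEANS steps.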