Hello everyone,
I’m trying to create the following desired output. My input data set has categorical variables, I need to get “0” values count and their percentage based on each variable. To create my desired output, I wrote the following code but as you can see for the Categorical3 variable, if the variable doesn’t include “0” values so it doesn’t bring any value or percentage in PROC FREQ statement.
Another problem is in my output that Frequency and Percent variable does not match with each other and double Categorical1 and Categorical2 rows.
My desired output and my code is as below, is anybody can help to reach my desired output, please?
My Code;
DATA Have;
Length Categorical1 $ 20 Categorical2 $ 20 Categorical3 $ 20;
Infile Datalines Missover ;
Input Categorical1 Categorical2 Categorical3;
Datalines;
0 0 1
1 2 3
2 3 1
4 3 2
3 4 1
0 0 4
4 3 3
2 0 4
;
Run;
Ods Trace On;
PROC FREQ Data=Have;
Tables Categorical1 Categorical2 Categorical3 / NoCum Missing;
Table Categorical1 Categorical2 Categorical3;
Ods Output OneWayFreqs=Want;
Run;
Ods Trace Off;
Proc Sql;
Create Table Want As
Select Substr(Table,7) As Variable,
LEFT(Trim(F_Categorical1)||""||F_Categorical2||""||F_Categorical3) As Level,
Frequency, Percent
From Want
Where Calculated Level="0";;
Quit;
My Code Output;
Desired Output;
Thank you,
There are two ways to accomplish this, one is to create the list of values ahead of time via a format and use PRELOADFMT in proc tabulate since proc freq does not support PRELOADFMT.
Here's a paper on how to accomplish this, see example #3. If you search PRELOADFMT on here, you'll find many examples as well.
http://www.lexjansen.com/nesug/nesug11/cc/cc29.pdf
Another is to use the SPARSE option in proc freq after you've transposed your data.
DATA Have;
Length Categorical1 $ 20 Categorical2 $ 20 Categorical3 $ 20;
Infile Datalines Missover ;
Input Categorical1 Categorical2 Categorical3;
Datalines;
0 0 1
1 2 3
2 3 1
4 3 2
3 4 1
0 0 4
4 3 3
2 0 4
;
Run;
data flipped;
set have;
array vars(3) $ Categorical1 Categorical2 Categorical3;
do i=1 to dim(vars);
Variable = vname(vars(i));
Level = vars(i);
Output;
end;
keep variable level;
run;
proc freq data=flipped noprint;
table variable*level/sparse out=want(where=(level='0') drop=percent pct_row rename=(pct_col=percent)) outpct;
run;
There are two ways to accomplish this, one is to create the list of values ahead of time via a format and use PRELOADFMT in proc tabulate since proc freq does not support PRELOADFMT.
Here's a paper on how to accomplish this, see example #3. If you search PRELOADFMT on here, you'll find many examples as well.
http://www.lexjansen.com/nesug/nesug11/cc/cc29.pdf
Another is to use the SPARSE option in proc freq after you've transposed your data.
DATA Have;
Length Categorical1 $ 20 Categorical2 $ 20 Categorical3 $ 20;
Infile Datalines Missover ;
Input Categorical1 Categorical2 Categorical3;
Datalines;
0 0 1
1 2 3
2 3 1
4 3 2
3 4 1
0 0 4
4 3 3
2 0 4
;
Run;
data flipped;
set have;
array vars(3) $ Categorical1 Categorical2 Categorical3;
do i=1 to dim(vars);
Variable = vname(vars(i));
Level = vars(i);
Output;
end;
keep variable level;
run;
proc freq data=flipped noprint;
table variable*level/sparse out=want(where=(level='0') drop=percent pct_row rename=(pct_col=percent)) outpct;
run;
Hi:
In addition to Reeza's comment, can you explain what you mean, exactly, by this statement "Another problem is in my output that Frequency and Percent variable does not match with each other and double Categorical1 and Categorical2 rows." What do you mean by "double" the rows?
Also, have you checked the output that you're getting from PROC FREQ? It doesn't matter, but you have 2 table statements and the ODS OUTPUT statement. I don't think you are getting what you want from PROC FREQ.
And, finally, I don't think you would need your SQL if you just used TABULATE and PRELOADFMT. The use of TABULATE and PRELOADFMT is also covered in our Report Writing class in one of the Challenge Exercises in Chapter 3. Here's an example.
Report #4 is a variation that uses Reeza's FLIPPED data. However, there is no need for an output dataset from FREQ or for a PROC SQL.
Also, as a best practice recommenation and comment on your code -- generally, I discourage my students from doing things like this:
data want;
set want;
... more code ...;
run;
OR
proc sql;
create table want as
..... more code .....
from want
.... more code...;
quit;
I do not consider it good practice to use the exact same name for both the INPUT and OUTPUT datasets in a step.
proc format;
value $ctgfmt '0'='0'
'1'='1'
'2'='2'
'3'='3'
'4'='4';
run;
proc tabulate data=have missing;
title '1: using PRELOADFMT and PRINTMISS and PCTN';
class categorical1 categorical2 categorical3 / preloadfmt ;
table categorical1 all,n pctn / printmiss;
table categorical2 all,n pctn / printmiss;
table categorical3 all,n pctn / printmiss;
format categorical1 categorical2 categorical3 $ctgfmt.;
run;
proc tabulate data=have missing;
title '2: using PRELOADFMT and PRINTMISS and COLPCTN';
class categorical1 categorical2 categorical3 / preloadfmt ;
table categorical1 all,n colpctn / printmiss;
table categorical2 all,n colpctn / printmiss;
table categorical3 all,n colpctn / printmiss;
format categorical1 categorical2 categorical3 $ctgfmt.;
run;
** for this table PCTN #1 and COLPCTN #2 generate the same values;
proc tabulate data=have missing;
title '3: all 3 categorical variables in one table in ROW dimension';
class categorical1 categorical2 categorical3 / preloadfmt ;
table (categorical1 all) (categorical2 all) (categorical3 all),n colpctn / printmiss;
format categorical1 categorical2 categorical3 $ctgfmt.;
run;
** 4 variation on suggestion by Reeza;
data flipped;
set have;
array vars(3) $ Categorical1 Categorical2 Categorical3;
do i=1 to dim(vars);
Variable = vname(vars(i));
Level = vars(i);
Output;
end;
keep variable level;
run;
proc tabulate data=flipped missing;
title '4: all 3 categorical variables in one table do not need PRELOADFMT';
class variable level / ;
table variable*level,
n pctn/ printmiss;
run;
title;
cynthia
Hi,
Reeza,
Your code gave me my desired output, thank you 🙂
Can I get brief information about SPARSE and OUTPCT options to understand the logic better?
Cynthia,
I tried to say that I have two same rows instead of one row then I realized that I put additional "table" statement into my PROC FREQ statement. You've also already mentioned about this situation in your response, thank you for your attention.
Thank you for the PRELOADFMT. examples, I understood better, thank you 🙂
I wiil also consider to not use the exact same name for both the INPUT and OUTPUT datasets in a step.
Thank you,
Reeza,
I forgot the tell, variable number can change into Have data set, is it possible to provide dynamic structure for your Flipped step. If column count is more than three, how can change it automatically?
Thank you
Change your array definition, the remaining code will stay the same.
Thank you Reeza,
Following code seems dynamic
%Let Variable = Categorical1 Categorical2 Categorical3;
%Let word_var=%sysfunc(countw(&Variable));
Data Flipped;
Set Have;
Array Vars(&word_var) $ &Variable;
Do i=1 To Dim(Vars);
/*VName -> Assigns a variable name as the value of a specified variable.*/
Variable = VName(Vars(i));
Level = Vars(i);
Output;
End;
Keep Variable Level;
Run;
/*I'm adding one more column here*/
Data Flipped2;
Set Flipped;
Length Dataset $ 32;
Dataset="Development";
Run;
Lasty, is it possible to add one more variable in the output of PROC FREQ statement.
I tried KEEP option instead of DROP option but it doesn't work
Proc Freq Data=Flipped2 Noprint;
Table Variable*Level/Sparse Out=Want(Where=(Level='0')
Keep=Dataset Variable Level Count Pct_col Rename=(Pct_col=Percent)) Outpct;
Run;
Or I' need to add one more Data Step?
Thank you,
Variable dataset will not be in the output dataset with that code so a keep won't help. If you use Keep you have to list all variables you want to keep. Typically only Keep or drop are used, not both.
You need to change the proc freq to include dataset variable or add it in afterwards. You can also create the variable in the first data step you don't need a second one - it will slow down your processing and is inefficient.
For the proc freq method try adding it as a BY variable or in your table statement.
Yes, this is what I want, Thank you very much 🙂
Proc Freq Data=Flipped2 Noprint;
Table Variable*Level/Sparse Out=Want(Where=(Level='0')
Keep=Dataset Variable Level Count Pct_col Rename=(Pct_col=Percent)) Outpct;
BY Dataset;
Run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.