DATA Step, Macro, Functions and more

Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Accepted Solution Solved
Reply
Super Contributor
Posts: 381
Accepted Solution

Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Hello everyone,

 

I’m trying to create the following desired output. My input data set has categorical variables, I need to get “0” values count and their percentage based on each variable. To create my desired output, I wrote the following code but as you can see for the Categorical3 variable, if the variable doesn’t include “0” values so it doesn’t bring any value or percentage in PROC FREQ statement.

 

Another problem is in my output that Frequency and Percent variable does not match with each other and double Categorical1 and Categorical2 rows.

 

My desired output and my code is  as below, is anybody can help to reach my desired output, please?

 

My Code;

DATA Have;
Length Categorical1 $ 20 Categorical2 $ 20 Categorical3 $ 20;
Infile Datalines Missover ;
Input Categorical1 Categorical2 Categorical3;
Datalines;
0 0 1
1 2 3
2 3 1
4 3 2
3 4 1
0 0 4
4 3 3
2 0 4
;
Run;
Ods Trace On;
PROC FREQ Data=Have;
Tables Categorical1 Categorical2 Categorical3 / NoCum Missing;
Table Categorical1 Categorical2 Categorical3;
Ods Output OneWayFreqs=Want;
Run;
Ods Trace Off;
Proc Sql;
Create Table Want As
Select Substr(Table,7) As Variable,
LEFT(Trim(F_Categorical1)||""||F_Categorical2||""||F_Categorical3) As Level,
Frequency, Percent
From Want 
Where Calculated Level="0";;
Quit;

 

My Code Output;

 

CurrentOutput.png

 

Desired Output;

 

DesiredOutput.png

 

Thank you,

 


Accepted Solutions
Solution
‎05-30-2016 06:26 PM
Super User
Posts: 17,826

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

There are two ways to accomplish this, one is to create the list of values ahead of time via a format and use PRELOADFMT in proc tabulate since proc freq does not support PRELOADFMT.

Here's a paper on how to accomplish this, see example #3. If you search PRELOADFMT on here, you'll find many examples as well.

http://www.lexjansen.com/nesug/nesug11/cc/cc29.pdf

 

Another is to use the SPARSE option in proc freq after you've transposed your data.

 

DATA Have;
Length Categorical1 $ 20 Categorical2 $ 20 Categorical3 $ 20;
Infile Datalines Missover ;
Input Categorical1 Categorical2 Categorical3;
Datalines;
0 0 1
1 2 3
2 3 1
4 3 2
3 4 1
0 0 4
4 3 3
2 0 4
;
Run;

data flipped;
set have;

array vars(3) $ Categorical1 Categorical2 Categorical3;

do i=1 to dim(vars);

Variable = vname(vars(i));
Level = vars(i);

Output;
end;

keep variable level;
run;

proc freq data=flipped noprint;
table variable*level/sparse out=want(where=(level='0') drop=percent pct_row rename=(pct_col=percent)) outpct;
run;

 

 

View solution in original post


All Replies
Solution
‎05-30-2016 06:26 PM
Super User
Posts: 17,826

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

There are two ways to accomplish this, one is to create the list of values ahead of time via a format and use PRELOADFMT in proc tabulate since proc freq does not support PRELOADFMT.

Here's a paper on how to accomplish this, see example #3. If you search PRELOADFMT on here, you'll find many examples as well.

http://www.lexjansen.com/nesug/nesug11/cc/cc29.pdf

 

Another is to use the SPARSE option in proc freq after you've transposed your data.

 

DATA Have;
Length Categorical1 $ 20 Categorical2 $ 20 Categorical3 $ 20;
Infile Datalines Missover ;
Input Categorical1 Categorical2 Categorical3;
Datalines;
0 0 1
1 2 3
2 3 1
4 3 2
3 4 1
0 0 4
4 3 3
2 0 4
;
Run;

data flipped;
set have;

array vars(3) $ Categorical1 Categorical2 Categorical3;

do i=1 to dim(vars);

Variable = vname(vars(i));
Level = vars(i);

Output;
end;

keep variable level;
run;

proc freq data=flipped noprint;
table variable*level/sparse out=want(where=(level='0') drop=percent pct_row rename=(pct_col=percent)) outpct;
run;

 

 

SAS Super FREQ
Posts: 8,743

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Hi:

  In addition to Reeza's comment, can you explain what you mean, exactly, by this statement "Another problem is in my output that Frequency and Percent variable does not match with each other and double Categorical1 and Categorical2 rows." What do you mean by "double" the rows?

 

  Also, have you checked the output that you're getting from PROC FREQ? It doesn't matter, but you have 2 table statements and the ODS OUTPUT statement. I don't think you are getting what you want from PROC FREQ.

 

  And, finally, I don't think you would need your SQL if you just used TABULATE and PRELOADFMT. The use of TABULATE and PRELOADFMT is also covered in our Report Writing class in one of the Challenge Exercises in Chapter 3. Here's an example.

Report #4 is a variation that uses Reeza's FLIPPED data. However, there is no need for an output dataset from FREQ or for a PROC SQL. 

 

  Also, as a best practice recommenation and comment on your code -- generally, I discourage my students from doing things like this:

data want;

    set want;

... more code ...;

run;

 

OR

proc sql;

  create table want as

    ..... more code .....

  from want 

 .... more code...;

quit;

 

  I do not consider it good practice to use the exact same name for both the INPUT and OUTPUT datasets in a step.

 


proc format;
  value $ctgfmt '0'='0'
                '1'='1'
	        '2'='2'
	        '3'='3'
	        '4'='4';
run;
  
proc tabulate data=have missing;
title '1: using PRELOADFMT and PRINTMISS and PCTN';
  class categorical1 categorical2 categorical3 / preloadfmt ;
  table categorical1 all,n pctn / printmiss;
  table categorical2 all,n pctn / printmiss;
  table categorical3 all,n pctn / printmiss;
  format categorical1 categorical2 categorical3 $ctgfmt.;
run;

proc tabulate data=have missing;
  title '2: using PRELOADFMT and PRINTMISS and COLPCTN';
  class categorical1 categorical2 categorical3 / preloadfmt ;
  table categorical1 all,n colpctn / printmiss;
  table categorical2 all,n colpctn / printmiss;
  table categorical3 all,n colpctn / printmiss;
  format categorical1 categorical2 categorical3 $ctgfmt.;
run;

** for this table PCTN #1 and COLPCTN #2 generate the same values;

proc tabulate data=have missing;
  title '3: all 3 categorical variables in one table in ROW dimension';
  class categorical1 categorical2 categorical3 / preloadfmt ;
  table (categorical1 all) (categorical2 all) (categorical3 all),n colpctn / printmiss;
  format categorical1 categorical2 categorical3 $ctgfmt.;
run;

** 4 variation on suggestion by Reeza;
data flipped;
set have;
array vars(3) $ Categorical1 Categorical2 Categorical3;
do i=1 to dim(vars);
  Variable = vname(vars(i));
  Level = vars(i);
  Output;
end;
keep variable level;
run;
 
proc tabulate data=flipped missing;
  title '4: all 3 categorical variables in one table do not need PRELOADFMT';
  class variable level /  ;
  table variable*level,
        n pctn/ printmiss;
run;
  
title;

cynthia

Super Contributor
Posts: 381

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Hi,

 

Reeza,

 

Your code gave me my desired output, thank you Smiley Happy

Can I get brief information about SPARSE and OUTPCT options to understand the logic better?

 

Cynthia,

 

I tried to say that I have two same rows instead of one row then I realized that I put additional "table" statement into my PROC FREQ statement. You've also already mentioned about this situation in your response, thank you for your attention.

 

Thank you for the PRELOADFMT. examples, I understood better, thank you Smiley Happy

 

I wiil also consider to not use the exact same name for both the INPUT and OUTPUT datasets in a step.

 

Thank you,

 

Super Contributor
Posts: 381

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Reeza,

 

I forgot the tell, variable number can change into Have data set, is it possible to provide dynamic structure for your Flipped step. If column count is more than three, how can change it automatically?

 

Thank you

Super User
Posts: 17,826

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Change your array definition, the remaining code will stay the same. 

Super Contributor
Posts: 381

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Thank you Reeza,

 

Following code seems dynamic

 

%Let Variable = Categorical1 Categorical2 Categorical3;
%Let word_var=%sysfunc(countw(&Variable));


Data Flipped;
Set Have;

Array Vars(&word_var) $ &Variable;

Do i=1 To Dim(Vars);
/*VName -> Assigns a variable name as the value of a specified variable.*/
Variable = VName(Vars(i));
Level = Vars(i);

Output;
End;
Keep Variable Level;
Run;

/*I'm adding one more column here*/

Data Flipped2;
Set Flipped;
Length Dataset $ 32;
Dataset="Development";
Run;

Lasty, is it possible to add one more variable in the output of PROC FREQ statement.

 

I tried KEEP option instead of DROP option but it doesn't work

Proc Freq Data=Flipped2 Noprint;
Table Variable*Level/Sparse Out=Want(Where=(Level='0')
Keep=Dataset Variable Level Count Pct_col Rename=(Pct_col=Percent)) Outpct;
Run;

Or I' need to add one more Data Step?

 

Thank you,

 

Super User
Posts: 17,826

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Variable dataset will not be in the output dataset with that code so a keep won't help. If you use Keep you have to list all variables you want to keep. Typically only Keep or drop are used, not both.  

 

You need to change the proc freq to include dataset variable or add it in afterwards. You can also create the variable in the first data step you don't need a second one - it will slow down your processing and is inefficient. 

 

For the proc freq method try adding it as a BY variable or in your table statement. 

 

 

 

Super Contributor
Posts: 381

Re: Get Number of Zero Values and Get Percentage of Zero Values based on Variable in One Data Set

Yes, this is what I want, Thank you very much Smiley Happy

 

Proc Freq Data=Flipped2 Noprint;
Table Variable*Level/Sparse Out=Want(Where=(Level='0')
Keep=Dataset Variable Level Count Pct_col Rename=(Pct_col=Percent)) Outpct;
BY Dataset;
Run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 531 views
  • 6 likes
  • 3 in conversation