BookmarkSubscribeRSS Feed
JackZ295
Pyrite | Level 9

If I have a dataset in which respondents are answering questions according to a  5-point Likert Scale, and I run a proc freq on those questions, I would usually get an output that looks like this 

 

Likert-Question 1   Frequency   Percent   Cumulative Frequency  Cumulative Percent 

1                                  10

2                                  10

3                                  10

4                                  10

5                                  10

 

 

 

However, if none of the respondents chose a particular choice, it seems that proc freq omits that choice in the frequency count all together, rather than providing that choice and indicating that the frequency is 0. See below for what I mean: 

 

Likert-Question 1   Frequency   Percent   Cumulative Frequency  Cumulative Percent 

2                                  20

4                                  20

5                                  10

 

Is there a way to make it so that proc freq displays all choices, regardless of whether or not respondents selected that choice? See below for what I am looking for: 

 

Likert-Question 1   Frequency   Percent   Cumulative Frequency  Cumulative Percent 

1                                  0

2                                  20

3                                  0

4                                  20

5                                  10

 

I basically want the proc freq to show all choices, even if the frequency is 0, and to indicate that the frequency for that particular choice is 0. Is that possible? If so how would I go about doing so? 

 

 

 

 

17 REPLIES 17
PaigeMiller
Diamond | Level 26

I don't think you can force PROC FREQ to do this directly. To get the results you want, you need either a data step to modify the results of PROC FREQ; or by creating a different input data set that contains all the levels, some of which appear with a weight of 0 (which is untested, I haven't actually tried to use a weight of 0 to see what PROC FREQ would do).

--
Paige Miller
Ksharp
Super User
data level;
 sex='F'; w=0;output;
 sex='M'; w=0;output;
 sex='C'; w=0;output;
 sex='U'; w=0;output;
run;


data class;
 set sashelp.class;
 w=1;
 keep sex w;
run;
data class;
 set class level;
run;

proc freq data=class ;
table sex/missprint;
weight w/zeros;
run;
Sex	频数	百分比	累积
频数	累积
百分比
C	0	0.00	0	0.00
F	9	47.37	9	47.37
M	10	52.63	19	100.00
U	0	0.00	19	100.00
JackZ295
Pyrite | Level 9

Hi @Ksharp , thanks for your help. Would you mind explaining what you did? Where did the sashelp.class data set come from? Also, what does the data step

 

data class;
 set class level;
run;

 do? 

Thanks again.

PaigeMiller
Diamond | Level 26

SASHELP.CLASS is a test data set that every SAS user can access, and is used as an illustration of how to solve your problem.

 

The code you show concatenates (vertically) the actual data in CLASS with the levels data set (which contains all possible levels) in LEVELS.

--
Paige Miller
JackZ295
Pyrite | Level 9

Hi @PaigeMiller, thank you for your help. In my code, would I have to make use of this sashelp.class data set in order to implement this solution? Or could I change it to a different data set? 

PaigeMiller
Diamond | Level 26

You use your own data set.

--
Paige Miller
JackZ295
Pyrite | Level 9

Hi @PaigeMiller, in the first data step-

 

data level;
 sex='F'; w=0;output;
 sex='M'; w=0;output;
 sex='C'; w=0;output;
 sex='U'; w=0;output;
run;

does this first step involve having me create a new temporary data set such that the variable has all of the choices I want it to have, regardless of whether or not the frequency is zero? Or is this first data set in the series of steps he presented being created from an old data set that I have? In other words, for my first step for making this work, would I write it as

 

data level; 

nA1Q1='1'; w=0;output;

nA2Q1='2'; w=0;output;

nA3Q1='3'; w=0;output;

nA4Q1='4'; w=0;output;

nA5Q1='5'; w=0;output;

run; 

 

or would I write it as

data level;

set four;  

nA1Q1='1'; w=0;output;

nA2Q1='2'; w=0;output;

nA3Q1='3'; w=0;output;

nA4Q1='4'; w=0;output;

nA5Q1='5'; w=0;output;

run; 

Thanks again. 

ballardw
Super User

One way is to use a procedure that supports PRELOADFMT using a format created to show all the values.

 

Proc format library=work;
value lickert
1 = '1'
2 = '2'
3 = '3'
4 = '4'
5 = '5'
;
;

data example;
   input v1 v2 v3;
datalines;
1 4 2 3
1 2 3 4
1 3 4 1
4 1 4 2
;
run;

proc tabulate data=example;
   class v1 v2 v3/ missing preloadfmt;
   format v1 v2 v3 lickert.;
   table v1 v2 v3 ,
         n pctn
        / printmiss misstext='0'
   ;
run;

However you would need to do something else if you want cumulative frequencies and percentages  such as sending the result above to a data set and using a date step to calculate the cumulative totals. Note: the format of a Proc Tabulate data set is a bit different than proc freq and you need to look at it very closely to get what you want.

JackZ295
Pyrite | Level 9

Hi @ballardw , thanks for your help. Could I use the solution that @Ksharp proposed? proc tabulate may get a little tricky because I need to output these frequencies later. Does the proc freq procedure work? Also, when I am implementing this solution, would I also have to use the sashelp.class data set? 

ballardw
Super User

@JackZ295 wrote:

Hi @ballardw , thanks for your help. Could I use the solution that @Ksharp proposed? proc tabulate may get a little tricky because I need to output these frequencies later. Does the proc freq procedure work? Also, when I am implementing this solution, would I also have to use the sashelp.class data set? 


@Ksharp solution should work if your data is structured in a similar to manner to the assumed set. The use of SASHELP.CLASS was just to have some data to demonstrate the behavior of the program. That example ensures that each variable has at least one records with the needed value and the properties of the WEIGHT statement in Proc Freq. This approach requires a separate data set that contains the values of the variables that don't exist in your current data, or likely easiest to keep track of all the values.

You could make your level data set with something like which makes 4 variables named q1 to q4 with values of 1 to 5 and the weight of 0.  It would be up to you to insure the weight variable is added to your existing data.

data  level;
/*array a{*} <list your lickert variable names here>*/

array a{*} q1-q4;
do lickert=1 to 5;
   do i= 1 to dim(a);
      a[i]= lickert;
   end;
   w=0;
output; end; drop lickert i; run;

 

Proc tabulate can create output data sets but the way it works the cumulative would have to be done by sending the result through a data step and some folks don't like the format of the tabulate output data sets.

 

You should really supply examples of your data with questions.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

JackZ295
Pyrite | Level 9

Hi @ballardw, @PaigeMiller , @Ksharp thank you for your help. Here is some sample data: 

 

data one; 

input response_id $ A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1; 
datalines;
1 2 4 2 4 2 4 2 4 2 4 2
2 1 3 5 1 3 5 1 3 5 1 3 
3 1 2 3 1 2 3 1 2 3 1 2 
4 1 2 3 4 5 1 2 3 4 5 1
5 3 5 5 5 5 5 5 5 5 5 4
6 1 1 1 1 1 1 2 2 2 2 2
7 1 1 1 1 1 1 1 1 1 1 3
8 3 1 2 3 1 2 4 1 1 1 4
9 5 5 5 5 4 3 2 3 4 5 5
;
run; 
proc print data=one; 
run; 

I want the frequencies for the Likert scale levels 1 through 5 to be shown, even if the frequency for one or more of these levels is 0. This is a much abbreviated version of my data set, as there are 50 sets of these 11 variables A1Q1-I4Q1. The second set would be A1Q2-I4Q2 etc. Any advice? 

Ksharp
Super User

OK. How about this one :

 



data one; 
input response_id $ A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1; 
datalines;
1 2 4 2 4 2 4 2 4 2 4 2
2 1 3 5 1 3 5 1 3 5 1 3 
3 1 2 3 1 2 3 1 2 3 1 2 
4 1 2 3 4 5 1 2 3 4 5 1
5 3 5 5 5 5 5 5 5 5 5 4
6 1 1 1 1 1 1 2 2 2 2 2
7 1 1 1 1 1 1 1 1 1 1 3
8 3 1 2 3 1 2 4 1 1 1 4
9 5 5 5 5 4 3 2 3 4 5 5
;
run;

%macro likert(var=);
data level;
 do &var=1 to 5;
   output;
 end;
run;
data temp;
 set one(in=ina) level;
 w=ina;
 keep &var w;
run;
proc freq data=temp ;
table &var/missprint;
weight w/zeros;
run;
%mend;

%likert(var=A1Q1)
%likert(var=I4Q1)
ballardw
Super User

@JackZ295 wrote:

Hi @ballardw, @PaigeMiller , @Ksharp thank you for your help. Here is some sample data: 

 

data one; 

input response_id $ A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1; 
datalines;
1 2 4 2 4 2 4 2 4 2 4 2
2 1 3 5 1 3 5 1 3 5 1 3 
3 1 2 3 1 2 3 1 2 3 1 2 
4 1 2 3 4 5 1 2 3 4 5 1
5 3 5 5 5 5 5 5 5 5 5 4
6 1 1 1 1 1 1 2 2 2 2 2
7 1 1 1 1 1 1 1 1 1 1 3
8 3 1 2 3 1 2 4 1 1 1 4
9 5 5 5 5 4 3 2 3 4 5 5
;
run; 
proc print data=one; 
run; 

I want the frequencies for the Likert scale levels 1 through 5 to be shown, even if the frequency for one or more of these levels is 0. This is a much abbreviated version of my data set, as there are 50 sets of these 11 variables A1Q1-I4Q1. The second set would be A1Q2-I4Q2 etc. Any advice? 


So, you have 550 variables in your data set?

Not really interested in even attempting to write code for 550 variables. Variable lists can reduce the amount of code to possibly something manageable but we need information. Such as do you have any variable names that start with A1Q that are not variables that you want to process? Same for A2Q, RFQ SE1Q SE2Q? Do each of your "sets" have the exact same names only varying by the Q1, Q2 , … Q50? Or are all of the numeric variables in your data Likert variables?

 

Extending @Ksharp's solution

data one; 
input response_id $ A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1; 
datalines;
1 2 4 2 4 2 4 2 4 2 4 2
2 1 3 5 1 3 5 1 3 5 1 3 
3 1 2 3 1 2 3 1 2 3 1 2 
4 1 2 3 4 5 1 2 3 4 5 1
5 3 5 5 5 5 5 5 5 5 5 4
6 1 1 1 1 1 1 2 2 2 2 2
7 1 1 1 1 1 1 1 1 1 1 3
8 3 1 2 3 1 2 4 1 1 1 4
9 5 5 5 5 4 3 2 3 4 5 5
;
run; 

data level  ;
   array a{*} A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1;
   do lickert=1 to 5;
      do i= 1 to dim(a);
         a[i]= lickert;
      end;
      w=0;
      output;
   end;
   drop lickert i;
run;

data forfreq;
  set one (in=in1)
      level;
  if in1 then w=1;
run;

proc freq data=forfreq;
   table A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1;
   weight w/zeros;
run;

How ever when you said "I need to output these frequencies later" you need to describe HOW you need to use them. If you need a data set for each variable count then you need 550 data sets. Which going to be a tad cumbersome.

Or you can use

proc freq data=forfreq;
   table A1Q1 A2Q1 RFQ1 SE1Q1 SE2Q1 SE3Q1 SE4Q1 I1Q1 I2Q1 I3Q1 I4Q1;
   weight w/zeros;
   ods output onewayfreqs= freqset;
run;

to place all of the output frequencies into a single table but you will really want to look at the structure of that data set carefully for most uses. (Which ends up moderately similar to the Proc Tabulate data set but does have the cumulative counts and percentages)

 

You can create variable lists such as A1Q:  which would reference all names that start with those 3 letters, or A1Q1- A1Q50 to reference all the variables that start with A1Q and iterate a numeric value from 1 to 50 at the end. But we would need to know the names of the variables you need if these patterns aren't acceptable.

 

Here is an approach that attempts to use variable lists to make a level data set with  "50 sets" of the first three variable names you show:

data level  ;
   array a{*} A1Q1-A1Q50 A2Q1-A2Q50 RFQ1-RFQ50;
   do lickert=1 to 5;
      do i= 1 to dim(a);
         a[i]= lickert;
      end;
      w=0;
      output;
   end;
   drop lickert i;
run;

If that makes the proper named variables then continue the pattern to generate all of your variables in the levels data set.

JackZ295
Pyrite | Level 9

Hi @ballardw, thank you for all of your help. I don't blame you. I have been using macros for all of my code, and even with that I will say that it is quite cumbersome. When I first posted on this forum, I was just looking for a skeleton set of code that I could apply a macro to for all of my variables. I will try your solution and give more context to what I am looking for later. I will also be sure to ask more questions if I run into issues. Thanks again! 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 1089 views
  • 3 likes
  • 4 in conversation