Hello community members,
I am fairly novice SAS user and I am seeking help in perform certain analysis. To begin with, I have a dataset from a survey that asked participants to rate statements on a 5-point Likert scale. The dataset has about 80 variables representing the statements and variables on demographics (gender, age, race, etc.) and there are about 800 observations.
Here is an example of the dataset. Due to certain privacy concern, I can not share the actual dataset:
Observations | Q1 | Q2 | Q3 | Q4 | Q5 |
1 | Strongly Agree | Agree | Strongly Disagree | Disagree | Neutral |
2 | Agree | Agree | Disagree | Neutral | Neutral |
3 | Strongly Agree | Strongly Disagree | Agree | Strongly Disagree | Agree |
4 | Disagree | Agree | Neutral | Strongly Agree | Disagree |
5 | Disagree | Agree | Disagree |
|
|
6 | Strongly Disagree | Strongly Agree | Strongly Disagree | Neutral | Disagree |
7 | Agree | Neutral | Agree | Agree | Agree |
8 | Strongly Disagree | Strongly Disagree | Disagree | Strongly Disagree | Neutral |
9 | Neutral | Strongly Disagree | Strongly Agree | Neutral | Agree |
10 | Agree | Strongly Agree | Strongly Agree | Agree | Agree |
What I am interested in doing is combining some of these variables (i.e. Q1, Q3, and Q5) together to get an average frequency and percentage of observed responses. For example, the average frequency of responses across variable Q1, Q3, and Q5 that indicated they "Strongly Agree" is 4 observed responses out of 30 total responses which is about 13%.
Conceptually, I understand how to do this, but not on SAS. I've tried Proc Freq to obtain frequency and percentage counts for one variable, but was unable to get the counts across more than one variable. Also, I am interested in cross tabbing the average count with the demographic variable as well, which seem to pose even more complication.
I hope this makes sense. Thanks for any help.
@sasyi wrote:
Hi,
Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800).
I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5).
(Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree
Q1,Q2,Q3 Freq Percent Strongly Agree Agree Neutral Disagree Strongly Disagree
I hope this make sense and I apologize for any misinterpretation. Thanks again for the help.
The reason I asked about working through an example is the phrase "combination of variables". That could mean a new variable that combines the variables per respondent such as mean of the 3, or concatenated values of the three or something else.
Don't be afraid to subset and rearrange data to do analysis. I think you are looking for something like this in small:
DATA HAVE; INPUT q1 - q4; datalines; 1 2 3 3 1 1 4 4 2 4 2 5 ; run; data need; set have; array a q1-q3; length name $ 10; do i=1 to dim(a); name=vname(a[i]); value=a[i]; output; end; keep name value; label value='Vars Q1, Q2, Q2'; run; proc freq data=need; tables value; run;
The array definition will subset the data to the variables of interest.
Or you could transpose of the questionnumeric variables this way. Then use Proc Freq with a
Where upcase(name) in ('Q1' 'Q3' 'Q5');
To get combined frequencies of select names;
Or create format for the value of the name variable to do multiple groups at once:
data need; set have ; array a q1-q4; length name $ 10; do i=1 to dim(a); name=vname(a[i]); value=a[i]; output; end; keep name value; run; proc format library=work; value $vargrp 'q1','q2' = 'Q1 Q2' 'q3','q4' = 'Q3 Q4' ; run; proc tabulate data=need; class name; class value; format name $vargrp.; table value, name*(n pctn) /misstext=' ' ; run;
I suggest that you show what you want as a result given that data. That is small enough that you should be able to do that by hand. Otherwise we're going to spend a lot of time guessing what combinations you want to consider and what the result should be.
Here's a hint if you think will do similar projects frequently. Do not use text codes for limited choices but create/ read fixed responses, such as from the apparent list you use into a numeric value with a known order of results.
One reason is that then questions about multiple variables having the same value are easy because you can use the RANGE function. A result of 0 would tell you that all variables used for arguments have the same value; easily find the maximum or minimum of a group of variables; get a mean of responses per respondent of multiple variables and more goodies.
You may want to research Likert Scale for more examples details
Here is an example of using a custom informat to read text values into numeric and assigning a custom format to display the original text.
proc format library=work; invalue agree "Strongly Disagree" = 1 "Disagree" = 2 "Neutral" = 3 "Agree" = 4 "Strongly Agree" = 5 " " = . other = _error_ ; value agree 1="Strongly Disagree" 2="Disagree" 3="Neutral" 4="Agree" 5="Strongly Agree" ; run; data example; infile datalines dlm='|' missover; informat q1-q2 agree.; format q1-q2 agree.; input q1 q2 ; datalines; Strongly Disagree|Strongly Agree Disagree |Strongly Disagree Neutral |Disagree Agree |Neutral Strongly Agree |Agree Strongly Agree |Strongly Agree Agree |Wrong Answer ;
This would be done using an INFILE statement pointing to a TEXT file of some sort and indicate the appropriate value separator as the DLM option.
You can also use an INPUT statement with the text valued variable to create a numeric version:
q1num = input(q1,agree.);
Another possibility depending on the software used to create the survey you may be able to export order code values instead of text.
Hi,
Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800).
I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5).
(Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree
Q1,Q2,Q3 | Freq | Percent |
Strongly Agree | ||
Agree | ||
Neutral | ||
Disagree | ||
Strongly Disagree |
I hope this make sense and I apologize for any misinterpretation. Thanks again for the help.
I’d start by first making my data long rather than wide and adding in my formats for both questions and the values.
Then you can also combine questions together using formats and proc freq.
q1, q2, q3 all map to the same group so when you do the percentages it’s what you want.
@sasyi wrote:
Hi,
Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800).
I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5).
(Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree
Q1,Q2,Q3 Freq Percent Strongly Agree Agree Neutral Disagree Strongly Disagree
I hope this make sense and I apologize for any misinterpretation. Thanks again for the help.
The reason I asked about working through an example is the phrase "combination of variables". That could mean a new variable that combines the variables per respondent such as mean of the 3, or concatenated values of the three or something else.
Don't be afraid to subset and rearrange data to do analysis. I think you are looking for something like this in small:
DATA HAVE; INPUT q1 - q4; datalines; 1 2 3 3 1 1 4 4 2 4 2 5 ; run; data need; set have; array a q1-q3; length name $ 10; do i=1 to dim(a); name=vname(a[i]); value=a[i]; output; end; keep name value; label value='Vars Q1, Q2, Q2'; run; proc freq data=need; tables value; run;
The array definition will subset the data to the variables of interest.
Or you could transpose of the questionnumeric variables this way. Then use Proc Freq with a
Where upcase(name) in ('Q1' 'Q3' 'Q5');
To get combined frequencies of select names;
Or create format for the value of the name variable to do multiple groups at once:
data need; set have ; array a q1-q4; length name $ 10; do i=1 to dim(a); name=vname(a[i]); value=a[i]; output; end; keep name value; run; proc format library=work; value $vargrp 'q1','q2' = 'Q1 Q2' 'q3','q4' = 'Q3 Q4' ; run; proc tabulate data=need; class name; class value; format name $vargrp.; table value, name*(n pctn) /misstext=' ' ; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.