BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasyi
Fluorite | Level 6

Hello community members,

 

I am fairly novice SAS user and I am seeking help in perform certain analysis. To begin with, I have a dataset from a survey that asked participants to rate statements on a 5-point Likert scale. The dataset has about 80 variables representing the statements and variables on demographics (gender, age, race, etc.) and there are about 800 observations. 

 

Here is an example of the dataset. Due to certain privacy concern, I can not share the actual dataset:

Observations

Q1

Q2

Q3

Q4

Q5

1

Strongly Agree

Agree

Strongly Disagree

Disagree

Neutral

2

Agree

Agree

Disagree

Neutral

Neutral

3

Strongly Agree

Strongly Disagree

Agree

Strongly Disagree

Agree

4

Disagree

Agree

Neutral

Strongly Agree

Disagree

5

Disagree

Agree

Disagree

 

 

6

Strongly Disagree

Strongly Agree

Strongly Disagree

Neutral

Disagree

7

Agree

Neutral

Agree

Agree

Agree

8

Strongly Disagree

Strongly Disagree

Disagree

Strongly Disagree

Neutral

9

Neutral

Strongly Disagree

Strongly Agree

Neutral

Agree

10

Agree

Strongly Agree

Strongly Agree

Agree

Agree

 

What I am interested in doing is combining some of these variables (i.e. Q1, Q3, and Q5) together to get an average frequency and percentage of observed responses. For example, the average frequency of responses across variable Q1, Q3, and Q5 that indicated they "Strongly Agree" is 4 observed responses out of 30 total responses which is about 13%. 

 

Conceptually, I understand how to do this, but not on SAS. I've tried Proc Freq to obtain frequency and percentage counts for one variable, but was unable to get the counts across more than one variable. Also, I am interested in cross tabbing the average count with the demographic variable as well, which seem to pose even more complication. 

 

I hope this makes sense. Thanks for any help. 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@sasyi wrote:

Hi,

 

Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800). 

 

I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5). 

  (Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree

Q1,Q2,Q3 Freq Percent
Strongly Agree    
Agree    
Neutral    
Disagree    
Strongly Disagree    

 

 

I hope this make sense and I apologize for any misinterpretation. Thanks again for the help. 


The reason I asked about working through an example is the phrase "combination of variables". That could mean a new variable that combines the variables per respondent such as mean of the 3, or concatenated values of the three or something else.

 

Don't be afraid to subset and rearrange data to do analysis. I think you are looking for something like this in small:

DATA HAVE;
INPUT q1 - q4;
datalines;
1 2 3 3
1 1 4 4
2 4 2 5
;
run; 

data need; 
	set have;
	array a q1-q3;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
    label value='Vars Q1, Q2, Q2';
run;

proc freq data=need;
   tables value;
run;


The array definition will subset the data to the variables of interest.

 

Or you could transpose of the questionnumeric variables this way. Then use Proc Freq with a
Where upcase(name) in ('Q1' 'Q3' 'Q5');

To get combined frequencies of select names;

Or create format for the value of the name variable to do multiple groups at once:

data need; 
	set have ;
	array a q1-q4;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
run;

proc format library=work;
	value $vargrp
	'q1','q2' = 'Q1 Q2'
	'q3','q4' = 'Q3 Q4'
	;

run;

proc tabulate data=need;
	class name;
	class value;
	format name $vargrp.;
	table value,
	      name*(n pctn)
		  /misstext=' '
    ;

run;


View solution in original post

4 REPLIES 4
ballardw
Super User

I suggest that you show what you want as a result given that data. That is small enough that you should be able to do that by hand. Otherwise we're going to spend a lot of time guessing what combinations you want to consider and what the result should be.

 

Here's a hint if you think will do similar projects frequently. Do not use text codes for limited choices but create/ read fixed responses, such as from the apparent list you use into a numeric value with a known order of results.

One reason is that then questions about multiple variables having the same value are easy because you can use the RANGE function. A result of 0 would tell you that all variables used for arguments have the same value; easily find the maximum or minimum of a group of variables; get a mean of responses per respondent of multiple variables and more goodies.

You may want to research Likert Scale for more examples details

 

Here is an example of using a custom informat to read text values into numeric and assigning a custom format to display the original text.

proc format library=work;
invalue agree
"Strongly Disagree" = 1
"Disagree"          = 2
"Neutral"           = 3
"Agree"             = 4
"Strongly Agree"    = 5
" "                 = . 
other               = _error_
;
value agree
1="Strongly Disagree"
2="Disagree"         
3="Neutral"          
4="Agree"            
5="Strongly Agree"  
;
run;

data example;
	infile datalines dlm='|' missover;
	informat q1-q2 agree.;
	format   q1-q2 agree.;
    input q1 q2 ;

datalines;
Strongly Disagree|Strongly Agree   
Disagree         |Strongly Disagree
Neutral          |Disagree         
Agree            |Neutral          
Strongly Agree   |Agree            
Strongly Agree   |Strongly Agree 
Agree            |Wrong Answer 
;

This would be done using an INFILE statement pointing to a TEXT file of some sort and indicate the appropriate value separator as the DLM option.

 

You can also use an INPUT statement with the text valued variable to create a numeric version:

 

q1num = input(q1,agree.);

 

Another possibility depending on the software used to create the survey you may be able to export order code values instead of text.

 

 

 

sasyi
Fluorite | Level 6

Hi,

 

Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800). 

 

I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5). 

  (Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree

Q1,Q2,Q3FreqPercent
Strongly Agree  
Agree  
Neutral  
Disagree  
Strongly Disagree  

 

 

I hope this make sense and I apologize for any misinterpretation. Thanks again for the help. 

Reeza
Super User

I’d start by first making my data long rather than wide and adding in my formats for both questions and the values. 
Then you can also combine questions together using formats and proc freq. 

 

q1, q2, q3 all map to the same group so when you do the percentages it’s what you want. 

 

ballardw
Super User

@sasyi wrote:

Hi,

 

Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800). 

 

I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5). 

  (Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree

Q1,Q2,Q3 Freq Percent
Strongly Agree    
Agree    
Neutral    
Disagree    
Strongly Disagree    

 

 

I hope this make sense and I apologize for any misinterpretation. Thanks again for the help. 


The reason I asked about working through an example is the phrase "combination of variables". That could mean a new variable that combines the variables per respondent such as mean of the 3, or concatenated values of the three or something else.

 

Don't be afraid to subset and rearrange data to do analysis. I think you are looking for something like this in small:

DATA HAVE;
INPUT q1 - q4;
datalines;
1 2 3 3
1 1 4 4
2 4 2 5
;
run; 

data need; 
	set have;
	array a q1-q3;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
    label value='Vars Q1, Q2, Q2';
run;

proc freq data=need;
   tables value;
run;


The array definition will subset the data to the variables of interest.

 

Or you could transpose of the questionnumeric variables this way. Then use Proc Freq with a
Where upcase(name) in ('Q1' 'Q3' 'Q5');

To get combined frequencies of select names;

Or create format for the value of the name variable to do multiple groups at once:

data need; 
	set have ;
	array a q1-q4;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
run;

proc format library=work;
	value $vargrp
	'q1','q2' = 'Q1 Q2'
	'q3','q4' = 'Q3 Q4'
	;

run;

proc tabulate data=need;
	class name;
	class value;
	format name $vargrp.;
	table value,
	      name*(n pctn)
		  /misstext=' '
    ;

run;


SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1409 views
  • 3 likes
  • 3 in conversation