Solved: Re: Averaging Discrete Data Frequency and Percentage for Multiple Vari...

sasyi · Posted 03-20-2020 06:17 PM

Hello community members,

I am fairly novice SAS user and I am seeking help in perform certain analysis. To begin with, I have a dataset from a survey that asked participants to rate statements on a 5-point Likert scale. The dataset has about 80 variables representing the statements and variables on demographics (gender, age, race, etc.) and there are about 800 observations.

Here is an example of the dataset. Due to certain privacy concern, I can not share the actual dataset:

Observations	Q1	Q2	Q3	Q4	Q5
1	Strongly Agree	Agree	Strongly Disagree	Disagree	Neutral
2	Agree	Agree	Disagree	Neutral	Neutral
3	Strongly Agree	Strongly Disagree	Agree	Strongly Disagree	Agree
4	Disagree	Agree	Neutral	Strongly Agree	Disagree
5	Disagree	Agree	Disagree
6	Strongly Disagree	Strongly Agree	Strongly Disagree	Neutral	Disagree
7	Agree	Neutral	Agree	Agree	Agree
8	Strongly Disagree	Strongly Disagree	Disagree	Strongly Disagree	Neutral
9	Neutral	Strongly Disagree	Strongly Agree	Neutral	Agree
10	Agree	Strongly Agree	Strongly Agree	Agree	Agree

What I am interested in doing is combining some of these variables (i.e. Q1, Q3, and Q5) together to get an average frequency and percentage of observed responses. For example, the average frequency of responses across variable Q1, Q3, and Q5 that indicated they "Strongly Agree" is 4 observed responses out of 30 total responses which is about 13%.

Conceptually, I understand how to do this, but not on SAS. I've tried Proc Freq to obtain frequency and percentage counts for one variable, but was unable to get the counts across more than one variable. Also, I am interested in cross tabbing the average count with the demographic variable as well, which seem to pose even more complication.

I hope this makes sense. Thanks for any help.

ballardw · Posted 03-21-2020 04:36 AM

@sasyi wrote:

Hi,

Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800).

I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5).

(Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree

Q1,Q2,Q3 Freq Percent

Strongly Agree

Agree

Neutral

Disagree

Strongly Disagree

I hope this make sense and I apologize for any misinterpretation. Thanks again for the help.

The reason I asked about working through an example is the phrase "combination of variables". That could mean a new variable that combines the variables per respondent such as mean of the 3, or concatenated values of the three or something else.

Don't be afraid to subset and rearrange data to do analysis. I think you are looking for something like this in small:

DATA HAVE;
INPUT q1 - q4;
datalines;
1 2 3 3
1 1 4 4
2 4 2 5
;
run; 

data need; 
	set have;
	array a q1-q3;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
    label value='Vars Q1, Q2, Q2';
run;

proc freq data=need;
   tables value;
run;

The array definition will subset the data to the variables of interest.

Or you could transpose of the questionnumeric variables this way. Then use Proc Freq with a
Where upcase(name) in ('Q1' 'Q3' 'Q5');

To get combined frequencies of select names;

Or create format for the value of the name variable to do multiple groups at once:

data need; 
	set have ;
	array a q1-q4;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
run;

proc format library=work;
	value $vargrp
	'q1','q2' = 'Q1 Q2'
	'q3','q4' = 'Q3 Q4'
	;

run;

proc tabulate data=need;
	class name;
	class value;
	format name $vargrp.;
	table value,
	      name*(n pctn)
		  /misstext=' '
    ;

run;

View solution in original post

ballardw · Posted 03-20-2020 06:57 PM

I suggest that you show what you want as a result given that data. That is small enough that you should be able to do that by hand. Otherwise we're going to spend a lot of time guessing what combinations you want to consider and what the result should be.

Here's a hint if you think will do similar projects frequently. Do not use text codes for limited choices but create/ read fixed responses, such as from the apparent list you use into a numeric value with a known order of results.

One reason is that then questions about multiple variables having the same value are easy because you can use the RANGE function. A result of 0 would tell you that all variables used for arguments have the same value; easily find the maximum or minimum of a group of variables; get a mean of responses per respondent of multiple variables and more goodies.

You may want to research Likert Scale for more examples details

Here is an example of using a custom informat to read text values into numeric and assigning a custom format to display the original text.

proc format library=work;
invalue agree
"Strongly Disagree" = 1
"Disagree"          = 2
"Neutral"           = 3
"Agree"             = 4
"Strongly Agree"    = 5
" "                 = . 
other               = _error_
;
value agree
1="Strongly Disagree"
2="Disagree"         
3="Neutral"          
4="Agree"            
5="Strongly Agree"  
;
run;

data example;
	infile datalines dlm='|' missover;
	informat q1-q2 agree.;
	format   q1-q2 agree.;
    input q1 q2 ;

datalines;
Strongly Disagree|Strongly Agree   
Disagree         |Strongly Disagree
Neutral          |Disagree         
Agree            |Neutral          
Strongly Agree   |Agree            
Strongly Agree   |Strongly Agree 
Agree            |Wrong Answer 
;

This would be done using an INFILE statement pointing to a TEXT file of some sort and indicate the appropriate value separator as the DLM option.

You can also use an INPUT statement with the text valued variable to create a numeric version:

q1num = input(q1,agree.);

Another possibility depending on the software used to create the survey you may be able to export order code values instead of text.

sasyi · Posted 03-20-2020 08:08 PM

Hi,

Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800).

I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5).

(Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree

Q1,Q2,Q3	Freq	Percent
Strongly Agree
Agree
Neutral
Disagree
Strongly Disagree

I hope this make sense and I apologize for any misinterpretation. Thanks again for the help.

Reeza · Posted 03-20-2020 11:42 PM

I’d start by first making my data long rather than wide and adding in my formats for both questions and the values.
Then you can also combine questions together using formats and proc freq.

q1, q2, q3 all map to the same group so when you do the percentages it’s what you want.

ballardw · Posted 03-21-2020 04:36 AM

@sasyi wrote:

Hi,

Thank you for the suggestion. I failed to mention that the text code were just label and the values are actually coded numerically from the scale 1, 2, 3, 4, 5 and "." for missing. The example table I provided in my original post is just a simple example of what my actual dataset looks like hence why it is easy to do by hand. My actual dataset has more variables (80 variables) and observation (over 800).

I'm not really to sure what a result for this would look like on SAS, but I would imagine something similar to a Proc Freq table, where instead of being a table for just one variable (Q1), it's a combination of three variable or more (Q1, Q3, Q5).

(Q1, Q3, Q5) Frequency Percent CumulativeFrequency CumulativePercentFrequency Missing = Strongly AgreeAgreeNeutralDisagreeStrongly Disagree

Q1,Q2,Q3 Freq Percent

Strongly Agree

Agree

Neutral

Disagree

Strongly Disagree

I hope this make sense and I apologize for any misinterpretation. Thanks again for the help.

The reason I asked about working through an example is the phrase "combination of variables". That could mean a new variable that combines the variables per respondent such as mean of the 3, or concatenated values of the three or something else.

Don't be afraid to subset and rearrange data to do analysis. I think you are looking for something like this in small:

DATA HAVE;
INPUT q1 - q4;
datalines;
1 2 3 3
1 1 4 4
2 4 2 5
;
run; 

data need; 
	set have;
	array a q1-q3;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
    label value='Vars Q1, Q2, Q2';
run;

proc freq data=need;
   tables value;
run;

The array definition will subset the data to the variables of interest.

Or you could transpose of the questionnumeric variables this way. Then use Proc Freq with a
Where upcase(name) in ('Q1' 'Q3' 'Q5');

To get combined frequencies of select names;

Or create format for the value of the name variable to do multiple groups at once:

data need; 
	set have ;
	array a q1-q4;
	length name $ 10;
	do i=1 to dim(a);
		name=vname(a[i]);
		value=a[i];
		output;
	end;
	keep name value;
run;

proc format library=work;
	value $vargrp
	'q1','q2' = 'Q1 Q2'
	'q3','q4' = 'Q3 Q4'
	;

run;

proc tabulate data=need;
	class name;
	class value;
	format name $vargrp.;
	table value,
	      name*(n pctn)
		  /misstext=' '
    ;

run;

Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Re: Averaging Discrete Data Frequency and Percentage for Multiple Variables on a Survey

Registration is open

SAS Training: Just a Click Away