BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Hello All,

This is a bit of continuation of a question, I previously answered, so please bear with me.

1) Here is my example dataset:

data a;
input Fruit $ Ques1 Quest2;
cards;
Apple 1 4
Banana 1 5
Banana 3 4
Apple 2 4
Orange 5 1
Orange 2 4
Orange 3 3
;
run;

2) I would like to find the average for all responses responses to Ques1 & Quest2 (combined). Think of this as the average response for all all the questions of all fruits combined.

3) I would like to find the average for all responses where Fruit='Apple' or Fruit='Banana' etc. Think of this as where I want the average question response to all questions for just bananas, then just apples, etc.

4. How could I limit this to just particular variables? For instance if I wanted the average response to only Ques1 for *just* bananas?

Thanks to all of you again for your time.

_jeff_
1 ACCEPTED SOLUTION

Accepted Solutions
deleted_user
Not applicable

Editor's Note: This solution, along with the addition of the following in a later post:

Change the statement to avg(ques1+quest2) as avg to avg(ques+quest2)/2 as avg in select statement in 2 and 3...

provide answers to all questions asked. Thanks also to @SteveDenham for providing a PROC MEANS alternative.

 

You can also use Proc Sql with Group by and Having clauses, if you are good with Proc Sql;

Below is the code for the your questions 2,3 4 in the order:

proc sql;
create table b as
select fruit, ques1, quest2, avg(ques1+quest2) as avg
from a;
quit;

proc sql;
create table c as
select fruit, ques1 , quest2, avg(ques1+quest2) as avg1
from a
group by fruit
having fruit in ('Banana', 'Apple');
quit;

proc sql;
create table d as
select fruit, ques1 , quest2, avg(ques1) as avg2
from a
group by fruit
having fruit in ('Banana');
quit;

~ Sukanya E

View solution in original post

13 REPLIES 13
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Have a look at using PROC MEANS with NOPRINT, a CLASS statement, and an OUTPUT statement, and review the generated results with focus on _TYPE_ variable.

Scott Barry
SBBWorks, Inc.
sassygrl
Calcite | Level 5
Hi Jeff, have you tried Proc Means? Here are examples, in order of your questions.

(2)

Proc Means data=a;
output out=means mean=;
Run; Quit;

(3)

Proc Sort data=a; by fruit; run;
Proc Means data=a;
by fruit;
output out=means mean=;
Run; Quit;

(4)

Proc Means data=a;
var ques1 quest2;
where fruit='Banana';
output out=means mean=;
Run; Quit;

Happy SASsing! 🙂 Tricia (Added Proc Sort to #3.)


Message was edited by: sassygrl
deleted_user
Not applicable
sasygirl -

Thank you for your time and response. I like the examples which you have provided, the only downside being that each of your solutions keeps the variables separate.

For instance, for #2, I want the mean for all responses to both Ques1 and Quest2 (combined) . Not as separate values. When running your solution to #2, I get the following:

Ques1 7 2.4285714
Quest2 7 3.5714286

What I want is one number/mean which represents the mean for all responses to both Quest1 and Quest2.

My 'real world' dataset is survey, where the first 5 questions represent the employees responses to 'Job related' questions... and I'm trying to create one number which is "Overall Job". This number is the average response from all users to all questions. This number will be used as a baseline to compare individual divisions against (hence #3 in my original post).

Thanks!
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Using a DATA step, generate additional observations masking your BY variable as value "*ALL*" or something.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable

Editor's Note: This solution, along with the addition of the following in a later post:

Change the statement to avg(ques1+quest2) as avg to avg(ques+quest2)/2 as avg in select statement in 2 and 3...

provide answers to all questions asked. Thanks also to @SteveDenham for providing a PROC MEANS alternative.

 

You can also use Proc Sql with Group by and Having clauses, if you are good with Proc Sql;

Below is the code for the your questions 2,3 4 in the order:

proc sql;
create table b as
select fruit, ques1, quest2, avg(ques1+quest2) as avg
from a;
quit;

proc sql;
create table c as
select fruit, ques1 , quest2, avg(ques1+quest2) as avg1
from a
group by fruit
having fruit in ('Banana', 'Apple');
quit;

proc sql;
create table d as
select fruit, ques1 , quest2, avg(ques1) as avg2
from a
group by fruit
having fruit in ('Banana');
quit;

~ Sukanya E

deleted_user
Not applicable
Sukanya,

Thank you for your time as well, however, your response to #2 creates incorrect results.

proc sql;
create table b as
select fruit, ques1, quest2, avg(ques1+quest2) as avg
from a;
quit;

It shows the 'avg' as being '6', which can't be correct given that my survey response values only range from 1-4.

Just wanted to share what I noticed.

thanks!
deleted_user
Not applicable
Oh..You said you wanted the average of sum of ques1 and quest2. Average of both the responses give the result 6 as per the data. Is not that you wanted ?? Am I misisng some thing ?? How about the other results 3, 4 ??

~ Sukanya E
deleted_user
Not applicable
I apologize if I am not explaining my need very well. Let me try an example

If my dataset looked like this:

Apple 1 2
Banana 3 4
Orange 2 3

I want the 'average' of all responses. So, in this dataset I have a total of 6 responses that have been recorded. So the 'math' would look like this for this small dataset:

1 + 2 + 3 + 4 + 2 + 3 = 15
15 / 6 = 2.5

So, my overall average is 2.5

Does that help clarify? Again, my apologies
deleted_user
Not applicable
Oh...I got it...Just divide by 2....Change the statement to avg(ques1+quest2) as avg to avg(ques+quest2)/2 as avg in select statement in 2 and 3...

I hope this works for you now...


~ Sukanya E
deleted_user
Not applicable
Perfect! Thank you for your time AND patience! I haven't verified #4 yet, but by dividing by 2, all seems to work perfectly for #2 and #3.
deleted_user
Not applicable
For #4, you don't have to divide by 2 as you are considering only for Ques1...
Hope this makes sense..

~ Sukanya E
deleted_user
Not applicable
If you want the separate means for ques1 and quest2, then change the statement avg(ques1+quest2) as avg to avg(ques1) as qvg1, avg(quest2) as qvg2 in select statement.

~ Sukanya E
SteveDenham
Jade | Level 19
Hello Jeff,

Try the following code, after what you have:

data long;
set a;
ques=1;response=ques1;output;
ques=2;response=quest2;output;
drop ques1 quest2;
run;

proc means data=long noprint;
class fruit ques;
var response;
output out=manymeans mean=mean;
run;

Now take a look at the dataset manymeans. In it is a variable called _TYPE_. By looking at the appropriate value of _TYPE_, you will get exactly the kinds of averages you were talking about.

_TYPE_=0 gives the grand mean across all questions and fruit types. (your point #2)
_TYPE_=1 gives the mean of each question (1 and 2), across all fruit types.
_TYPE_=2 gives the mean of each fruit type, across both questions. (your point #3)
_TYPE_=3 gives the mean of each question for each fruit type. (your point #4)

Good luck with this.

Steve Denham

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 17948 views
  • 0 likes
  • 4 in conversation