BookmarkSubscribeRSS Feed
ANKH1
Pyrite | Level 9

Hello, 

I would like to calculate the mean of multiple variables (VAR1, VAR2, VAR3), not by column or row, but all the values. For example, I would like to get the mean of 34, 45, 31, 35, 41 = 186/5 = 37.2

This is an example of the data
ID  VAR1 VAR2 VAR3
ID1  34   45   0
ID2  31   0    35
ID3  0     0    41

 

I only have the code for a single variable. 

PROC SURVEYMEANS DATA=sample1 MEAN SUM VARMETHOD=BRR;
DOMAIN DHHDDRI;
WHERE VAR1 > 0;
VAR VAR1;
WEIGHT WTS_P;
REPWEIGHTS BSW1-BSW500;
RUN;

 

Thanks in advance!

4 REPLIES 4
ballardw
Super User

Subsetting data , as in WHERE or BY groups is not often a good idea.

If you want to exclude a value from counting or summing for use in a Mean then it should be set to missing, not zero.

 

What is the actual relationship between ID1, ID2 and ID3 that these should be treated together in the manner you suggest?

 

This may not be a simple, i.e. one step, solution because of the way that the Survey procs treat the weights.

 

 

ANKH1
Pyrite | Level 9

Thanks!

How do I set to missing values?

"What is the actual relationship between ID1, ID2 and ID3 that these should be treated together in the manner you suggest?They are all types of fruits (g of fruits), but I want to know the mean for all fruits, not individual types of fruits. 

"This may not be a simple, i.e. one step, solution because of the way that the Survey procs treat the weights." I agree, that is why I wanted to get experts' input.

SteveDenham
Jade | Level 19

What you might consider is turning your 'wide' dataset into a 'long' dataset and following @ballardw 's advice and set 0's to missing.  I am sure that other variables get carried along (like the sampling weights), but here is what I would do:

 

data have_wide;
input ID $  VAR1 VAR2 VAR3;
datalines;:
ID1  34   45   0
ID2  31   0    35
ID3  0     0    41
;

data want_long;
set have_wide;
value=var1; varlevel ='VAR1'; output;
value=var2; varlevel ='VAR2'; output;
value=var3; varlevel ='VAR3'; output;
keep id varlevel value;
run;

data want_long2;
set want_long;
if value=0 then value=.;
run;

Now, you can calculate overall means by ignoring varlevel, or varlevel means by including a CLASS statement:

 

/*Over all levels of varlevel */

PROC SURVEYMEANS DATA=want_long2 MEAN SUM VARMETHOD=BRR;
DOMAIN DHHDDRI;
VAR VALUE;
WEIGHT WTS_P;
REPWEIGHTS BSW1-BSW500;
RUN;

/* Means for each varlevel */
PROC SURVEYMEANS DATA=want_long2 MEAN SUM VARMETHOD=BRR;
CLASS VARLEVEL;
DOMAIN DHHDDRI;
VAR VALUE;
WEIGHT WTS_P;
REPWEIGHTS BSW1-BSW500;
RUN;

I assume that DOMAIN, WEIGHT and REPWEIGHTS are fixed for any given ID.  If that is the case, then the conversion of wide to long shouldn't be too hard.  However, I have some fear about REPWEIGHTS.  Anything with 500 observations needs to be handled carefully.  A more complete example of your wide data would clear part of this up.

 

SteveDenham

 

ANKH1
Pyrite | Level 9

Thank you so much! Your code worked perfectly. I added on the keep statement:

data want_long;
set sample1;
value=var1; varlevel='var1';output;
value=var2; varlevel='var2';output;
value=var3; varlevel='var3';output;
keep id DHHDDRI WTS_P BSW1-BSW500 varlevel value;
run;

 

I used the below option you gave since we want to know the overall mean of values by ignoring specific varlevel

 

proc surveymeans data=want_long2 mean sum varmethod=BRR;
domain DHHDDRI;
var value;
weight WTS_P;
repweights BSW1-BSW500;
run;

 

Yes, the domain, weight and repweights are fixed for ID. The conversion worked perfectly. This is a snapshot of our dataframe. I import data from excel.

 

Capture1.PNG

 

I think your code is what we needed. What do you think?

Thanks again for your help!

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1224 views
  • 1 like
  • 3 in conversation