Emma_at_SAS
Lapis Lazuli | Level 10

I have two independent samples from two cities. I want to compare the average amount of money people spend on coffee each month. I am using PROC SURVEYMEANS, but I am not sure whether I am using the right statement to identify the cities.

This document, http://www.math.wpi.edu/saspdf/stat/chap61.pdf, says to use stratum to compare two independent samples, but when I tried it, the result did not look reasonable.

 

My results from using DOMAIN and DIFF look fine when I compare the p-value for the t-statistic with the CI for each mean. However, my concern is that if I were comparing the average money spent on coffee for men and women within one sample (instead of the two independent samples in my study), I would still use the same DOMAIN and DIFF statements. Does that make sense?

Thanks!

 

proc surveymeans data=&dataset  nobs mean stderr var cv clm min max median plots=none;
var coffee_spent ;
domain cities/diff;
weight &weight_var;
run;
The SURVEYMEANS Procedure

Statistics for WAVE Domains

cities  Variable      N     Minimum  Maximum  Mean   Std Error of Mean  Var of Mean  95% CL for Mean  Coeff of Variation
2       coffee_spent  1368  1.4      50.0     12.42  0.3                0.091658     11.83  13.01     0.024
3       coffee_spent  1243  1.7      57.5     13.42  0.4                0.124780     12.72  14.11     0.026

Differences of HERB_US_PP Means for WAVE Domains

cities  -cities  Diff Estimate  Std Error  DF    t Value  Pr > |t|
2       3        -0.996823      0.465238   2610  -2.14    0.0322
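
As a rough hand check of the output above, the reported numbers can be plugged into a short DATA step. This is only a sketch: the variable names are made up for illustration, and the values are copied from the printed tables. It recomputes the t value and two-sided p-value from the reported difference and standard error. It also computes sqrt(Var1 + Var2), the standard error the difference would have if the two city means were uncorrelated, for comparison with the reported 0.465238.

```sas
/* Hand check of the DIFF output, using only numbers printed in the tables. */
/* Variable names here are illustrative, not from the original program.     */
data _null_;
   var_mean_2 = 0.091658;   /* Var of Mean, cities = 2             */
   var_mean_3 = 0.124780;   /* Var of Mean, cities = 3             */
   diff       = -0.996823;  /* Diff Estimate (cities 2 minus 3)    */
   se_diff    = 0.465238;   /* Std Error of the difference (given) */
   df         = 2610;       /* DF (given)                          */

   /* Standard error if the two domain means were uncorrelated */
   se_uncorr = sqrt(var_mean_2 + var_mean_3);

   /* t statistic and two-sided p-value from the reported values */
   t_value = diff / se_diff;
   p_value = 2 * (1 - probt(abs(t_value), df));

   put se_uncorr= 8.6 t_value= 8.4 p_value= 8.4;
run;
```

If the uncorrelated-case figure written to the log is close to the reported standard error, that suggests the covariance between the two domain means contributes very little in this particular output.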
sbxkoenk
SAS Super FREQ

This should help:

Usage Note 34607: How can I compare means in PROC SURVEYMEANS?

https://support.sas.com/kb/34/607.html

 

I know a lot about comparing (group-)means but have never used PROC SURVEYMEANS. 

But of course, you can post your follow-up questions, if any; I will try to answer.

 

Koen

sbxkoenk
SAS Super FREQ

Or just look in the documentation:

 

SAS/STAT® 15.2 User's Guide

The SURVEYMEANS Procedure

Example 120.6 Comparing Domain Means

https://go.documentation.sas.com/doc/en/statug/15.2/statug_surveymeans_examples06.htm

 

Have a nice weekend,

Koen

 

Emma_at_SAS
Lapis Lazuli | Level 10
Thank you, Koen, for your suggestions. You also have a nice weekend!
Emma_at_SAS
Lapis Lazuli | Level 10

Thank you, sbxkoenk!

This page that you shared is the equivalent of the DOMAIN and DIFF that I used in SAS 9.4. "in SAS 9.4 TS1M4, you can use the DIFF option in the DOMAIN statement to compare the means of continuous variables"

 

My results from using DOMAIN and DIFF look fine when I compare the p-value for the t-statistic with the CI for each mean. However, my concern is that if I were comparing the average money spent on coffee for men and women within one sample (instead of the two independent samples in my study), I would still use the same DOMAIN and DIFF statements. Does that make sense?

Thanks!

sbxkoenk
SAS Super FREQ

Hello,

I understand your concern.

Will have a look at it tomorrow.

Will now shut down my PC (it's about 23h.00 in Belgium).

Maybe the question will get solved meanwhile. I will notice tomorrow.

By the way: there's a special board for Statistical Procedures under Analytics. Next time, try that one. Some statisticians don't look into the programming board.

For data step programming though, the programming board is fabulous (high quality responses at a fast pace).

Good luck,

Koen

Emma_at_SAS
Lapis Lazuli | Level 10

Hi Koen, Thank you for your advice and have a great weekend! 

 

 

sbxkoenk
SAS Super FREQ

Just one more remark before I shut down.

Are you really dealing with survey data?

If not, there is a wealth of other procedures in SAS/STAT your question could be answered with.

There's a recent video in the 'Ask the Expert' board that may be of interest:

What Are Best Practices for Using SAS® Survey Procedures? Q&A, Slides, and On-Demand Recording

https://communities.sas.com/t5/Ask-the-Expert/What-Are-Best-Practices-for-Using-SAS-Survey-Procedure...

I suppose the webinar also explains when to use proc surveymeans, instead of just means, ttest, glm, mixed, anova, ...

Koen
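
For data that do not come from a survey design, one of the simpler procedures mentioned above might be used as in the sketch below. It assumes the same data set and variable names as the original post, and it ignores the weights and the sample design entirely, so it is not a substitute for PROC SURVEYMEANS on real survey data.

```sas
/* Plain two-sample t test: assumes simple random samples, */
/* no sampling weights, no strata, no clusters.            */
proc ttest data=&dataset;
   class cities;        /* grouping variable; must have exactly two levels */
   var coffee_spent;    /* analysis variable from the original post        */
run;
```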

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you, Koen, for your thoughts and for sharing the webinar. My dataset is based on a real survey. I am not sure if I really need to use SURVEYMEANS though! I think only with this procedure I can account for the complex survey design I have.

 I will check this webinar and that may answer all my questions. Thanks! 

gcjfernandez
SAS Employee

Assuming your data came from a probability survey design, your objective is to compute finite-population statistics and make inferences about the finite population, and you know the population totals of the two cities, then you can try the following code for a stratified random sample design:

proc surveymeans data=&dataset N=(city1total city2total) nomcar nobs mean stderr var cv clm min max median plots=none;
var coffee_spent ;
domain cities*gender;
domain cities('city1')*gender;
domain cities*gender/diff bon;
strata cities;
weight &weight_var;
run;

Providing city1 and city2 totals will apply the finite population adjustment.
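
One way to supply stratum totals, as far as I know from the PROC SURVEYMEANS documentation, is to name a SAS data set in the TOTAL= option (N= is an alias). That data set holds the stratum variable and a variable called _TOTAL_. The sketch below rests on several assumptions: cities is numeric and coded 2 and 3 as in the original output, the population counts are invented placeholders, and the data set name city_totals is hypothetical.

```sas
/* Hypothetical stratum totals: replace the counts with the real */
/* population sizes of the two cities.                           */
data city_totals;
   input cities _total_;
   datalines;
2 500000
3 420000
;
run;

/* TOTAL= points at the totals data set so that the finite population */
/* correction is applied within each stratum.                         */
proc surveymeans data=&dataset total=city_totals nobs mean stderr clm;
   strata cities;
   var coffee_spent;
   weight &weight_var;
run;
```

The stratum variable in the totals data set needs the same name and type as the one in the analysis data set, so a character-coded cities variable would need character values here as well.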

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you,  gcjfernandez.

I think your code answers my question. You incorporated a comparison between the two levels of cities and the two levels of gender in your code. To help me understand it better, could you please write the code for comparing only cities?

Thanks

gcjfernandez
SAS Employee

Please see the requested SURVEYMEANS code below:

proc surveymeans data=&dataset N=(city1total city2total) nomcar nobs mean stderr var cv clm min max median ;
var coffee_spent ;
/*to get finite pop estimates by all levels of subpopulation cities */
domain cities;
/*to get finite pop estimates by selected level of subpopulation cities=city1 */
domain cities('city1');
/*to get all pairwise comparison with Bonferroni adjustment between all levels of subpopulation city */
domain cities/diff bon;
strata cities;
weight &weight_var;
run;
Emma_at_SAS
Lapis Lazuli | Level 10

Thank you, gcjfernandez, for the code and your notes.

How does it work to have CITIES as both DOMAIN and STRATA?

Thanks
