I have two independent samples from two cities. I want to compare the average amount of money people spend on coffee each month. I am using PROC SURVEYMEANS, but I am not sure whether I am using the right statement to identify the cities.
This document (http://www.math.wpi.edu/saspdf/stat/chap61.pdf) says to use stratum to compare two independent samples, but when I tried it, it did not give me a reasonable result.
My results from using DOMAIN and DIFF look fine when I compare the p-value for the t-statistic with the CI for each mean. However, my concern is this: if I were comparing the average amount spent on coffee by men and women within one sample (instead of the two independent samples I have in my study), I would still use the same DOMAIN and DIFF statements. Does that make sense?
Thanks!
proc surveymeans data=&dataset nobs mean stderr var cv clm min max median plots=none;
   var coffee_spent;
   domain cities / diff;
   weight &weight_var;
run;
Statistics for WAVE Domains

cities | Variable | N | Minimum | Maximum | Mean | Std Error of Mean | Var of Mean | 95% CL for Mean (Lower) | 95% CL for Mean (Upper) | Coeff of Variation
---|---|---|---|---|---|---|---|---|---|---
2 | coffee_spent | 1368 | 1.4 | 50.0 | 12.42 | 0.3 | 0.091658 | 11.83 | 13.01 | 0.024
3 | coffee_spent | 1243 | 1.7 | 57.5 | 13.42 | 0.4 | 0.124780 | 12.72 | 14.11 | 0.026

Differences of HERB_US_PP Means for WAVE Domains

cities | -cities | Diff Estimate | Std Error | DF | t Value | Pr > \|t\|
---|---|---|---|---|---|---
2 | 3 | -0.996823 | 0.465238 | 2610 | -2.14 | 0.0322
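A quick arithmetic check of the numbers above, offered as a sketch rather than as the procedure's actual formula. It assumes the two city means are treated as uncorrelated (the procedure may include a small covariance term) and that, with no STRATA or CLUSTER statement in the posted code, the degrees of freedom are the number of observations minus one:

```latex
\begin{align*}
\mathrm{SE}(\hat{d}) &\approx \sqrt{0.091658 + 0.124780} = \sqrt{0.216438} \approx 0.4652 \\
t &= \frac{-0.996823}{0.465238} \approx -2.14 \\
\mathrm{DF} &= 1368 + 1243 - 1 = 2610
\end{align*}
```

The hand-computed standard error agrees with the reported 0.465238 to four decimals, which suggests that the DOMAIN/DIFF output here behaves almost exactly like a comparison of two uncorrelated group means.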
This should help:
Usage Note 34607: How can I compare means in PROC SURVEYMEANS?
https://support.sas.com/kb/34/607.html
I know a lot about comparing (group-)means but have never used PROC SURVEYMEANS.
But of course, you can post your follow-up questions, if any; I will try to answer.
Koen
Or just look in the documentation:
SAS/STAT® 15.2 User's Guide
The SURVEYMEANS Procedure
Example 120.6 Comparing Domain Means
https://go.documentation.sas.com/doc/en/statug/15.2/statug_surveymeans_examples06.htm
Have a nice weekend,
Koen
Thank you, sbxkoenk!
The page you shared describes the same DOMAIN and DIFF approach that I used in SAS 9.4: "in SAS 9.4 TS1M4, you can use the DIFF option in the DOMAIN statement to compare the means of continuous variables"
My results from using DOMAIN and DIFF look fine when I compare the p-value for the t-statistic with the CI for each mean. However, my concern is this: if I were comparing the average amount spent on coffee by men and women within one sample (instead of the two independent samples I have in my study), I would still use the same DOMAIN and DIFF statements. Does that make sense?
Thanks!
Hello,
I understand your concern.
Will have a look at it tomorrow.
I am shutting down my PC now (it's about 23:00 in Belgium).
Maybe the question will be solved in the meantime. I will check tomorrow.
By the way: there's a special board for Statistical Procedures under Analytics. Next time, try that one. Some statisticians don't look into the programming board.
For data step programming though, the programming board is fabulous (high quality responses at a fast pace).
Good luck,
Koen
Hi Koen, Thank you for your advice and have a great weekend!
Just one more remark before I shut down.
Are you really dealing with survey data?
If not, there is a wealth of other procedures in SAS/STAT your question could be answered with.
There's a recent video in the 'Ask the Expert' board that may be of interest:
What Are Best Practices for Using SAS® Survey Procedures? Q&A, Slides, and On-Demand Recording
I suppose the webinar also explains when to use proc surveymeans, instead of just means, ttest, glm, mixed, anova, ...
Koen
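For reference, a minimal sketch of what the non-survey route mentioned above might look like. It assumes the data can be treated as two simple independent samples, so it ignores the sampling weights and any design information and is not a substitute for a design-based analysis. The data set and variable names are taken from the code in the question:

```sas
/* Sketch only: ordinary two-sample t test, ignoring weights and survey design. */
/* Assumes cities has exactly two levels (2 and 3 in the posted output).        */
proc ttest data=&dataset;
   class cities;       /* grouping variable with two levels */
   var coffee_spent;   /* analysis variable                 */
run;
```

If the weights or the design matter for the inference, the SURVEYMEANS approach discussed in this thread remains the relevant one.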
Thank you, Koen, for your thoughts and for sharing the webinar. My dataset is based on a real survey. I am not sure whether I really need to use SURVEYMEANS, though! I think it is the only procedure that lets me account for the complex survey design I have.
I will check this webinar and that may answer all my questions. Thanks!
Assuming your data came from a probability survey design, your objective is to compute finite-population statistics and make inferences about the finite population, and you know the population totals of the two cities, you can try the following code for a stratified random sample design:
proc surveymeans data=&dataset N=(city1total city2total) nomcar nobs mean stderr var cv clm min max median plots=none;
   var coffee_spent;
   domain cities*gender;
   domain cities('city1')*gender;
   domain cities*gender / diff bon;
   strata cities;
   weight &weight_var;
run;
Providing city1 and city2 totals will apply the finite population adjustment.
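One way the city totals could be supplied, as a hedged sketch: as I read the SURVEYMEANS documentation, TOTAL= (alias N=) takes either a single value or a data set that holds the stratum variable plus a variable named _TOTAL_. The data set name city_totals and the two population counts below are hypothetical placeholders, and the type of cities in that data set is assumed to match the analysis data (numeric 2 and 3, as in the posted output):

```sas
/* Hypothetical stratum totals: replace 250000 and 180000 with the real city populations. */
data city_totals;
   input cities _total_;
   datalines;
2 250000
3 180000
;
run;

/* Sketch: stratified design with the finite population correction taken from city_totals. */
proc surveymeans data=&dataset total=city_totals nomcar mean stderr clm;
   strata cities;
   var coffee_spent;
   domain cities / diff;
   weight &weight_var;
run;
```

Keeping the totals in a data set avoids hard-coding them in the PROC statement and makes it easy to add more cities later.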
Thank you, gcjfernandez.
I think your code answers my question. You incorporated a comparison between the two levels of cities and the two levels of gender in your code. To help me understand it better, could you please write the code for comparing only the cities?
Thanks
Please see the requested SURVEYMEANS code below:
proc surveymeans data=&dataset N=(city1total city2total) nomcar nobs mean stderr var cv clm min max median;
   var coffee_spent;
   /* to get finite pop estimates by all levels of subpopulation cities */
   domain cities;
   /* to get finite pop estimates by selected level of subpopulation cities=city1 */
   domain cities('city1');
   /* to get all pairwise comparisons with Bonferroni adjustment between all levels of subpopulation cities */
   domain cities / diff bon;
   strata cities;
   weight &weight_var;
run;
Thank you, gcjfernandez, for the code and your notes.
How does it work to have CITIES as both DOMAIN and STRATA?
Thanks