Re: Comparing means from two independent samples

Emma_at_SAS · Posted 04-16-2021 05:12 PM

I have two independent samples from two cities. I want to compare the average amount of money people in each city spend on coffee each month. I am using PROC SURVEYMEANS, but I am not sure whether I am using the right statement to identify the cities.

This document (http://www.math.wpi.edu/saspdf/stat/chap61.pdf) says to use STRATUM to compare two independent samples, but when I tried it, it did not give me a reasonable result.

My results from using the DOMAIN statement with the DIFF option look fine when I compare the p-value for the t-statistic against the CI for each mean. My concern, however, is that if I were comparing the average amount spent on coffee by men and women within one sample (instead of the two independent samples in my study), I would still use the same DOMAIN statement with the DIFF option. Does that make sense?

Thanks!

proc surveymeans data=&dataset nobs mean stderr var cv clm min max median plots=none;
   var coffee_spent;
   domain cities / diff;   /* DIFF requests differences between the domain means */
   weight &weight_var;
run;

The SURVEYMEANS Procedure

Statistics for WAVE Domains
cities  Variable      N     Minimum  Maximum  Mean   Std Error of Mean  Var of Mean  Lower 95% CL for Mean  Upper 95% CL for Mean  Coeff of Variation
2       coffee_spent  1368  1.4      50.0     12.42  0.3                0.091658     11.83                  13.01                  0.024
3       coffee_spent  1243  1.7      57.5     13.42  0.4                0.124780     12.72                  14.11                  0.026

Differences of HERB_US_PP Means for WAVE Domains
cities  -cities  Diff Estimate  Std Error  DF    t Value  Pr > |t|
2       3        -0.996823      0.465238   2610  -2.14    0.0322
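
As a rough sanity check on the difference table, the t value and p-value can be recomputed from the reported difference and its standard error. The DATA step below is only a sketch that reuses the numbers printed in the output; the variable names (diff, se_diff, se_indep, and so on) are made up for illustration. It also computes the standard error obtained by simply adding the two reported variances of the means, as if the two city estimates were independent.

```sas
data _null_;
   diff    = -0.996823;   /* Diff Estimate from the output        */
   se_diff = 0.465238;    /* Std Error of the difference          */
   df      = 2610;        /* DF from the output                   */

   t = diff / se_diff;                  /* t statistic            */
   p = 2 * (1 - probt(abs(t), df));     /* two-sided p-value      */

   /* Var of Mean for cities 2 and 3, added as if independent */
   se_indep = sqrt(0.091658 + 0.124780);

   put t=8.2 p=8.4 se_indep=8.4;
run;
```

If this runs as intended, the log should show a t value of about -2.14 and a p-value of about 0.032, in line with the table, and a standard error of roughly 0.4652, which is very close to the 0.465238 that the procedure reports.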

sbxkoenk · Posted 04-16-2021 04:40 PM

This should help:

Usage Note 34607: How can I compare means in PROC SURVEYMEANS?

https://support.sas.com/kb/34/607.html

I know a lot about comparing (group-)means but have never used PROC SURVEYMEANS.

But of course, you can post your follow-up questions, if any; I will try to answer.

Koen

sbxkoenk · Posted 04-16-2021 04:50 PM

Or just look in the documentation:

SAS/STAT® 15.2 User's Guide

The SURVEYMEANS Procedure

Example 120.6 Comparing Domain Means

https://go.documentation.sas.com/doc/en/statug/15.2/statug_surveymeans_examples06.htm

Have a nice weekend,

Koen

Emma_at_SAS · Posted 04-16-2021 04:52 PM

Thank you, Koen, for your suggestions. You also have a nice weekend!

Emma_at_SAS · Posted 04-16-2021 04:50 PM

Thank you, sbxkoenk!

The page you shared describes the same DOMAIN statement with the DIFF option that I used in SAS 9.4. It says: "in SAS 9.4 TS1M4, you can use the DIFF option in the DOMAIN statement to compare the means of continuous variables"

My results from using the DOMAIN statement with the DIFF option look fine when I compare the p-value for the t-statistic against the CI for each mean. My concern, however, is that if I were comparing the average amount spent on coffee by men and women within one sample (instead of the two independent samples in my study), I would still use the same DOMAIN statement with the DIFF option. Does that make sense?

Thanks!

sbxkoenk · Posted 04-16-2021 04:59 PM

Hello,

I understand your concern.

Will have a look at it tomorrow.

Will now shut down my PC (it's about 23h.00 in Belgium).

Maybe the question will be solved in the meantime; I will see tomorrow.

By the way: there's a special board for Statistical Procedures under Analytics. Next time, try that one. Some statisticians don't look into the programming board.

For DATA step programming, though, the programming board is fabulous (high-quality responses at a fast pace).

Good luck,

Koen

Emma_at_SAS · Posted 04-16-2021 05:03 PM

Hi Koen, Thank you for your advice and have a great weekend!

sbxkoenk · Posted 04-16-2021 05:15 PM

Just one more remark before I shut down.

Are you really dealing with survey data?

If not, there is a wealth of other procedures in SAS/STAT your question could be answered with.

There's a recent video in the 'Ask the Expert' board that may be of interest:

What Are Best Practices for Using SAS® Survey Procedures? Q&A, Slides, and On-Demand Recording

https://communities.sas.com/t5/Ask-the-Expert/What-Are-Best-Practices-for-Using-SAS-Survey-Procedure...

I suppose the webinar also explains when to use PROC SURVEYMEANS instead of PROC MEANS, TTEST, GLM, MIXED, ANOVA, ...
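
For comparison only: if the data were a simple random sample with no weights or design variables, a two-group comparison of means could be sketched with PROC TTEST as below. This is an illustration under that assumption, reusing the &dataset, cities, and coffee_spent names from the question. It ignores the survey design entirely, so it is not a substitute for PROC SURVEYMEANS on real survey data.

```sas
proc ttest data=&dataset;
   class cities;        /* grouping variable; must have exactly two levels */
   var coffee_spent;    /* analysis variable */
run;
```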

Koen

Emma_at_SAS · Posted 04-16-2021 05:19 PM

Thank you, Koen, for your thoughts and for sharing the webinar. My dataset is based on a real survey, though I am not sure whether I really need to use SURVEYMEANS. I think it is the only procedure that lets me account for my complex survey design.

I will check this webinar and that may answer all my questions. Thanks!

gcjfernandez · Posted 04-18-2021 12:01 AM

Assuming your data came from a probability survey design, your objective is to compute finite-population statistics and make inferences about a finite population, and you know the population totals of the two cities, you can try the following code for a stratified random sample design:

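/* city_totals is a hypothetical data set with one row per city: the STRATA variable cities and the population size in _TOTAL_ */
proc surveymeans data=&dataset N=city_totals nomcar nobs mean stderr var cv clm min max median plots=none;
   var coffee_spent;
   domain cities*gender;
   domain cities('city1')*gender;
   /* Bonferroni adjustment of the pairwise differences is requested with ADJUST=BON */
   domain cities*gender / diff adjust=bon;
   strata cities;
   weight &weight_var;
run;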

Providing the population totals for city 1 and city 2 applies the finite population correction.
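
To make the finite population correction concrete: as I understand it, once the stratum totals N_h are supplied, the variance contribution of each stratum is multiplied by (1 - n_h/N_h), where n_h is the stratum sample size. The sketch below shows one way to pass the totals, through a data set named in TOTAL= (N= is an alias) that holds the STRATA variable and the totals in a variable called _TOTAL_. The data set name city_totals and the population sizes 250000 and 180000 are invented for illustration, and the city codes 2 and 3 are assumed to be numeric, as they appear in the output earlier in the thread.

```sas
/* Hypothetical population sizes: replace with the real city totals */
data city_totals;
   input cities _total_;
   datalines;
2 250000
3 180000
;
run;

proc surveymeans data=&dataset total=city_totals mean stderr clm;
   strata cities;            /* totals are matched to strata by this variable */
   var coffee_spent;
   domain cities / diff;     /* difference between the two city means */
   weight &weight_var;
run;
```

With totals this large relative to samples of 1368 and 1243, the correction factor would be close to 1, so the standard errors would change very little.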

Emma_at_SAS · Posted 04-19-2021 01:30 PM

Thank you, gcjfernandez.

I think your code answers my question. You incorporated a comparison between the two levels of cities and the two levels of gender in your code. To help me understand it better, could you please write the code for comparing only the cities?

Thanks

gcjfernandez · Posted 04-20-2021 01:50 AM

Please see the requested SURVEYMEANS code below:

/* city_totals is a hypothetical data set with one row per city: cities plus the population size in _TOTAL_ */
proc surveymeans data=&dataset N=city_totals nomcar nobs mean stderr var cv clm min max median;
   var coffee_spent;
   /* finite-population estimates for every level of the subpopulation variable cities */
   domain cities;
   /* finite-population estimates for the selected level cities='city1' only */
   domain cities('city1');
   /* all pairwise comparisons between levels of cities, with Bonferroni adjustment */
   domain cities / diff adjust=bon;
   strata cities;
   weight &weight_var;
run;

Emma_at_SAS · Posted 04-20-2021 02:18 AM

Thank you, gcjfernandez, for the code and your notes.

How does it work to have CITIES as both DOMAIN and STRATA?

Thanks

gcjfernandez · Posted 04-20-2021 02:27 AM

According to the SAS documentation, you can use the same variable in both the STRATA and DOMAIN statements:
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_surveymeans_examples06.htm
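
One way to read this: STRATA describes how the sample was drawn, while DOMAIN names the subpopulations to report on, so the same variable can play both roles. The sketch below is an illustration and is not taken from the documentation example. It assumes the two cities were sampled separately (so cities is a stratum variable) and that gender, the variable used in the earlier code, was not controlled by the design (so it appears only as a domain).

```sas
proc surveymeans data=&dataset mean stderr clm;
   strata cities;           /* design: each city was sampled on its own       */
   var coffee_spent;
   domain cities / diff;    /* compare the two independently sampled cities   */
   domain gender / diff;    /* compare men and women within the pooled sample */
   weight &weight_var;
run;
```

The DOMAIN syntax is the same in both comparisons; what differs is the design information given in STRATA, which the procedure uses when it estimates the variances.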
