Programming the statistical procedures from SAS

comparing the frequencies of 2 groups for several levels

Super Contributor
Posts: 412

Hi,

 

 

I have data for 6 different groups and for each group I have the percentage of Males and Females in that group.

This is the data:

 

 

Group   F          M
1       0.103967   0.896489
2       0.070575   0.929425
3       0.081081   0.918919
4       0.08221    0.91779
5       0.056566   0.945455
6       0.083333   0.916667

 

What I would like to know is whether there is a statistically significant difference in the percentage of males and females between the groups, and how SAS could find that out.

Thank you! 


Accepted Solutions
Solution
‎05-15-2016 12:24 PM
Respected Advisor
Posts: 4,606

Re: comparing the frequencies of 2 groups for several levels

Assuming you have frequencies, you would do:

 

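/* Counts of females (nF) and males (nM) in each group g */
data have;
input g nF nM;
datalines;
 1    76    655
 2    65    856
 3     3     34
 4    61    681
 5    28    467
 6     1     11
;
run;

/* Reshape to one row per group-by-sex cell, with the cell count in freq */
data long;
set have;
sex = "F"; freq = nF; output;
sex = "M"; freq = nM; output;
keep g sex freq;
run;

/* 6 x 2 table of g by sex; the weight statement supplies the cell counts. */
/* exact fisher / mc requests a Monte Carlo estimate of the exact p-value. */
proc freq data=long;
tables g*sex / nopercent nocol;
exact fisher / mc;
weight freq;
run;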
PG



All Replies
Grand Advisor
Posts: 16,926

Re: comparing the frequencies of 2 groups for several levels

You can look into the Chi square test - available under proc freq. 

There are also other tests available under Proc Freq that may be useful. Check the documentation. 

 

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sec...

 

Respected Advisor
Posts: 4,606

Re: comparing the frequencies of 2 groups for several levels

You need the frequencies (numbers of males and females) to test for differences between the proportions.

PG
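If only the proportions and the group sizes are at hand, the counts can be rebuilt before testing. The following is a minimal sketch, not part of the original reply. The group totals n are an assumption: they are taken as nF + nM from the counts in the accepted solution and were not given in the question. The dataset name props is made up for illustration.

```sas
/* Sketch: rebuild counts from the female proportion and the group size. */
/* ASSUMPTION: n is the group total, taken as nF + nM from the accepted  */
/* solution; the original question did not state the group sizes.       */
data props;
  input g pF n;
  nF = round(pF * n);   /* number of females, rounded to a whole count */
  nM = n - nF;          /* number of males is the remainder            */
  datalines;
1 0.103967 731
2 0.070575 921
3 0.081081  37
4 0.08221  742
5 0.056566 495
6 0.083333  12
;
run;

proc print data=props noobs;
  var g nF nM;
run;
```

The resulting nF and nM columns can then be fed to proc freq with a weight statement.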
Grand Advisor
Posts: 9,466

Re: comparing the frequencies of 2 groups for several levels

Since there are only two levels, the simplest way is to use proc ttest (a parametric method for normal data) or proc npar1way (a nonparametric method).

data have;
input g F M;
cards;
1 0.103967 0.896489
2 0.070575 0.929425
3 0.081081 0.918919
4 0.08221 0.91779
5 0.056566 0.945455
6 0.083333 0.916667
;
run;
proc transpose data=have out=want name=Gender;
by g;
var f m;
run;
proc ttest data=want cochran ci=equal umpu;
class Gender;
var col1;
run;


/***************/
proc npar1way wilcoxon correct=no data=want;
class Gender;
var col1;
exact wilcoxon;
run;




Both methods give a significant result: there is a difference between F and M.


Method	Variances	DF	t Value	Pr > |t|
Pooled	Equal	10	-91.57	<.0001
Satterthwaite	Unequal	9.9914	-91.57	<.0001
Cochran	Unequal	5	-91.57	<.0001

Kruskal-Wallis Test
Chi-Square	8.3077
DF	1
Pr > Chi-Square	0.0039




Grand Advisor
Posts: 16,926

Re: comparing the frequencies of 2 groups for several levels

I don't think that's statistically valid - it's a categorical variable, ttest is for continuous variables. 

 



Super Contributor
Posts: 412

Re: comparing the frequencies of 2 groups for several levels

Hi Xia,

 

Actually, my data is not 6 separate pairs of male and female values but 6 different categories, each with the given proportions of males and females. Unless I am mistaken (my knowledge of statistics is intermediate), the t test would test for a difference in the means of the 2 groups, so it would compute a male mean over the 6 observations and a female mean over the 6 observations and compare them. What I would like to know is whether the real proportion of males and females is the same in all 6 categories, or whether there are categories where the real proportions of males and females are actually different.

 

Thanks!

Grand Advisor
Posts: 9,466

Re: comparing the frequencies of 2 groups for several levels

Then follow what PG said. I think your data are too few and too sparse to say whether two cells (values) differ.
Super Contributor
Posts: 412

Re: comparing the frequencies of 2 groups for several levels

Hi Xia,

 

Actually, I have over 4000 observations with 2 variables: one for the category (1 to 6) and one for gender (0 and 1). I am looking into this "Chi square test"; do you think it's appropriate?

 

Thanks!
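With one row per person, proc freq can build the table and run the chi-square test directly, with no weight statement. The following is a minimal sketch under stated assumptions: the dataset raw is a stand-in built from the counts in the accepted solution (2,938 rows, not the poster's 4000+ observations), the variable names category and gender are made up, and the coding 0 = female, 1 = male is a guess, since the post does not say which is which.

```sas
/* Sketch: build a one-row-per-person stand-in dataset.                */
/* ASSUMPTIONS: counts come from the accepted solution; variable names */
/* and the coding gender 0 = F, 1 = M are hypothetical.                */
data raw;
  input category nF nM;
  do i = 1 to nF; gender = 0; output; end;   /* one row per female */
  do i = 1 to nM; gender = 1; output; end;   /* one row per male   */
  keep category gender;
  datalines;
1 76 655
2 65 856
3  3  34
4 61 681
5 28 467
6  1  11
;
run;

/* Chi-square test of whether the gender split is the same in all categories */
proc freq data=raw;
  tables category*gender / chisq nocol nopercent;
run;
```

For real data, the first step is not needed: point data= at the existing dataset and use its own variable names in the tables statement.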

Grand Advisor
Posts: 16,926

Re: comparing the frequencies of 2 groups for several levels

Read up on it. You should be able to defend your decision to go with x method. 

 

https://en.m.wikipedia.org/wiki/Chi-squared_test

 

The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Does the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference?
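To see the observed and expected frequencies side by side, proc freq can print both, along with each cell's contribution to the chi-square statistic. This is a rough sketch using the counts from the accepted solution; the dataset name counts is made up for illustration.

```sas
/* Sketch: observed vs. expected counts for the 6 x 2 table. */
/* Counts are the ones posted in the accepted solution.      */
data counts;
  input g sex $ freq;
  datalines;
1 F  76
1 M 655
2 F  65
2 M 856
3 F   3
3 M  34
4 F  61
4 M 681
5 F  28
5 M 467
6 F   1
6 M  11
;
run;

proc freq data=counts;
  weight freq;                         /* freq holds the cell counts        */
  tables g*sex / chisq                 /* chi-square test of homogeneity    */
                 expected              /* expected count under no difference*/
                 cellchi2              /* each cell's chi-square contribution*/
                 norow nocol nopercent;
run;
```

In these counts, groups 3 and 6 are small (37 and 12 people), so their expected female counts come out below 5, which may be why the accepted solution asks for an exact test.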

 

 

 

What statistical analysis should I use?

http://www.ats.ucla.edu/stat/sas/whatstat/default.htm

Grand Advisor
Posts: 9,466

Re: comparing the frequencies of 2 groups for several levels

Yeah, I think so. So your data is actually a contingency table?
☑ This topic is SOLVED.
