Hi,
I am looking to compare the severity of deer damage between two sites, but I am confused as to how to go about it. Basically the data looks like this:
Site | Severity |
Non-intervention | 0 |
Non-intervention | 0 |
Non-intervention | 1 |
Non-intervention | 2 |
Non-intervention | 0 |
Non-intervention | 2 |
Non-intervention | 1 |
Non-intervention | 1 |
Non-intervention | 2 |
Non-intervention | 2 |
Non-intervention | 0 |
Non-intervention | 1 |
Non-intervention | 2 |
Non-intervention | 0 |
Planted | 0 |
Planted | 1 |
Planted | 2 |
Planted | 2 |
Planted | 0 |
Planted | 1 |
Planted | 2 |
I am looking to compare if the scores of zero are different between the two sites (p-value for 0, 1 and 2). The same goes for the score of 1 and 2 (individually). The number of observations in the non-intervention compared to the planted are very different (900 vs. 240), so I assume it is the frequency I am looking to compare.
I have tried writing a chi-square test in this manner:
proc freq data=Thesis;
tables severity*site / chisq nocol norow nopercent;
weight severity;
where severity='1';
run;
However, when I do this all it gives me is this:
Which is just the number of observations that equals 1 in each site.
I also tried doing this:
proc glm data=Thesis;
class site;
model severity = site;
by severity;
run;
That gives me this, which is not useful at all:
Any suggestions?
Best regards
I am looking to compare if the scores of zero are different between the two sites (p-value for 0, 1 and 2). ... The number of observations in the non-intervention compared to the planted are very different (900 vs. 240), so I assume it is the frequency I am looking to compare.
I assume you mean: are the percents of zero scores different between the two sites ...
If so, I'm unclear on another issue. If you limit the data using
where severity='1';
as in your code, then you are comparing the 4 times that 1 appears next to 'Non-intervention' to the 2 times that 1 appears next to 'Planted'. That is 4 out of 6, or 66.7%, for 'Non-intervention'. Is that what you want? And do you then want to test the 66.7% against a null hypothesis of 50%?
If no, can you please describe in more detail (using words and math, not in terms of SAS) what test you are trying to do?
Yes, I do mean percentages, because the difference in the size of the datasets will give me a significant difference between the frequencies at the two sites. The number of observations in the non-intervention is approx. 900 while it is 250 in the planted. So I will, as an example, have 300 observations of severity 0 in the non-intervention, but only 50 in the planted, though the percentages are not far apart.
Is it possible to do what @PaigeMiller did (because that works), only with percentages?
Best regards
So what do you want to test? You didn't answer that question.
Do you want to test 300/900 compared to 50/250?
Or do you want to test 300/350 compared to 50/350?
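To make the difference between those two questions concrete, here is a minimal sketch that runs both from summary counts. It assumes the illustrative figures mentioned above (300 of 900 zero scores in the non-intervention site, 50 of 250 in the planted site); the data set name counts and the variables Zero and Count are made up for this example and are not from the original data.

```sas
/* Hypothetical summary counts based on the illustrative figures in the thread */
data counts;
   input Site :$16. Zero $ Count;
   datalines;
Non-intervention Yes 300
Non-intervention No 600
Planted Yes 50
Planted No 200
;

/* Question 1: 300/900 vs. 50/250, i.e. the percent of zero scores within each site */
proc freq data=counts;
   weight Count;                       /* Count holds the cell frequency */
   tables Site*Zero / chisq nocol nopercent;
run;

/* Question 2: 300/350 vs. 50/350, i.e. the split of the zero scores across sites */
proc freq data=counts;
   where Zero = 'Yes';
   weight Count;
   tables Site / chisq;                /* one-way test against equal proportions */
run;
```

The first test takes the different site sizes into account, because each site's zero count is compared with that site's own total. The second test does not, so it will tend to be significant whenever one site simply has more observations.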
Severity | Non-intervention (%) | Planted (%) |
0 | 54.48 | 68.33 |
1 | 26.12 | 16.25 |
2 | 19.3 | 15.42 |
I want to compare these numbers to test if they are significantly different, meaning 1) is 54.48 % significantly different from 68.33 % 2) is 26.12 % significantly different from 16.25 % 3) is 19.3 % significantly different from 15.42 %.
It does not make sense to compare the frequencies that produced these percentages, since the non-intervention site has 900 observations and the planted site has 250. I did that, and it says everything is significantly different (<0.0001).
What I want to do may not be possible, though. I am aware of that.
Maja
First, there is a statistical problem here, in that these comparisons are not independent of one another, and so there are really NOT three different independent comparisons.
If you perform the overall chi-squared test for your 2x3 table, this will tell you whether site and severity are associated, that is, whether the distribution of severity percents differs between the two sites.
proc freq data=have;
tables site*severity/chisq;
run;
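A possible variation, not needed for the test itself: adding the EXPECTED and CELLCHI2 options to the same call prints the expected count and each cell's contribution to the chi-square statistic, which can hint at which severity levels drive the overall result. This sketch assumes the same data set have as above.

```sas
proc freq data=have;
   /* NOCOL and NOPERCENT leave only the row percents (percent of each severity within a site) */
   tables site*severity / chisq expected cellchi2 nocol nopercent;
run;
```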
If this chi-squared test is not statistically significant, I would stop there and say there is no difference anywhere. If the chi-squared test is statistically significant, then you can certainly do the tests in your table (although I can't think of a quick way to do all three).
You can compare the first row (54.48 to 68.33) using the code below; you would then have to repeat it with a modified format to do all three.
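proc format;
value sev 0='0' 1,2='Other'; /* collapse scores 1 and 2 into one category */
run;
proc freq data=have;
tables site*severity/chisq; /* with the format applied this is a 2x2 table: site by (0 vs. Other) */
format severity sev.;
run;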
Again, I point out that doing all three of these tests isn't really statistically valid, as the tests are not independent of one another.
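One possible way to get all three comparisons in a single step, offered as a sketch rather than a recommendation: build a 0/1 indicator for each score and request the three 2x2 tables together. It assumes the data set have with a numeric Severity variable; the names have_ind, is0, is1 and is2 are invented for the example. Because the three tests are not independent, a conservative safeguard (my assumption, not something established above) is to judge each p-value against 0.05/3 rather than 0.05.

```sas
/* Assumes HAVE contains Site and a numeric Severity coded 0, 1, 2 */
data have_ind;
   set have;
   is0 = (Severity = 0);   /* 1 if the score is 0, otherwise 0 */
   is1 = (Severity = 1);   /* 1 if the score is 1, otherwise 0 */
   is2 = (Severity = 2);   /* 1 if the score is 2, otherwise 0 */
run;

proc freq data=have_ind;
   /* Produces Site*is0, Site*is1 and Site*is2, each with its own chi-square test */
   tables Site*(is0 is1 is2) / chisq nocol nopercent;
run;
```

Note that a missing Severity would be coded 0 in all three indicators, so any such rows should be dropped first.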
If I understand your question, you are trying to do a one-way analysis on the frequency of observations for each of the levels Severity=0, 1, and 2. Try this code:
data Have;
input Site :$16. Severity; /* :$16. reads the full site name; a plain $ would truncate it to 8 characters */
datalines;
Non-intervention 0
Non-intervention 0
Non-intervention 1
Non-intervention 2
Non-intervention 0
Non-intervention 2
Non-intervention 1
Non-intervention 1
Non-intervention 2
Non-intervention 2
Non-intervention 0
Non-intervention 1
Non-intervention 2
Non-intervention 0
Planted 0
Planted 1
Planted 2
Planted 2
Planted 0
Planted 1
Planted 2
;
proc sort data=Have; by Severity; run;
proc freq data=Have;
by Severity;
tables Site / chisq plots=none;
run;