What test do I run if I want to assess the statistical significance of variables that have multiple permitted values (categorical)? This is the output table. I want to assess statistical significance between states, not within a state. So is '1' significantly different across ALL*CO*DE*KY*MA*MD*etc.? And likewise for '2' and '3'?
Thanks,
Pat
Variable (columns are State and permitted value) | ALL 1 | ALL 2 | ALL 3 | CO 1 | CO 2 | CO 3 | DE 1 | DE 2 | DE 3 | KY 1 | KY 2 | KY 3 | MA 1 | MA 2 | MA 3 | MD 1 | MD 2 | MD 3 | NC 1 | NC 2 | NC 3 | OH 1 | OH 2 | OH 3 | TN 1 | TN 2 | TN 3 |
tri_mnt032devplns | 27.5% | 47.3% | 25.2% | 26.6% | 50.2% | 23.2% | 42.9% | 42.6% | 14.4% | 10.7% | 52.8% | 36.4% | 30.8% | 49.8% | 19.5% | 30.6% | 52.8% | 16.6% | 31.5% | 43.3% | 25.3% | 28.2% | 52.4% | 19.4% | 28.0% | 44.6% | 27.5% |
tri_mnt032observed | 25.4% | 66.4% | 8.2% | 25.8% | 65.7% | 8.5% | 36.0% | 58.2% | 5.8% | 7.0% | 81.5% | 11.5% | 30.8% | 60.5% | 8.7% | 20.3% | 70.8% | 8.9% | 23.3% | 69.4% | 7.3% | 11.7% | 83.3% | 5.0% | 44.0% | 48.6% | 7.4% |
tri_ment032observing | 48.7% | 45.2% | 6.2% | 44.9% | 49.8% | 5.3% | 60.9% | 35.2% | 4.0% | 34.8% | 57.1% | 8.1% | 40.6% | 52.3% | 7.1% | 62.5% | 32.1% | 5.4% | 52.4% | 42.3% | 5.3% | 41.2% | 53.7% | 5.0% | 50.4% | 42.6% | 7.0% |
tri_mnt032azstudwk | 29.8% | 49.9% | 20.3% | 29.7% | 53.1% | 17.2% | 39.4% | 47.4% | 13.1% | 14.1% | 55.5% | 30.4% | 33.9% | 52.5% | 13.6% | 36.6% | 50.5% | 12.8% | 31.7% | 47.4% | 20.9% | 21.6% | 60.5% | 17.9% | 32.9% | 45.5% | 21.6% |
tri_mnt032asmts | 27.9% | 53.0% | 19.0% | 28.0% | 56.4% | 15.6% | 40.3% | 51.1% | 8.6% | 13.4% | 57.8% | 28.9% | 33.6% | 54.3% | 12.2% | 36.5% | 52.0% | 11.5% | 29.3% | 50.9% | 19.8% | 20.7% | 62.1% | 17.1% | 29.1% | 50.3% | 20.6% |
tri_mnt032behavior | 13.0% | 57.4% | 29.5% | 13.8% | 59.4% | 26.8% | 13.5% | 62.0% | 24.5% | 7.3% | 54.0% | 38.8% | 12.5% | 62.1% | 25.4% | 15.6% | 63.5% | 20.9% | 12.4% | 56.5% | 31.1% | 10.0% | 62.6% | 27.4% | 17.3% | 53.8% | 28.9% |
tri_mnt032reflectoge | 14.4% | 57.5% | 28.0% | 14.5% | 60.5% | 25.0% | 19.5% | 59.4% | 21.1% | 5.8% | 55.1% | 39.0% | 15.5% | 60.0% | 24.5% | 14.7% | 65.3% | 20.0% | 13.9% | 56.4% | 29.7% | 8.0% | 66.2% | 25.8% | 21.8% | 52.8% | 25.4% |
tri_mnt032align | 28.6% | 46.7% | 24.7% | 28.9% | 51.2% | 19.8% | 49.8% | 39.1% | 11.1% | 12.2% | 51.2% | 36.6% | 35.3% | 48.2% | 16.5% | 34.5% | 49.2% | 16.2% | 30.5% | 44.1% | 25.3% | 23.3% | 56.2% | 20.6% | 29.8% | 42.8% | 27.3% |
tri_mnt032other | 39.1% | 34.0% | 26.9% | 40.0% | 34.0% | 26.0% | 47.8% | 30.0% | 22.2% | 29.5% | 35.2% | 35.3% | 38.2% | 35.9% | 25.9% | 47.7% | 34.8% | 17.5% | 37.6% | 33.2% | 29.2% | 38.2% | 39.1% | 22.7% | 42.5% | 32.9% | 24.5% |
Message was edited by: Patrick Dougherty
I have added a sample of the data. I want to know whether permitted value "1" is significantly different from "1" across ALL*CO*DE*KY*MA etc., and likewise for each of the other permitted values. I have thought about transforming the data into multiple binary variables, so that each survey question would yield 3 distinct new variables, and then running a simple t-test. I am trying to avoid this approach. Thanks
You need the raw data, the counts.
I have the raw data. I tried uploading it, but it is too large (20 variables, 700k observations). What test or procedure do I run to assess this?
We know you want to compare something between states versus within states. What exactly do you want to compare?
So I want to test whether permitted value 1 is sig between ALL*CO*DE*KY*MA*MD*NC*etc.,
then whether permitted value 2 is sig between ALL*CO*DE*KY*MA*MD*NC*etc.,
then whether permitted value 3 is sig between ALL*CO*DE*KY*MA*MD*NC*etc.
So I would expect the output to look something like this for each permitted value (1, 2 and 3):
ALL CO DE KY MA
(A) (B) (C) (D) (E)
A CDE BC D AB - these denote sig diff between states
Does this make sense? I feel like I've done this in a previous life.
Thanks,
Pat
Are you using "ALL*CO*DE*KY*MA*MD*NC*etc" to indicate a 9-way interaction? Or something else? Is this part of a model? If so, you'd have to give us both the left- and right-hand sides of the model.
Are you talking about a sig (which I would refer to as a "significant difference"?) between percents, or means, or standard deviations, or gophers, or something else?
The latter...
Thanks,
Pat
How about posting a sample of the data, rather than the entire dataset? I think you are trying for PROC FREQ, but the syntax doesn't quite work as you are posting. I think you want something like:
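proc freq;
   tables state*variable / chisq;   /* CHISQ requests the chi-square test of the state-by-level table */
   weight obsweight;                /* cell counts; statement ends with a semicolon, not a comma */
run;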
In this case, state takes on the values ALL, CO, DE, KY, and so on; variable takes on the level values of the survey variable (1, 2, 3); and obsweight gives the count for a given level-by-state combination. The overall test is whether the 'profile' is the same for all states.
Other tests can be obtained by collapsing the categories into some sort of binary classification.
However, if all you have are the percentages, and no measure of sample size associated with them, you will NOT be able to come up with valid statistical tests.
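If raw responses are available, the idea of collapsing categories into a binary classification might be sketched roughly as follows. This is only an illustration under assumptions: the dataset name SURVEY, the output dataset SURVEY_BIN, and the indicator IS_1 are hypothetical, and the survey variable is assumed to be numeric with values 1, 2, 3 and one row per respondent.

```sas
/* Hypothetical names: SURVEY (raw data), SURVEY_BIN, IS_1.          */
/* Assumes TRI_MNT032DEVPLNS is numeric with values 1, 2, 3.         */
data survey_bin;
   set survey;
   /* 1 if the response is permitted value 1, 0 otherwise;           */
   /* missing responses are left missing rather than coded as 0.     */
   if not missing(tri_mnt032devplns) then
      is_1 = (tri_mnt032devplns = 1);
run;

/* Chi-square test of whether the share of "1" responses             */
/* is the same in every state.                                       */
proc freq data=survey_bin;
   tables state*is_1 / chisq;
run;
```

The same pattern would be repeated for values 2 and 3 and for each survey variable. Note that this gives one overall test per permitted value, not the pairwise state-versus-state lettering, which would need additional comparisons with some adjustment for multiple testing.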
Steve Denham
Message was edited by: Steve Denham
I've attached some sample data. If you don't mind taking another look, please let me know what you think.
Thanks,
Pat
I looked through the data and still have some questions. For instance, I see 208 records that have "CO" as the STATE variable, and then what I assume are counts (as they go from 1 to some other integer) for the many variables. However, I see no zeroes, so perhaps these are not counts but rather responses on a scale. In any case, the subjects are not identified, although an identifier could be whipped up from the record number within each state. So I want to make sure that my assumption is correct. If the values are responses, then the code I provided just needs to be modified to remove the WEIGHT statement. This will test whether the profiles within each variable are the same across the states. A comparison to ALL could be set up by duplicating the dataset, changing STATE to ALL in the copy, and proceeding as before, but that is pseudo-replication and should be avoided. Maybe each state could be compared to the overall rate in a separate analysis, where the TESTP= option is invoked. That would be much better, but it is going to require a lot of coding.
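As a rough sketch of the two suggestions above, assuming the raw data sit in a dataset called SURVEY (a hypothetical name) with one row per respondent: the first step drops the WEIGHT statement, and the second compares each state to a fixed overall profile with TESTP=. The TESTP= percentages below are the ALL column for tri_mnt032devplns from the posted table (27.5, 47.3, 25.2); every other variable would need its own values, which is where the extra coding comes in.

```sas
/* Hypothetical dataset name SURVEY; one row per respondent.         */

/* 1. Overall test: is the 1/2/3 profile the same across states?     */
/*    No WEIGHT statement, because each row is a single response.    */
proc freq data=survey;
   tables state*tri_mnt032devplns / chisq;
run;

/* 2. Each state against the overall profile.                        */
/*    BY processing requires the data to be sorted by STATE.         */
proc sort data=survey out=survey_sorted;
   by state;
run;

proc freq data=survey_sorted;
   by state;
   /* Null percentages taken from the ALL column of the posted       */
   /* table for this variable; they must sum to 100.                 */
   tables tri_mnt032devplns / nocum chisq testp=(27.5 47.3 25.2);
run;
```

One thing to watch for: TESTP= expects one value per level that actually occurs, so a state in which a permitted value never appears would need separate handling.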
Steve Denham