pdougherty
Calcite | Level 5

What test do I run if I want to assess the statistical significance of variables that have multiple permitted values (categorical)? Below is the output table. I want to assess statistical significance between states, not within a state. So is '1' statistically significantly different across ALL*CO*DE*KY*MA*MD*etc.? And likewise for '2' and '3'?

Thanks,

Pat

State (each cell shows the percentages for permitted values 1/2/3)

Variable              ALL             CO              DE              KY              MA              MD              NC              OH              TN
tri_mnt032devplns     27.5/47.3/25.2  26.6/50.2/23.2  42.9/42.6/14.4  10.7/52.8/36.4  30.8/49.8/19.5  30.6/52.8/16.6  31.5/43.3/25.3  28.2/52.4/19.4  28.0/44.6/27.5
tri_mnt032observed    25.4/66.4/8.2   25.8/65.7/8.5   36.0/58.2/5.8   7.0/81.5/11.5   30.8/60.5/8.7   20.3/70.8/8.9   23.3/69.4/7.3   11.7/83.3/5.0   44.0/48.6/7.4
tri_ment032observing  48.7/45.2/6.2   44.9/49.8/5.3   60.9/35.2/4.0   34.8/57.1/8.1   40.6/52.3/7.1   62.5/32.1/5.4   52.4/42.3/5.3   41.2/53.7/5.0   50.4/42.6/7.0
tri_mnt032azstudwk    29.8/49.9/20.3  29.7/53.1/17.2  39.4/47.4/13.1  14.1/55.5/30.4  33.9/52.5/13.6  36.6/50.5/12.8  31.7/47.4/20.9  21.6/60.5/17.9  32.9/45.5/21.6
tri_mnt032asmts       27.9/53.0/19.0  28.0/56.4/15.6  40.3/51.1/8.6   13.4/57.8/28.9  33.6/54.3/12.2  36.5/52.0/11.5  29.3/50.9/19.8  20.7/62.1/17.1  29.1/50.3/20.6
tri_mnt032behavior    13.0/57.4/29.5  13.8/59.4/26.8  13.5/62.0/24.5  7.3/54.0/38.8   12.5/62.1/25.4  15.6/63.5/20.9  12.4/56.5/31.1  10.0/62.6/27.4  17.3/53.8/28.9
tri_mnt032reflectoge  14.4/57.5/28.0  14.5/60.5/25.0  19.5/59.4/21.1  5.8/55.1/39.0   15.5/60.0/24.5  14.7/65.3/20.0  13.9/56.4/29.7  8.0/66.2/25.8   21.8/52.8/25.4
tri_mnt032align       28.6/46.7/24.7  28.9/51.2/19.8  49.8/39.1/11.1  12.2/51.2/36.6  35.3/48.2/16.5  34.5/49.2/16.2  30.5/44.1/25.3  23.3/56.2/20.6  29.8/42.8/27.3
tri_mnt032other       39.1/34.0/26.9  40.0/34.0/26.0  47.8/30.0/22.2  29.5/35.2/35.3  38.2/35.9/25.9  47.7/34.8/17.5  37.6/33.2/29.2  38.2/39.1/22.7  42.5/32.9/24.5

Message was edited by: Patrick Dougherty

I have added a sample of the data. I want to know whether the percentage for permitted value "1" differs significantly across ALL*CO*DE*KY*MA etc., and likewise for each of the other permitted values. I have thought about transforming the data into multiple binary variables, so that each survey question would have 3 distinct new variables, and running a simple t-test on each. I am trying to avoid this approach. Thanks

9 REPLIES
Reeza
Super User

You need the raw data, the counts.

pdougherty
Calcite | Level 5

I have the raw data. I tried uploading it, but it is too large (20 vars, 700k obs). What test/proc do I run to assess this?

PaigeMiller
Diamond | Level 26


We know you want to compare something between states versus within states. What exactly do you want to compare?

--
Paige Miller
pdougherty
Calcite | Level 5

So I want to test whether permitted value 1 differs significantly between ALL*CO*DE*KY*MA*MD*NC*etc.,

then test whether permitted value 2 differs significantly between ALL*CO*DE*KY*MA*MD*NC*etc.,

then test whether permitted value 3 differs significantly between ALL*CO*DE*KY*MA*MD*NC*etc.

So I would expect output would look something like this for each permitted value (1,2 and 3)

ALL     CO     DE     KY     MA

(A)      (B)       (C)     (D)     (E)

A        CDE     BC     D     AB - these denote sig diff between states

Does this make sense? I feel like I've done this in a previous life.

Thanks,

Pat

PaigeMiller
Diamond | Level 26

Are you using "ALL*CO*DE*KY*MA*MD*NC*etc" to indicate a 9-way interaction? Or something else? Is this a part of a model? If so, you'd have to give us both the left and right hand side of the model.

Are you talking about a sig (which I would refer to as a "significant difference"?) between percents, or means, or standard deviations, or gophers, or something else?

--
Paige Miller
pdougherty
Calcite | Level 5

The latter...

Thanks,

Pat

SteveDenham
Jade | Level 19

How about posting a sample of the data, rather than the entire dataset?  I think you are trying for PROC FREQ, but the syntax doesn't quite work as you are posting.  I think you want something like:

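proc freq;
  tables state*variable / chisq;  /* chi-square test that the profile is the same across states */
  weight obsweight;               /* count for each state-by-level combination */
run;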

In this case, state takes on the values ALL, CO, DE, KY, and so on; variable takes on the level values for the variable (1, 2, 3); and obsweight gives the count for a given level-by-state combination. The overall test is whether the 'profile' is the same for all states.

Other tests can be obtained by collapsing the categories into some sort of binary classification.

However, if all you have are the percentages, and no measure of sample size associated with them, you will NOT be able to come up with valid statistical tests.
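As a rough sketch of the collapsing idea mentioned above, assuming the counts sit in a dataset laid out as described (one row per state-by-level combination): the dataset name COUNTS and the new variable IS_LEVEL1 are hypothetical, while STATE, VARIABLE and OBSWEIGHT are the names used in the PROC FREQ example. The ALL rows are dropped here on the assumption that they are just the sum of the individual states, so keeping them would count every respondent twice.

```sas
/* Hypothetical dataset COUNTS: one row per state-by-level combination,  */
/* with STATE, VARIABLE (1, 2 or 3) and OBSWEIGHT (the cell count).      */
data level1;
  set counts;
  where state ne 'ALL';          /* assumption: ALL is an aggregate row  */
  is_level1 = (variable = 1);    /* 1 if level 1, 0 if level 2 or 3      */
run;

/* Chi-square test: is the share of level 1 the same in every state?     */
proc freq data=level1;
  tables state*is_level1 / chisq nocol nopercent;
  weight obsweight;              /* counts are summed within each cell   */
run;
```

Repeating this with `variable = 2` and `variable = 3` would give one test per permitted value. These tests are not independent of one another, so some multiplicity adjustment is probably worth considering.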

Steve Denham

Message was edited by: Steve Denham

pdougherty
Calcite | Level 5

I've attached some sample data, if you don't mind taking another look and letting me know what you think.

Thanks,

Pat

SteveDenham
Jade | Level 19

I looked through the data and still have some questions.  For instance, I see 208 records that have 'CO' as the STATE variable, and then what I assume are counts (as they go from 1 to some other integer) for the many variables.  However, I see no zeroes, so perhaps these are not counts but rather responses on a scale, as mentioned in my previous post.  In any case, the subjects are not identified, although something could be whipped up from the record number within each state.  So I want to make sure that my assumption is correct.  If so, then the code I provided just needs to be modified to remove the WEIGHT statement.  This will test whether the profiles within each variable are the same across the states.  A comparison to ALL states could be done by duplicating the dataset, changing STATE to ALL, and proceeding, but that is pseudo-replication and should be avoided.  Maybe each state could be compared to the overall rate in a separate analysis, where the TESTP= option is invoked.  That would be much better, but is going to require a lot of coding.
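A minimal sketch of both suggestions, assuming the raw file really is one row per respondent with responses coded 1, 2 or 3. The dataset name SURVEY is hypothetical; the variable name and the TESTP= percentages (27.5, 47.3, 25.2) are taken from the ALL column of the table in the original post. That ALL column includes CO's own respondents and is treated as fixed here, so the second test is only approximate.

```sas
/* Hypothetical dataset SURVEY: one row per respondent, with STATE and   */
/* TRI_MNT032DEVPLNS coded 1, 2 or 3. No WEIGHT statement is needed.     */

/* Overall test: is the 1/2/3 profile the same across states?            */
proc freq data=survey;
  where state ne 'ALL';          /* harmless if no ALL rows exist        */
  tables state*tri_mnt032devplns / chisq nocol nopercent;
run;

/* One state against the overall percentages, using TESTP=.              */
/* The percentages must be listed in level order and sum to 100.         */
proc freq data=survey;
  where state = 'CO';
  tables tri_mnt032devplns / chisq testp=(27.5 47.3 25.2);
run;
```

The second step would need to be repeated for each state and each survey variable, which is where the "lot of coding" comes in; a macro loop over states and variables is one way to cut that down.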

Steve Denham
