Programming the statistical procedures from SAS

Sig test across states on categorical data

Occasional Contributor
Posts: 7

What test do I run if I want to assess the statistical significance of variables that have multiple permitted values (categorical)? Below is the output table. I want to assess significance between states, not within a state. So is '1' significantly different across ALL*CO*DE*KY*MA*MD*etc.? And likewise for '2' and '3'?

Thanks,

Pat

Percentages for each permitted value (1, 2, 3), by State:

Variable              Value     ALL     CO     DE     KY     MA     MD     NC     OH     TN
tri_mnt032devplns     1       27.5%  26.6%  42.9%  10.7%  30.8%  30.6%  31.5%  28.2%  28.0%
tri_mnt032devplns     2       47.3%  50.2%  42.6%  52.8%  49.8%  52.8%  43.3%  52.4%  44.6%
tri_mnt032devplns     3       25.2%  23.2%  14.4%  36.4%  19.5%  16.6%  25.3%  19.4%  27.5%
tri_mnt032observed    1       25.4%  25.8%  36.0%   7.0%  30.8%  20.3%  23.3%  11.7%  44.0%
tri_mnt032observed    2       66.4%  65.7%  58.2%  81.5%  60.5%  70.8%  69.4%  83.3%  48.6%
tri_mnt032observed    3        8.2%   8.5%   5.8%  11.5%   8.7%   8.9%   7.3%   5.0%   7.4%
tri_ment032observing  1       48.7%  44.9%  60.9%  34.8%  40.6%  62.5%  52.4%  41.2%  50.4%
tri_ment032observing  2       45.2%  49.8%  35.2%  57.1%  52.3%  32.1%  42.3%  53.7%  42.6%
tri_ment032observing  3        6.2%   5.3%   4.0%   8.1%   7.1%   5.4%   5.3%   5.0%   7.0%
tri_mnt032azstudwk    1       29.8%  29.7%  39.4%  14.1%  33.9%  36.6%  31.7%  21.6%  32.9%
tri_mnt032azstudwk    2       49.9%  53.1%  47.4%  55.5%  52.5%  50.5%  47.4%  60.5%  45.5%
tri_mnt032azstudwk    3       20.3%  17.2%  13.1%  30.4%  13.6%  12.8%  20.9%  17.9%  21.6%
tri_mnt032asmts       1       27.9%  28.0%  40.3%  13.4%  33.6%  36.5%  29.3%  20.7%  29.1%
tri_mnt032asmts       2       53.0%  56.4%  51.1%  57.8%  54.3%  52.0%  50.9%  62.1%  50.3%
tri_mnt032asmts       3       19.0%  15.6%   8.6%  28.9%  12.2%  11.5%  19.8%  17.1%  20.6%
tri_mnt032behavior    1       13.0%  13.8%  13.5%   7.3%  12.5%  15.6%  12.4%  10.0%  17.3%
tri_mnt032behavior    2       57.4%  59.4%  62.0%  54.0%  62.1%  63.5%  56.5%  62.6%  53.8%
tri_mnt032behavior    3       29.5%  26.8%  24.5%  38.8%  25.4%  20.9%  31.1%  27.4%  28.9%
tri_mnt032reflectoge  1       14.4%  14.5%  19.5%   5.8%  15.5%  14.7%  13.9%   8.0%  21.8%
tri_mnt032reflectoge  2       57.5%  60.5%  59.4%  55.1%  60.0%  65.3%  56.4%  66.2%  52.8%
tri_mnt032reflectoge  3       28.0%  25.0%  21.1%  39.0%  24.5%  20.0%  29.7%  25.8%  25.4%
tri_mnt032align       1       28.6%  28.9%  49.8%  12.2%  35.3%  34.5%  30.5%  23.3%  29.8%
tri_mnt032align       2       46.7%  51.2%  39.1%  51.2%  48.2%  49.2%  44.1%  56.2%  42.8%
tri_mnt032align       3       24.7%  19.8%  11.1%  36.6%  16.5%  16.2%  25.3%  20.6%  27.3%
tri_mnt032other       1       39.1%  40.0%  47.8%  29.5%  38.2%  47.7%  37.6%  38.2%  42.5%
tri_mnt032other       2       34.0%  34.0%  30.0%  35.2%  35.9%  34.8%  33.2%  39.1%  32.9%
tri_mnt032other       3       26.9%  26.0%  22.2%  35.3%  25.9%  17.5%  29.2%  22.7%  24.5%

Message was edited by: Patrick Dougherty

I have added a sample of the data. I want to know whether the permitted value "1" is significantly different across ALL*CO*DE*KY*MA etc., and likewise for each other permitted value. I have thought about transforming the data into multiple binary variables, so that each survey question would have 3 distinct new variables, and then running a simple t-test. I am attempting to avoid this approach. Thanks

Grand Advisor
Posts: 16,927

You need the raw data, the counts.

Occasional Contributor
Posts: 7

I have the raw data. I tried uploading it, but it is too large (20 variables, 700k observations). What test/proc do I run to assess this?

Trusted Advisor
Posts: 1,439

We know you want to compare something between states versus within states. What exactly do you want to compare?

Occasional Contributor
Posts: 7

So I want to test whether permitted value 1 differs significantly between ALL*CO*DE*KY*MA*MD*NC*etc.,

then test whether permitted value 2 differs significantly between ALL*CO*DE*KY*MA*MD*NC*etc.,

then test whether permitted value 3 differs significantly between ALL*CO*DE*KY*MA*MD*NC*etc.

So I would expect the output to look something like this for each permitted value (1, 2 and 3), where the letters in the last row denote significant differences between states:

ALL     CO      DE      KY      MA
(A)     (B)     (C)     (D)     (E)
A       CDE     BC      D       AB

Does this make sense? I feel like I've done this in a previous life.

Thanks,

Pat

Trusted Advisor
Posts: 1,439

Are you using "ALL*CO*DE*KY*MA*MD*NC*etc" to indicate a 9-way interaction? Or something else? Is this a part of a model? If so, you'd have to give us both the left and right hand side of the model.

Are you talking about a "sig" (which I would refer to as a "significant difference") between percents, or means, or standard deviations, or gophers, or something else?

Occasional Contributor
Posts: 7

The latter...

Thanks,

Pat

Respected Advisor
Posts: 2,655

How about posting a sample of the data, rather than the entire dataset?  I think you are trying for PROC FREQ, but the syntax doesn't quite work as you are posting.  I think you want something like:

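proc freq;
   /* CHISQ requests the chi-square test of the state-by-level table */
   tables state*variable / chisq;
   /* each row is one state-by-level cell; obsweight holds its count */
   weight obsweight;
run;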

In this case, state takes on the values of ALL, CO, DE, KY, and so on; variable takes on the level values for the variable (1, 2, 3); and obsweight gives the counts for a given level-by-state combination. The overall test is whether the 'profile' is the same for all states.

Other tests can be obtained by collapsing the categories into some sort of binary classification.

However, if all you have are the percentages, and no measure of sample size associated with them, you will NOT be able to come up with valid statistical tests.
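
As a rough sketch of the collapsing idea, assuming a hypothetical count dataset named COUNTS with the STATE, VARIABLE, and OBSWEIGHT columns described above, one level can be compared against the other two combined. The dataset name and the is_level1 indicator are made up for illustration, and the ALL rows are dropped here because they overlap every individual state.

```sas
/* Hypothetical input: COUNTS has one row per state-by-level cell,
   with STATE, VARIABLE (1, 2, 3) and OBSWEIGHT (the cell count). */
data binary;
   set counts;
   where state ne 'ALL';          /* ALL overlaps every state, so leave it out */
   is_level1 = (variable = 1);    /* 1 if the level is 1, otherwise 0 */
run;

/* Chi-square test of whether the share of level 1 is the same in every state */
proc freq data=binary;
   tables state*is_level1 / chisq nocol nopercent;
   weight obsweight;
run;
```

Changing `variable = 1` to 2 or 3 gives the corresponding tests for the other levels. Running many such tests raises the chance of a false positive, so some multiplicity adjustment may be worth considering.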

Steve Denham

Message was edited by: Steve Denham

Occasional Contributor
Posts: 7

I've attached some sample data, if you don't mind taking another look and letting me know what you think.

Thanks,

Pat

Respected Advisor
Posts: 2,655

I looked through the data and still have some questions. For instance, I see 208 records that have 'CO' as the STATE variable, and then what I assume are counts (as they go from 1 to some other integer) for the many variables. However, I see no zeroes, so perhaps these are not counts but rather responses on a scale, as mentioned in my previous post. In any case, the subjects are not identified, although something could be whipped up from the record number within each state. So I want to make sure that my assumption is correct. If so, then the code I provided just needs to be modified to remove the WEIGHT statement. This will test whether the profiles within each variable are the same across the states.

A comparison to ALL states could be made by duplicating the dataset, changing STATE to ALL, and proceeding, but that is pseudo-replication and should be avoided. Maybe each state could be compared to the overall rate in a separate analysis, where the TESTP= option is invoked. That would be much better, but is going to require a lot of coding.
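
A minimal sketch of both suggestions, assuming a hypothetical respondent-level dataset named SURVEY with one row per respondent, a STATE variable, and a response variable coded 1, 2, or 3. The dataset name and the use of tri_mnt032devplns as a variable name are assumptions. The TESTP= percentages are the ALL column for that item in the table posted earlier in the thread.

```sas
/* Hypothetical raw data: SURVEY has one row per respondent. */

/* Overall test: is the 1/2/3 profile the same across states?
   No WEIGHT statement is needed because each row is one response. */
proc freq data=survey;
   tables state*tri_mnt032devplns / chisq;
run;

/* One state against the overall rate: chi-square goodness-of-fit test
   of CO's profile against the ALL percentages for levels 1, 2, 3. */
proc freq data=survey;
   where state = 'CO';
   tables tri_mnt032devplns / chisq testp=(27.5 47.3 25.2);
run;
```

The second step would have to be repeated for each state and each variable, which is where most of the coding effort goes. It also treats the overall percentages as fixed, even though they are estimated from data that include the state being tested.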

Steve Denham
