Re: Sig test across states on categorical data

pdougherty · Posted 10-24-2013 01:21 PM

What test should I run if I want to assess the statistical significance of variables that have multiple permitted values (categorical)? Below is the output table. I want to assess significance between states, not within a state. So is '1' significantly different across ALL*CO*DE*KY*MA*MD*etc.? And likewise for '2' and '3'?

Thanks,

Pat

	State
	ALL			CO			DE			KY			MA			MD			NC			OH			TN
	1	2	3	1	2	3	1	2	3	1	2	3	1	2	3	1	2	3	1	2	3	1	2	3	1	2	3
tri_mnt032devplns	27.5%	47.3%	25.2%	26.6%	50.2%	23.2%	42.9%	42.6%	14.4%	10.7%	52.8%	36.4%	30.8%	49.8%	19.5%	30.6%	52.8%	16.6%	31.5%	43.3%	25.3%	28.2%	52.4%	19.4%	28.0%	44.6%	27.5%
tri_mnt032observed	25.4%	66.4%	8.2%	25.8%	65.7%	8.5%	36.0%	58.2%	5.8%	7.0%	81.5%	11.5%	30.8%	60.5%	8.7%	20.3%	70.8%	8.9%	23.3%	69.4%	7.3%	11.7%	83.3%	5.0%	44.0%	48.6%	7.4%
tri_ment032observing	48.7%	45.2%	6.2%	44.9%	49.8%	5.3%	60.9%	35.2%	4.0%	34.8%	57.1%	8.1%	40.6%	52.3%	7.1%	62.5%	32.1%	5.4%	52.4%	42.3%	5.3%	41.2%	53.7%	5.0%	50.4%	42.6%	7.0%
tri_mnt032azstudwk	29.8%	49.9%	20.3%	29.7%	53.1%	17.2%	39.4%	47.4%	13.1%	14.1%	55.5%	30.4%	33.9%	52.5%	13.6%	36.6%	50.5%	12.8%	31.7%	47.4%	20.9%	21.6%	60.5%	17.9%	32.9%	45.5%	21.6%
tri_mnt032asmts	27.9%	53.0%	19.0%	28.0%	56.4%	15.6%	40.3%	51.1%	8.6%	13.4%	57.8%	28.9%	33.6%	54.3%	12.2%	36.5%	52.0%	11.5%	29.3%	50.9%	19.8%	20.7%	62.1%	17.1%	29.1%	50.3%	20.6%
tri_mnt032behavior	13.0%	57.4%	29.5%	13.8%	59.4%	26.8%	13.5%	62.0%	24.5%	7.3%	54.0%	38.8%	12.5%	62.1%	25.4%	15.6%	63.5%	20.9%	12.4%	56.5%	31.1%	10.0%	62.6%	27.4%	17.3%	53.8%	28.9%
tri_mnt032reflectoge	14.4%	57.5%	28.0%	14.5%	60.5%	25.0%	19.5%	59.4%	21.1%	5.8%	55.1%	39.0%	15.5%	60.0%	24.5%	14.7%	65.3%	20.0%	13.9%	56.4%	29.7%	8.0%	66.2%	25.8%	21.8%	52.8%	25.4%
tri_mnt032align	28.6%	46.7%	24.7%	28.9%	51.2%	19.8%	49.8%	39.1%	11.1%	12.2%	51.2%	36.6%	35.3%	48.2%	16.5%	34.5%	49.2%	16.2%	30.5%	44.1%	25.3%	23.3%	56.2%	20.6%	29.8%	42.8%	27.3%
tri_mnt032other	39.1%	34.0%	26.9%	40.0%	34.0%	26.0%	47.8%	30.0%	22.2%	29.5%	35.2%	35.3%	38.2%	35.9%	25.9%	47.7%	34.8%	17.5%	37.6%	33.2%	29.2%	38.2%	39.1%	22.7%	42.5%	32.9%	24.5%

Message was edited by: Patrick Dougherty

I have added a sample of the data. I want to know whether the permitted value "1" is significantly different from "1" across ALL*CO*DE*KY*MA etc., and so on for each permitted value. I have thought about transforming the data into multiple binary variables, so that each survey question would have 3 distinct new variables, and then running a simple t-test, but I am trying to avoid this approach. Thanks

Reeza · Posted 10-24-2013 01:25 PM

You need the raw data, the counts.

pdougherty · Posted 10-24-2013 01:39 PM

I have the raw data. I tried uploading it, but it is too large (20 vars, 700k obs). What test/proc do I run to assess this?

PaigeMiller · Posted 10-24-2013 01:26 PM

We know you want to compare something between states versus within states. What exactly do you want to compare?

--
Paige Miller

pdougherty · Posted 10-24-2013 01:37 PM

so I want to test permitted value 1 sig between ALL*CO*DE*KY*MA*MD*NC*etc.

then test permitted value 2 sig between ALL*CO*DE*KY*MA*MD*NC*etc.

then test permitted value 3 sig between ALL*CO*DE*KY*MA*MD*NC*etc.

So I would expect output would look something like this for each permitted value (1,2 and 3)

ALL CO DE KY MA

(A) (B) (C) (D) (E)

A CDE BC D AB - these denote sig diff between states

Does this make sense? I feel like I've done this in a previous life.

Thanks,

Pat

PaigeMiller · Posted 10-24-2013 01:46 PM

Are you using "ALL*CO*DE*KY*MA*MD*NC*etc" to indicate a 9-way interaction? Or something else? Is this a part of a model? If so, you'd have to give us both the left and right hand side of the model.

Are you talking about a sig (which I would refer to as a "significant difference"?) between percents, or means, or standard deviations, or gophers, or something else?

--
Paige Miller

pdougherty · Posted 10-24-2013 01:56 PM

The latter...

Thanks,

Pat

SteveDenham · Posted 10-25-2013 09:24 AM

How about posting a sample of the data, rather than the entire dataset? I think you are trying for PROC FREQ, but the syntax doesn't quite work as you are posting. I think you want something like:

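proc freq;
  tables state*variable / chisq;  /* CHISQ requests the chi-square test across states */
  weight obsweight;               /* count for each state-by-level cell */
run;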

In this case, state takes on the values ALL, CO, DE, KY, and so on; variable takes on the level values for the variable (1, 2, 3); and obsweight gives the count for a given level-by-state combination. The overall test is whether the 'profile' is the same for all states.
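To make the layout concrete, here is a minimal sketch of what the count dataset and the call might look like. The dataset name cellcounts and every count in it are made up purely for illustration (they are not taken from the survey), and ALL is left out so that no respondent is counted twice.

```sas
/* Hypothetical cell counts: one row per state-by-level combination. */
/* These numbers are invented for illustration only.                 */
data cellcounts;
  input state $ variable obsweight;
  datalines;
CO 1 55
CO 2 104
CO 3 49
DE 1 90
DE 2 89
DE 3 30
KY 1 22
KY 2 108
KY 3 75
;
run;

/* WEIGHT tells PROC FREQ that each row stands for obsweight         */
/* respondents; CHISQ requests the chi-square test of whether the    */
/* 1/2/3 profile is the same in every state.                         */
proc freq data=cellcounts;
  tables state*variable / chisq nocol nopercent;
  weight obsweight;
run;
```

With a real table you would have one such dataset (or one such pass) per survey question.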

Other tests can be obtained by collapsing the categories into some sort of binary classification.
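As a rough sketch of the collapsing idea, assuming respondent-level data in a dataset called survey (a hypothetical name) with a STATE variable and the 1/2/3 response stored in tri_mnt032devplns, a binary flag for level 1 could be built and tested like this:

```sas
/* SURVEY is a hypothetical dataset name: one row per respondent. */
data collapsed;
  set survey;
  /* 1 if the respondent chose level 1, 0 if level 2 or 3;        */
  /* the flag stays missing when the response itself is missing.  */
  if not missing(tri_mnt032devplns) then
    is_level1 = (tri_mnt032devplns = 1);
run;

/* No WEIGHT statement here: each row is a single respondent.     */
proc freq data=collapsed;
  tables state*is_level1 / chisq nocol nopercent;
run;
```

The same pattern with (tri_mnt032devplns = 2) or (tri_mnt032devplns = 3) would give the corresponding tests for the other two levels.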

However, if all you have are the percentages, and no measure of sample size associated with them, you will NOT be able to come up with valid statistical tests.

Steve Denham

Message was edited by: Steve Denham

pdougherty · Posted 10-29-2013 11:02 AM

I've attached some sample data. If you don't mind, please take another look and let me know what you think.

Thanks,

Pat

SteveDenham · Posted 10-29-2013 11:35 AM

I looked through the data and still have some questions. For instance, I see 208 records that have 'CO' as the STATE variable, and then what I assume are counts (as they go from 1 to some other integer) for the many variables. However, I see no zeroes, so perhaps these are not counts but rather responses on a scale, as mentioned in my previous post. In any case, the subjects are not identified, although something could be whipped up from the record number within each state. So I want to make sure that my assumption is correct. If so, then the code I provided just needs to be modified to remove the WEIGHT statement. This will test whether the profiles within each variable are the same across the states. A comparison to ALL states could be made by duplicating the dataset, changing STATE to ALL, and proceeding, but that is pseudo-replication and should be avoided. Maybe each state could be compared to the overall rate in a separate analysis, where the TESTP= option is invoked. That would be much better, but is going to require a lot of coding.
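A sketch of both steps, assuming the respondent-level data sit in a dataset called survey (a hypothetical name) and the values are responses on the 1/2/3 scale. The TESTP= percentages are the ALL column for tri_mnt032devplns from the table in the original post; they are treated as fixed null values, which is itself an assumption, since the overall rate is estimated from the same data.

```sas
/* SURVEY is a hypothetical name for the respondent-level dataset. */

/* Profile test on raw responses: same TABLES request, no WEIGHT.  */
proc freq data=survey;
  tables state*tri_mnt032devplns / chisq;
run;

/* One state against the overall percentages. TESTP= values are    */
/* listed in the order of the levels (1, 2, 3) and must sum to 100.*/
proc freq data=survey;
  where state = 'CO';
  tables tri_mnt032devplns / chisq testp=(27.5 47.3 25.2);
run;
```

Repeating the second step for every state and every variable is where the extra coding comes in; a macro loop over states and variables would be one way to handle it.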

Steve Denham