07-01-2014 11:31 AM
I'm looking to run a correlation and a post-hoc test (Tukey) in SAS Web Editor, using data that contains arsenic levels in parts per billion (ppb) and lung cancer rates by state.
07-01-2014 02:24 PM
Proc Corr works for correlation. Not sure what you mean by post hoc tests, so can't recommend anything beyond that.
If you need more help, I suggest posting sample data and a summary of what you're looking for.
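For reference, a minimal PROC CORR sketch might look like the one below. The dataset name (datasetname) and the variable names (lung, arsenicpu, groundwa) are placeholders I'm assuming for illustration, so substitute whatever your data actually uses.

```sas
/* Sketch only: datasetname, lung, arsenicpu and groundwa are assumed names. */
/* Correlates the lung cancer rate with each arsenic measure.                */
proc corr data=datasetname pearson spearman;
  var lung;                  /* outcome: lung cancer rate by state  */
  with arsenicpu groundwa;   /* arsenic measures to correlate with  */
run;
```

Requesting both PEARSON and SPEARMAN is just a precaution in case the arsenic values are skewed; drop SPEARMAN if you only want the usual linear correlation.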
07-01-2014 02:36 PM
I posted sample data.
I don't know if you have a moment to review the files I have attached, but if you do, can you tell me if I am on track or way off?
My goal is to see if there is a correlation between my arsenic data, which includes parts per billion by state, and my lung cancer data, which includes male and female rates by state. I also want to run a Tukey post-hoc test.
After reviewing my notes and the SAS forum, I was able to complete a correlation analysis, I think. Can you review the attached files to see if I wrote the correct commands for the correlation analysis?
Also, I was not able to get any results for the Tukey post-hoc test I tried to run. Can you give me any insight into what I am leaving out, or whether I even have the necessary information to complete the test, or point me to where I could find the answers?
07-01-2014 02:43 PM
The correlation looks fine. I'm going to assume that the rates are age-sex standardized, so they are actually comparable across states; if not, you'll need to do that.
You don't have N (sample sizes), so I'm not sure you can actually do post-hoc tests. There are several people here more adept at statistics; I've gotten rusty in the past 2 years. You may want to try reposting a more specific question in the Statistics Forum.
07-02-2014 02:30 PM
or PROC GLM:
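proc glm data=datasetname;
class state;  /* LSMEANS requires state to be a CLASS variable */
model lung arsenicpu groundwa=state;
lsmeans state/pdiff adjust=tukey;  /* Tukey-adjusted pairwise comparisons */
run;
quit;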
BUT the overall F test here checks whether at least one state differs from the others, for each of the three variables, and the LSMEANS statement then compares the states pairwise. Is that what you want?
Or is it something like you want to predict lung, based on state, public arsenic level and groundwater? Or something else entirely?
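If it is the prediction reading, a rough sketch might look like the one below. It reuses the same placeholder names as above (datasetname, lung, arsenicpu, groundwa), and it leaves state out of the model on the assumption that there is only one row per state, in which case state cannot be estimated as a separate effect.

```sas
/* Sketch only: assumes one row per state and the placeholder names above. */
/* Regresses the lung cancer rate on the two arsenic measures.             */
proc glm data=datasetname;
  model lung = arsenicpu groundwa / solution;  /* SOLUTION prints the parameter estimates */
run;
quit;
```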
Finally, aside from the concern raised above about age-sex standardized rates, consider the following: for groundwater at least, values within a state are not necessarily homogeneous, and using state as a grouping factor for this is bad epidemiology. For example, east Texas is a lot more like Arkansas or Louisiana than it is like west Texas, as far as groundwater sources go. And California is a hot mess, with multiple watersheds. So as a learning exercise, you can learn how to do this in SAS, but as far as providing a meaningful analysis goes, this is not the way to go.