Pixydust12
Calcite | Level 5
I have two data sets and I want to see if they are related. I want to know if the air pollution levels by county (data set one) are related to number of poultry farms by county (data set two). This would be a Pearson correlation, correct?
MarkusWeick
Barite | Level 11

Yes, Pearson correlation is an important step. But you should also draw a scatter plot, as the relationship could be non-linear.
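As a rough sketch of both steps in one call, PROC CORR can report the Pearson coefficient and draw the scatter plot. The dataset name Iowa and the variables Poultry_Farms and Air_Pollution are taken from the code posted later in this thread; adding Spearman is only a suggestion here, as a rank-based check in case the relationship is monotone but not linear.

```sas
ods graphics on;

/* Pearson (linear) and Spearman (rank-based) correlation, plus a scatter plot. */
/* Assumes one row per county in the dataset Iowa.                              */
proc corr data=Iowa pearson spearman plots=scatter(ellipse=none);
  var Poultry_Farms Air_Pollution;
run;
```

The output lists each coefficient with a p-value for the null hypothesis that the correlation is zero.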

Cheers, Markus

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
Pixydust12
Calcite | Level 5
Hi Markus,
So once I do a scatter plot and I see that the relationship is not linear, does that sufficiently answer my research question (To evaluate if air pollution levels are greater in counties near poultry farms), or should I do another test?
MarkusWeick
Barite | Level 11

Hi Pixydust,

It depends on what the plot shows. Maybe it will suggest another hypothesis about the relationship.

Would you like to share the plot?

Cheers

Markus

Pixydust12
Calcite | Level 5

[Image: Pixydust12_0-1638297098996.png, the plot produced by the code below]

My code:

proc sgplot data=Iowa;
  reg x=Poultry_Farms y=Air_Pollution / clm cli;
run;

 

Should I even do a scatter plot since one variable is continuous and one is discrete? I think I'm confusing myself. My goal is to see if there is a correlation between the two.

MarkusWeick
Barite | Level 11

Hello @Pixydust12 ,

 

That's the plot I asked for. It's no problem to mix continuous and discrete values. To me it looks as if there is no relationship. Accordingly, the correlation coefficient should be close to zero.

Cheers Markus

Pixydust12
Calcite | Level 5
Hi Mark,
Last question. If I have 5 years worth of air pollution data, but only 1 year worth of poultry farms, could I compare the mean air pollution to poultry farms with a Pearson Correlation? The scatter plot above is an example of the mean levels for the past 5 years and the # of poultry farms for one year.
MarkusWeick
Barite | Level 11

Hello @Pixydust12 ,

 

I would assume the number of poultry farms to be rather stable. So taking the 5-year mean of the air pollution should be OK. But to be on the safe side (if there is a safe side in statistics), I would also do the plot for the corresponding 1-year air pollution data.
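A minimal sketch of how both versions could be prepared, assuming the yearly pollution readings sit in a long dataset with one row per county and year. The names pollution_yearly, Year, Air_Pollution_Mean, pollution_mean, pollution_1yr and the macro variable farm_year are hypothetical, and the year value is a placeholder to be replaced with the year of the poultry farm counts.

```sas
/* Hypothetical placeholder: set this to the year of the poultry farm data. */
%let farm_year = 2020;

proc sql;
  /* 5-year mean pollution level per county */
  create table pollution_mean as
  select County, mean(Air_Pollution) as Air_Pollution_Mean
  from pollution_yearly
  group by County;

  /* Pollution level for the single year matching the farm counts */
  create table pollution_1yr as
  select County, Air_Pollution
  from pollution_yearly
  where Year = &farm_year;
quit;
```

Each result can then be combined with the farm counts by county and plotted separately, so the two pictures can be compared.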

Cheers Markus

sbxkoenk
SAS Super FREQ

Hello @Pixydust12 ,

 

The relationship could of course go in either direction, and it need not be linear.

Air pollution levels can be greater in counties near poultry farms.

Air pollution levels can be smaller / lower in counties near poultry farms.

The plot will tell you (unless it looks like random scatter), but the plot should be backed by a hypothesis test or a small model that "proves" this.
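One possible "small model" is a simple linear regression, sketched here with the dataset and variable names used in the code posted earlier in this thread. The t test on the slope of Poultry_Farms tests the null hypothesis of no linear relationship; it says nothing about non-linear patterns.

```sas
/* Simple linear regression of pollution on number of farms.       */
/* Assumes one row per county in the dataset Iowa.                  */
proc reg data=Iowa;
  model Air_Pollution = Poultry_Farms;
run;
quit;
```

The sign of the slope estimate shows the direction (higher or lower pollution with more farms), and its p-value shows whether that direction is distinguishable from zero.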

 

Koen

sbxkoenk
SAS Super FREQ

Hello,

 

You need one dataset with 3 columns rather than two separate datasets, so merge them by county.

  • County = ID-variable
  • pollution level (col1)
  • number of poultry farms (col2)
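The merge described above might look like the following sketch. The input dataset names pollution and farms are hypothetical; the output name Iowa and the key variable County follow the rest of the thread. It assumes County is spelled identically in both inputs.

```sas
/* A DATA step merge requires both inputs sorted by the key variable. */
proc sort data=pollution; by County; run;
proc sort data=farms;     by County; run;

data Iowa;
  merge pollution(in=in_poll) farms(in=in_farm);
  by County;
  /* Keep only counties present in both datasets. */
  if in_poll and in_farm;
run;
```

Dropping the subsetting IF would keep counties that appear in only one dataset, with missing values in the other column.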

 

As said by @MarkusWeick , you first need to graph / plot your data to get better insights on the analysis that might be appropriate.

Pearson correlation can be interesting, but it only measures linear association.

You can try a simple linear regression as well, but maybe a spline fits the data better?
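To compare the two fits visually, a sketch like the one below overlays a straight regression line and a penalized B-spline on the same scatter plot. It reuses the dataset and variable names from the code posted earlier in the thread; the legend labels are only illustrative.

```sas
proc sgplot data=Iowa;
  /* Raw data points */
  scatter x=Poultry_Farms y=Air_Pollution;
  /* Straight-line fit, drawn without repeating the markers */
  reg x=Poultry_Farms y=Air_Pollution / nomarkers legendlabel="Linear fit";
  /* Penalized B-spline fit for a possibly non-linear pattern */
  pbspline x=Poultry_Farms y=Air_Pollution / nomarkers legendlabel="Spline fit";
run;
```

If the spline stays close to the straight line, the linear model is probably adequate for these data.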

 

Thanks,

Koen

Pixydust12
Calcite | Level 5
Question: Would I graph or use a scatter plot? I know air pollution levels are a continuous variable, but is number of poultry farms in each county continuous or discrete?
