<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic PROC CORR producing the wrong result - suggestions please? in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709753#M34366</link>
    <description>&lt;P&gt;On the Kaggle website, users have posted the code and results they got when they tested the Titanic "train" data set using both R and Python. I wanted to try it out using SAS; however, I'm getting a different result for the correlation between survival and sex.&lt;/P&gt;
&lt;P&gt;This is the result that Kaggle forum users got when they used Python or R to check the correlation between Survived (1=survived, 0=died) and Sex (0=female, 1=male). The data set originally comes with a character variable showing "male" or "female", from which a binary nominal variable coded 0 and 1 is created.&lt;/P&gt;
&lt;PRE&gt;Survival Correlation by: Sex
      Sex  Survived
0  female  0.742038
1    male  0.188908&lt;/PRE&gt;
&lt;P&gt;I tried it in SAS using the PROC CORR step below, but my result showed a correlation of -0.54 between sex and survived. So I split Sex further into two variables, Male and Female (each is 1 if the passenger has that sex and 0 otherwise). This time it shows a correlation of 0.54 between female and survived, and -0.54 between male and survived. So it is definitely not the same as what people get when they use Python and R.&lt;/P&gt;
&lt;P&gt;I'm aware that technically binary variables like survived are not ideally suited to using Pearson correlation, since it is intended for continuous variables. But it is still helpful as part of the EDA process. Does anyone know how to get SAS to produce the same results that Python and R produced? Thanks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;PROC CORR DATA=train PLOTS=SCATTER; 
VAR survived sex female male;
RUN;&lt;/PRE&gt;</description>
    <pubDate>Wed, 06 Jan 2021 22:29:31 GMT</pubDate>
    <dc:creator>Buzzy_Bee</dc:creator>
    <dc:date>2021-01-06T22:29:31Z</dc:date>
    <item>
      <title>PROC CORR producing the wrong result - suggestions please?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709753#M34366</link>
      <description>&lt;P&gt;On the Kaggle website, users have posted the code and results they got when they tested the Titanic "train" data set using both R and Python. I wanted to try it out using SAS; however, I'm getting a different result for the correlation between survival and sex.&lt;/P&gt;
&lt;P&gt;This is the result that Kaggle forum users got when they used Python or R to check the correlation between Survived (1=survived, 0=died) and Sex (0=female, 1=male). The data set originally comes with a character variable showing "male" or "female", from which a binary nominal variable coded 0 and 1 is created.&lt;/P&gt;
&lt;PRE&gt;Survival Correlation by: Sex
      Sex  Survived
0  female  0.742038
1    male  0.188908&lt;/PRE&gt;
&lt;P&gt;I tried it in SAS using the PROC CORR step below, but my result showed a correlation of -0.54 between sex and survived. So I split Sex further into two variables, Male and Female (each is 1 if the passenger has that sex and 0 otherwise). This time it shows a correlation of 0.54 between female and survived, and -0.54 between male and survived. So it is definitely not the same as what people get when they use Python and R.&lt;/P&gt;
&lt;P&gt;I'm aware that technically binary variables like survived are not ideally suited to using Pearson correlation, since it is intended for continuous variables. But it is still helpful as part of the EDA process. Does anyone know how to get SAS to produce the same results that Python and R produced? Thanks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;PROC CORR DATA=train PLOTS=SCATTER; 
VAR survived sex female male;
RUN;&lt;/PRE&gt;</description>
      <pubDate>Wed, 06 Jan 2021 22:29:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709753#M34366</guid>
      <dc:creator>Buzzy_Bee</dc:creator>
      <dc:date>2021-01-06T22:29:31Z</dc:date>
    </item>
    <item>
      <title>Re: PROC CORR producing the wrong result - suggestions please?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709796#M34367</link>
      <description>&lt;P&gt;It looks like you are trying to calculate the survival rate by sex. Here is how to get those rates, along with a chi-square test of whether the survival rate differs by sex:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=train;
table survived*sex / norow nocum nopercent chisq;
run;&lt;/CODE&gt;&lt;/PRE&gt;
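&lt;P&gt;A related sketch, assuming Survived is coded 0/1 as described in the question: the mean of a 0/1 variable is the proportion of ones, so PROC MEANS with a CLASS statement should give the same per-sex survival rates as the column percentages from PROC FREQ. MAXDEC= only controls the number of decimals printed.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Survival rate by sex = mean of the 0/1 variable within each sex */
proc means data=train mean maxdec=4;
  class sex;      /* one output row per level of sex */
  var survived;   /* mean of survived = proportion who survived */
run;&lt;/CODE&gt;&lt;/PRE&gt;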
&lt;P&gt;Try the PROC FREQ step above to see whether you get the values you expect.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Jan 2021 04:12:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709796#M34367</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2021-01-07T04:12:01Z</dc:date>
    </item>
    <item>
      <title>Re: PROC CORR producing the wrong result - suggestions please?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709855#M34369</link>
      <description>&lt;P&gt;Thank you - using the two-way Proc Freq that you suggested does produce the same result (the bottom line shows 74.2% survival for females and 18.89% for males).&lt;/P&gt;
&lt;PRE&gt;Survived       Sex
               0        1    Total
0             81      468      549
            25.8    81.11
1            233      109      342
            74.2    18.89
&lt;/PRE&gt;
&lt;P&gt;I read over the Python code that people had used on Kaggle and realised that they didn't use a correlation technique at all; they just used a pivot table and a cross tab, then put a title above it labelling it "Survival Correlation by Sex." I was really confused because I had understood Pearson correlation to be intended for continuous variables. But all the Kaggle people are actually doing is creating a table of percentages, which isn't a correlation at all &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
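&lt;P&gt;As a quick check, here is a minimal sketch using the cell counts from the table above (the names n00 to n11 are just illustrative). For two 0/1 variables the Pearson correlation equals the phi coefficient of the 2x2 table, so this should come out at about -0.54, in line with what PROC CORR reported, while the per-sex survival rates should match the Kaggle numbers:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Cell counts copied from the PROC FREQ table in this thread */
data _null_;
  n00 = 81;   /* died,     sex=0 (female) */
  n01 = 468;  /* died,     sex=1 (male)   */
  n10 = 233;  /* survived, sex=0 (female) */
  n11 = 109;  /* survived, sex=1 (male)   */

  /* Survival rate within each sex: what the Kaggle table reports */
  rate_female = n10 / (n00 + n10);
  rate_male   = n11 / (n01 + n11);

  /* Phi coefficient: Pearson correlation of two 0/1 variables */
  phi = (n00*n11 - n01*n10)
        / sqrt((n00+n01) * (n10+n11) * (n00+n10) * (n01+n11));

  /* Write the three values to the SAS log */
  put rate_female= 8.4 rate_male= 8.4 phi= 8.4;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So both outputs describe the same data: one is a pair of rates, the other a single measure of association. If I read the documentation correctly, the CHISQ option in PROC FREQ also prints a phi coefficient for the table.&lt;/P&gt;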
&lt;P&gt;Thanks for your help.&lt;/P&gt;
</description>
      <pubDate>Thu, 07 Jan 2021 08:59:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-CORR-producing-the-wrong-result-suggestions-please/m-p/709855#M34369</guid>
      <dc:creator>Buzzy_Bee</dc:creator>
      <dc:date>2021-01-07T08:59:09Z</dc:date>
    </item>
  </channel>
</rss>

