<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc Princomp - Outlier Observations identification in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/652911#M22474</link>
    <description>&lt;P&gt;If the question is: I see the ellipse on a two-dimensional plot and I want to know if points are outside the ellipse, then p=2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if you want to ask the question (which is entirely reasonable to ask) is this an outlier in 5 dimensional space, then p=5. You can't draw a 5-dimensional plot, but the question is answered the same way, and people will often plot the t-squared number against the limit — not a scatter plot, but more like a trend plot with an upper limit.&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jun 2020 15:24:42 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2020-06-03T15:24:42Z</dc:date>
    <item>
      <title>Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547102#M8338</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am learning Proc Princomp for Principal Component Analysis.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have this code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;ods graphics on;
proc princomp data=sashelp.class PLOTS=SCORE(ELLIPSE NCOMP=3) out=class_out outstat=class_stat;
run;
ods graphics off;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This produces the attached chart.&lt;/P&gt;&lt;P&gt;Based on the chart, Robert is an outlier because he falls outside the 95% boundary line.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;How can I get that information in a dataset without referring to the chart?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What option am I missing here?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My class_out dataset does not seem to identify observation #16 (Robert) as an outlier.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Please advise.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Thanks&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 04:49:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547102#M8338</guid>
      <dc:creator>david27</dc:creator>
      <dc:date>2019-03-29T04:49:07Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547145#M8344</link>
      <description>&lt;P&gt;Anything in the PROC PRINCOMP (or any other PROC) output can be included in a SAS data set, using ODS OUTPUT.&lt;/P&gt;
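&lt;P&gt;As a minimal sketch of what that looks like: the output data set names eigen_ds and eigvec_ds are made up for illustration, and Eigenvalues and Eigenvectors are, as far as I know, the ODS table names PROC PRINCOMP uses. ODS TRACE ON lists the actual table names in the log if they differ in your release.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Optional: write the ODS table names the procedure produces to the log */
ods trace on;

/* Save two of the displayed tables as data sets (data set names are hypothetical) */
ods output Eigenvalues=eigen_ds Eigenvectors=eigvec_ds;
proc princomp data=sashelp.class;
run;

ods trace off;

/* Inspect one of the saved tables */
proc print data=eigen_ds;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that these tables describe the components themselves; they do not flag individual observations.&lt;/P&gt;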
&lt;P&gt;&lt;A href="https://documentation.sas.com/?docsetId=odsug&amp;amp;docsetTarget=p0oxrbinw6fjuwn1x23qam6dntyd.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en" target="_blank" rel="noopener"&gt;https://documentation.sas.com/?docsetId=odsug&amp;amp;docsetTarget=p0oxrbinw6fjuwn1x23qam6dntyd.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/?docsetId=statug&amp;amp;docsetTarget=statug_princomp_details07.htm&amp;amp;docsetVersion=15.1&amp;amp;locale=en" target="_blank" rel="noopener"&gt;https://documentation.sas.com/?docsetId=statug&amp;amp;docsetTarget=statug_princomp_details07.htm&amp;amp;docsetVersion=15.1&amp;amp;locale=en&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 12:09:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547145#M8344</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-03-29T12:09:43Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547170#M8348</link>
      <description>&lt;P&gt;Apologies, but I need to clarify the question.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;How can I get the &lt;FONT color="#FF0000"&gt;outlier&lt;/FONT&gt; information in a dataset?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;For example:&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Robert (obs=16) falls outside the 95% line. I want a dataset that contains only this observation, or at least identifies it as falling outside the 95% line.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That also brings up another question:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Can we change that 95% threshold to a 99% or 90% threshold?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 12:49:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547170#M8348</guid>
      <dc:creator>david27</dc:creator>
      <dc:date>2019-03-29T12:49:16Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547198#M8350</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/215179"&gt;@david27&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Apologies but need to clarify the question.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;How can I get the &lt;FONT color="#FF0000"&gt;outlier&lt;/FONT&gt; information in a dataset?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;For example:&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Robert(obs=16) falls outside the 95% line. I want a dataset which has only this observation or at least identifies this observation as falling outside the 95% line.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also,&lt;/P&gt;
&lt;P&gt;That brings another question:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Can we change that 95% threshold to 99% threshold or 90% threshold?&lt;/STRONG&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So, my apologies for my earlier answer being somewhat off target.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The ellipse that determines the "outliers" is actually a multivariate T-squared calculation, and isn't hard to get from the PCA outputs, but you have to know the steps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is easier to get the t-squared values for a PCA analysis using PROC PLS rather than using PROC PRINCOMP.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Same variables on both sides of the MODEL statement, so PLS reproduces a PCA */
proc pls data=sashelp.class;
    model age height weight = age height weight;
    /* TSQUARE= writes each observation's T-squared value to the output data set */
    output out=pls_stats tsquare=tsq;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The reason this works: a PLS analysis in which the x and y variables in the model are identical produces a PCA. It also produces the t-squared value for each observation, which can then be used to determine whether the observation is inside or outside the ellipse.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then, you compare t-squared to the ellipse limit, which is computed from the formula T-SQUARED_limit = p*(n-1)*F/(n-p), where n is the number of data points (19), p is the number of dimensions (since the ellipses are drawn in two dimensions, I believe SAS used p=2), and F is the 95% quantile of the F distribution with p and n-p degrees of freedom.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
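&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
    p=2;                      /* dimensions of the plotted ellipse */
    n=19;                     /* observations in sashelp.class */
    f=finv(0.95,p,n-p);       /* 95% quantile of F with p and n-p degrees of freedom */
    tsq_lim=p*(n-1)*f/(n-p);  /* T-squared limit from the formula above */
    put tsq_lim= f=;          /* write both values to the log */
run;&lt;/CODE&gt;&lt;/PRE&gt;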
&lt;P&gt;If you want a 90% limit instead of a 95% limit, change 0.95 to 0.90 in the FINV call above. To draw the ellipses with 90% confidence, you would set the ALPHA= option in PROC PRINCOMP, for example PLOTS=(SCORE(ELLIPSE ALPHA=10)).&lt;/P&gt;
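&lt;P&gt;As a rough sketch of how the limit moves with the confidence level, the same formula can be evaluated at 90%, 95%, and 99% in one step. The data set name tsq_limits and the variable name conf are made up for illustration; p=2 and n=19 are the same assumptions as above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Evaluate the T-squared limit at several confidence levels (names are hypothetical) */
data tsq_limits;
    p = 2;     /* dimensions of the plotted ellipse */
    n = 19;    /* observations in sashelp.class */
    do conf = 0.90, 0.95, 0.99;
        f = finv(conf, p, n-p);        /* F quantile with p and n-p degrees of freedom */
        tsq_lim = p*(n-1)*f/(n-p);     /* same formula as in the DATA _NULL_ step */
        output;                        /* one row per confidence level */
    end;
run;

proc print data=tsq_limits noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;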
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For completeness, here are the calculations of t-squared from PROC PRINCOMP:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
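&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Standardize the component scores to mean 0 and standard deviation 1 */
proc stdize data=class_out out=class_out1;
    var prin1-prin3;
run;

/* T-squared is the sum of squares of the standardized scores */
data tsquared;
    set class_out1;
    tsq=uss(of prin1-prin3);
run;&lt;/CODE&gt;&lt;/PRE&gt;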
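&lt;P&gt;To get from these t-squared values to a data set that identifies the outliers, one possible sketch is below. It assumes the T-squared limit formula given earlier in this post, and it sums only the first two components so that the statistic matches p=2 (the Prin1 by Prin2 plot). The data set names class_std and class_flagged and the variable outlier are made up for illustration, and the result may not reproduce the plotted ellipse exactly.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Component scores for sashelp.class (all numeric variables, printed output suppressed) */
proc princomp data=sashelp.class out=class_out noprint;
run;

/* Standardize only the two components that are compared with the p=2 limit */
proc stdize data=class_out out=class_std;
    var prin1-prin2;
run;

/* Flag observations whose T-squared exceeds the limit (names are hypothetical) */
data class_flagged;
    set class_std nobs=n;                 /* n = number of observations */
    p = 2;                                /* components used in tsq */
    tsq = uss(of prin1-prin2);            /* T-squared over two components */
    tsq_lim = p*(n-1)*finv(0.95, p, n-p)/(n-p);
    outlier = (tsq gt tsq_lim);           /* 1 = outside the limit, 0 = inside */
run;

/* List only the flagged observations */
proc print data=class_flagged;
    where outlier = 1;
    var name tsq tsq_lim;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For a different confidence level, change 0.95 in the FINV call. For a test in all three dimensions, use prin1-prin3 in both steps and set p=3.&lt;/P&gt;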
</description>
      <pubDate>Fri, 29 Mar 2019 14:20:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547198#M8350</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-03-29T14:20:11Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547244#M8353</link>
      <description>&lt;P&gt;Thank You very much&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You helped me understand Proc Princomp more and also provided an alternative- proc pls.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank You Again...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 15:48:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/547244#M8353</guid>
      <dc:creator>david27</dc:creator>
      <dc:date>2019-03-29T15:48:22Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/652881#M22473</link>
      <description>&lt;P&gt;Hello &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Coming back to this after a long time.&lt;/P&gt;&lt;P&gt;I had a quick question on your comment: "p is the number of dimensions (since the ellipses are drawn in two dimensions, I believe SAS used p=2,"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ellipses will always be drawn in 2-D, and that would make p=2 all the time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Say we take these 5 variables from sashelp.cars for our predictions:&lt;/P&gt;&lt;P&gt;HORSEPOWER&amp;nbsp;MPG_HIGHWAY&amp;nbsp;WEIGHT&amp;nbsp;LENGTH&amp;nbsp;WHEELBASE&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Do we have&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;p=2 because ellipses are drawn in 2 dimensions?&lt;/P&gt;&lt;P&gt;OR&lt;/P&gt;&lt;P&gt;p=4 because we are taking 5 variables in our prediction (5-1)?&lt;/P&gt;&lt;P&gt;OR&lt;/P&gt;&lt;P&gt;p=5 because we have 5 dimensions (the variables in our prediction)?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jun 2020 14:13:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/652881#M22473</guid>
      <dc:creator>david27</dc:creator>
      <dc:date>2020-06-03T14:13:03Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Princomp - Outlier Observations identification</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/652911#M22474</link>
      <description>&lt;P&gt;If the question is: I see the ellipse on a two-dimensional plot and I want to know if points are outside the ellipse, then p=2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if you want to ask the question (which is entirely reasonable to ask) is this an outlier in 5 dimensional space, then p=5. You can't draw a 5-dimensional plot, but the question is answered the same way, and people will often plot the t-squared number against the limit — not a scatter plot, but more like a trend plot with an upper limit.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jun 2020 15:24:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Proc-Princomp-Outlier-Observations-identification/m-p/652911#M22474</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-06-03T15:24:42Z</dc:date>
    </item>
  </channel>
</rss>

