<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Creating summary statistics but dealing with outliers across time and classes in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583176#M34556</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/116786"&gt;@Sean_OConnor&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am producing some income statistics across different cohorts (regions, age groups, etc.). When it comes to reporting on values, I'm just reporting the median. However, the issue I'm running into is that when I start to cross-classify, segmenting my data into smaller groups, outliers affect the median.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Please show us an example of the data showing this problem of outliers impacting the median. As I understand things, the median should not be affected by outliers, unless there are only two data points in a group, or when half the values in a group are outliers.&lt;/P&gt;</description>
    <pubDate>Thu, 22 Aug 2019 13:36:36 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2019-08-22T13:36:36Z</dc:date>
    <item>
      <title>Creating summary statistics but dealing with outliers across time and classes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583165#M34555</link>
      <description>&lt;P&gt;Folks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am producing some income statistics across different cohorts (regions, age groups, etc.). When it comes to reporting on values, I'm just reporting the median. However, the issue I'm running into is that when I start to cross-classify, segmenting my data into smaller groups, outliers affect the median.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My code below would work fine if I didn't have the BY statement in the PROC UNIVARIATE step.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, within my dataset I have different values which should adhere to different conditions, depending on which BY group they fall into.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a simple way to account for this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc univariate data = pop;
by  class year_filed age_filed_total;
var income_price_ratiot1;
output out=boxStats_before median=median  qrange=iqr_;
run; 

data _null_;
	set boxStats_before;
	call symput ('median',median);
	call symput ('iqr', iqr_);
run; 

%put &amp;amp;median;
%put &amp;amp;iqr;

data trimmed;
set pop;
    if (income_price_ratiot1 le &amp;amp;median + 1.5 * &amp;amp;iqr) and (income_price_ratiot1 ge &amp;amp;median - 1.5 * &amp;amp;iqr); 
run; &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 22 Aug 2019 13:18:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583165#M34555</guid>
      <dc:creator>Sean_OConnor</dc:creator>
      <dc:date>2019-08-22T13:18:36Z</dc:date>
    </item>
    <item>
      <title>Re: Creating summary statistics but dealing with outliers across time and classes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583176#M34556</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/116786"&gt;@Sean_OConnor&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am producing some income statistics across different cohorts (regions, age groups, etc.). When it comes to reporting on values, I'm just reporting the median. However, the issue I'm running into is that when I start to cross-classify, segmenting my data into smaller groups, outliers affect the median.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Please show us an example of the data showing this problem of outliers impacting the median. As I understand things, the median should not be affected by outliers, unless there are only two data points in a group, or when half the values in a group are outliers.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Aug 2019 13:36:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583176#M34556</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-08-22T13:36:36Z</dc:date>
    </item>
    <item>
      <title>Re: Creating summary statistics but dealing with outliers across time and classes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583216#M34558</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/116786"&gt;@Sean_OConnor&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Folks,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am producing some income statistics across different cohorts (regions, age groups, etc.). When it comes to reporting on values, I'm just reporting the median. However, the issue I'm running into is that when I start to cross-classify, segmenting my data into smaller groups, outliers affect the median.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My code below would work fine if I didn't have the BY statement in the PROC UNIVARIATE step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, within my dataset I have different values which should adhere to different conditions, depending on which BY group they fall into.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a simple way to account for this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc univariate data = pop;
by  class year_filed age_filed_total;
var income_price_ratiot1;
output out=boxStats_before median=median  qrange=iqr_;
run; 

data _null_;
	set boxStats_before;
	call symput ('median',median);
	call symput ('iqr', iqr_);
run; 

%put &amp;amp;median;
%put &amp;amp;iqr;

data trimmed;
set pop;
    if (income_price_ratiot1 le &amp;amp;median + 1.5 * &amp;amp;iqr) and (income_price_ratiot1 ge &amp;amp;median - 1.5 * &amp;amp;iqr); 
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You don't clearly describe exactly what impact you may be seeing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For summary statistics in a data set, I suggest considering PROC SUMMARY, as it creates several different summaries in one pass and you can select the ones you need.&lt;/P&gt;
&lt;P&gt;Consider the output data set from:&lt;/P&gt;
&lt;PRE&gt;/* CLASS (rather than BY) summarizes every combination of the listed variables */
proc summary data = pop;
   class  class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_before median=median  qrange=iqr_;
run; &lt;/PRE&gt;
&lt;P&gt;There will be a variable _type_ in the output data set that indicates which combination of the variables on the CLASS statement each row summarizes.&lt;/P&gt;
&lt;P&gt;So there will be an overall summary; class only, year_filed only, and age_filed_total only; class and year_filed, class and age_filed_total, and year_filed and age_filed_total; and the combination of all three variables.&lt;/P&gt;
&lt;P&gt;Then you can select the summary you need for a specific use by filtering on the _type_ value associated with that combination.&lt;/P&gt;
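&lt;P&gt;As a rough sketch of that last step, and only under some assumptions: the code below keeps just the full three-way combination (with three CLASS variables that is _type_ = 7, which the NWAY option selects) and merges each group's statistics back onto the data, so the median +/- 1.5 * IQR rule from the original post is applied per group instead of through a single pair of macro variables. The names pop_sorted, in_pop and iqr (without the trailing underscore) are made up for illustration. It also assumes that pop has no existing variables named median or iqr, and that none of the three grouping variables has missing values, since PROC SUMMARY leaves those rows out of the CLASS groups by default and they would then be dropped by the filter.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Per-group median and IQR; NWAY keeps only the class*year_filed*age_filed_total rows */
proc summary data = pop nway;
   class class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_before median=median qrange=iqr;
run;

/* The merge below needs pop in the same order as the summary output */
proc sort data = pop out = pop_sorted;
   by class year_filed age_filed_total;
run;

/* Attach each group's statistics, then keep rows inside that group's own limits */
data trimmed;
   merge pop_sorted (in=in_pop)
         boxStats_before (keep=class year_filed age_filed_total median iqr);
   by class year_filed age_filed_total;
   if in_pop;
   if (median - 1.5 * iqr) le income_price_ratiot1 le (median + 1.5 * iqr);
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Whether trimming is appropriate at all depends on what the outliers turn out to be, so treat this as a starting point, not a recommendation.&lt;/P&gt;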
</description>
      <pubDate>Thu, 22 Aug 2019 15:07:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Creating-summary-statistics-but-dealing-with-outliers-across/m-p/583216#M34558</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-08-22T15:07:16Z</dc:date>
    </item>
  </channel>
</rss>

