<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Getting rid of outlier data in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2423#M815</link>
    <description>Hi,&lt;BR /&gt;
&lt;BR /&gt;
How could I get rid of the extreme rows (let's say the top and bottom 2 rows) of a table, given that there are 200 variables?&lt;BR /&gt;
&lt;BR /&gt;
Thanks in advance,&lt;BR /&gt;
&lt;BR /&gt;
Anais</description>
    <pubDate>Thu, 19 Jun 2008 10:21:26 GMT</pubDate>
    <dc:creator>deleted_user</dc:creator>
    <dc:date>2008-06-19T10:21:26Z</dc:date>
    <item>
      <title>Getting rid of outlier data</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2419#M811</link>
      <description>Let's say I have a dataset with 10,000 observations consisting of two variables. "Occupation" and "income". There are 5 different occupations in the dataset.&lt;BR /&gt;
&lt;BR /&gt;
I would like to get rid of highest 10% of observed income levels for each occupation.&lt;BR /&gt;
&lt;BR /&gt;
Can anybody help me do this in SAS, or send me some simple code?&lt;BR /&gt;
&lt;BR /&gt;
Thanks in advance</description>
      <pubDate>Tue, 06 Mar 2007 15:33:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2419#M811</guid>
      <dc:creator>abyss</dc:creator>
      <dc:date>2007-03-06T15:33:05Z</dc:date>
    </item>
    <item>
      <title>Re: Getting rid of outlier data</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2420#M812</link>
      <description>First count how many observations you have for each occupation, then read the data in income order, skipping the last observations of each occupation.&lt;BR /&gt;
&lt;BR /&gt;
Anyway, I'm not sure this is the most statistically reliable way to filter outliers. You would do better to look at the distribution of values and find a "gap" at which to put a cutoff for each occupation, rather than deciding that some x% of the data is "false". But that would take a lot more time than running something like:&lt;BR /&gt;
&lt;BR /&gt;
PROC SQL ;&lt;BR /&gt;
	/* count observations per occupation; SAS remerges the count onto every row */&lt;BR /&gt;
	CREATE TABLE work.incomes AS&lt;BR /&gt;
		SELECT *,&lt;BR /&gt;
			   COUNT(*) AS obsNb&lt;BR /&gt;
		FROM yourDataSet&lt;BR /&gt;
		GROUP BY occupation&lt;BR /&gt;
		ORDER BY occupation, income &lt;BR /&gt;
	;&lt;BR /&gt;
QUIT ;&lt;BR /&gt;
DATA work.incomes ;&lt;BR /&gt;
	SET work.incomes ;&lt;BR /&gt;
	BY occupation ;&lt;BR /&gt;
	/* number the rows within each occupation, lowest income first */&lt;BR /&gt;
	IF FIRST.occupation THEN currNb = 0 ;&lt;BR /&gt;
	currNb + 1 ;&lt;BR /&gt;
	/* keep the lowest 90%, i.e. drop the top 10% of incomes */&lt;BR /&gt;
	IF currNb &amp;lt;= obsNb*.90 THEN OUTPUT ;&lt;BR /&gt;
RUN ;</description>
      <pubDate>Wed, 07 Mar 2007 15:47:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2420#M812</guid>
      <dc:creator>Olivier</dc:creator>
      <dc:date>2007-03-07T15:47:45Z</dc:date>
    </item>
    <item>
      <title>Re: Getting rid of outlier data</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2421#M813</link>
      <description>In EG 4.1, you can use the Rank task to generate decile ranks of income within each occupation group, and then use the Filter task to remove the largest rank value (the top decile).</description>
      <pubDate>Wed, 14 Mar 2007 23:57:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2421#M813</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2007-03-14T23:57:10Z</dc:date>
    </item>
    <item>
      <title>Re: Getting rid of outlier data</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2422#M814</link>
      <description>To automate the process, you could use code like the following &lt;BR /&gt;
(I used sashelp.class as test data,&lt;BR /&gt;
          grouping by sex&lt;BR /&gt;
          and filtering off the top 10% of weight):&lt;BR /&gt;
[prE]&lt;BR /&gt;
* fast p90 filter;                  &lt;BR /&gt;
%let data= sashelp.class ;&lt;BR /&gt;
%let bygrp = sex ;&lt;BR /&gt;
%let filtr = weight ;&lt;BR /&gt;
* collect the 90th percentile (P90) of each by group / class; &lt;BR /&gt;
proc means data= &amp;amp;data  noprint nway ;&lt;BR /&gt;
  var &amp;amp;filtr ;&lt;BR /&gt;
  class &amp;amp;bygrp ;&lt;BR /&gt;
  output out= work.P90S  p90= p90 ;&lt;BR /&gt;
run;&lt;BR /&gt;
* prepare lookup table that will return the P90&lt;BR /&gt;
  for a value of the bygroup / class var ;&lt;BR /&gt;
data cntl;&lt;BR /&gt;
 retain fmtname 'p90s' type 'I';&lt;BR /&gt;
 set work.P90S; * the P90 table created by PROC MEANS above;&lt;BR /&gt;
run;&lt;BR /&gt;
* build the informat: class var is "range" and the&lt;BR /&gt;
  statistic is the "label" to be returned ;&lt;BR /&gt;
proc format cntlin= cntl( rename=( &amp;amp;bygrp= start p90= label )); run; &lt;BR /&gt;
* split the data on the classvar-based P90 values;&lt;BR /&gt;
data reduced unwanted;&lt;BR /&gt;
   set &amp;amp;data ;&lt;BR /&gt;
   if &amp;amp;filtr &amp;gt;= input( &amp;amp;bygrp, p90s. ) then output unwanted;&lt;BR /&gt;
   else output reduced ;&lt;BR /&gt;
run;&lt;BR /&gt;
/* alternative solution using just a WHERE filter: &lt;BR /&gt;
    where &amp;amp;filtr &amp;lt; input( &amp;amp;bygrp, p90s. ) ;&lt;BR /&gt;
 ********/&lt;BR /&gt;
    [/prE]&lt;BR /&gt;
&lt;BR /&gt;
peterC</description>
      <pubDate>Thu, 15 Mar 2007 17:34:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2422#M814</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2007-03-15T17:34:30Z</dc:date>
    </item>
    <item>
      <title>Re: Getting rid of outlier data</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2423#M815</link>
      <description>Hi,&lt;BR /&gt;
&lt;BR /&gt;
How could I get rid of the extreme rows (let's say the top and bottom 2 rows) of a table, given that there are 200 variables?&lt;BR /&gt;
&lt;BR /&gt;
Thanks in advance,&lt;BR /&gt;
&lt;BR /&gt;
Anais</description>
      <pubDate>Thu, 19 Jun 2008 10:21:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Getting-rid-of-outlier-data/m-p/2423#M815</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-06-19T10:21:26Z</dc:date>
    </item>
  </channel>
</rss>

