<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do you detect and remove or treat outliers (time series)? in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711861#M80009</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;My data is a time series with multiple variables by district (District | Date | Variable1).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I looked online to find a solution for dealing with outliers. I found one that says to calculate the IQR (interquartile range, Q3 minus Q1), multiply it by 1.5, then add that amount to Q3 (upper limit) and subtract it from Q1 (lower limit). But I am not sure how to actually code it to produce an output data set without outliers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also found one suggesting the following code:&lt;/P&gt;&lt;PRE class="language-sas"&gt;&lt;CODE&gt;proc univariate data=" " robustscale plot;
var  varname;
run; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried it, and it produces results like this, among other graphs, for this variable.&lt;/P&gt;&lt;DIV class="branch"&gt;&lt;DIV align="center"&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Quantiles&amp;nbsp;(Definition&amp;nbsp;5)&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Level&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Quantile&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;100% Max&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1714.8982&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;99%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;300.1324&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;95%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;117.2804&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;90%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;75.9922&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;75% Q3&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;35.1522&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;50% Median&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;13.0514&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;25% Q1&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;10%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;5%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;1%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;0% Min&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;BR /&gt;&lt;P class="lia-align-left"&gt;Now that I have this information, how can I write code to remove or treat the outliers in my dataset? I can't seem to find that code.&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Sat, 16 Jan 2021 17:44:28 GMT</pubDate>
    <dc:creator>pearson101</dc:creator>
    <dc:date>2021-01-16T17:44:28Z</dc:date>
    <item>
      <title>How do you detect and remove or treat outliers (time series)?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711861#M80009</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;My data is a time series with multiple variables by district (District | Date | Variable1).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I looked online to find a solution for dealing with outliers. I found one that says to calculate the IQR (interquartile range, Q3 minus Q1), multiply it by 1.5, then add that amount to Q3 (upper limit) and subtract it from Q1 (lower limit). But I am not sure how to actually code it to produce an output data set without outliers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also found one suggesting the following code:&lt;/P&gt;&lt;PRE class="language-sas"&gt;&lt;CODE&gt;proc univariate data=" " robustscale plot;
var  varname;
run; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried it, and it produces results like this, among other graphs, for this variable.&lt;/P&gt;&lt;DIV class="branch"&gt;&lt;DIV align="center"&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Quantiles&amp;nbsp;(Definition&amp;nbsp;5)&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Level&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Quantile&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;100% Max&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1714.8982&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;99%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;300.1324&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;95%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;117.2804&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;90%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;75.9922&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;75% Q3&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;35.1522&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;50% Median&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;13.0514&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;25% Q1&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;10%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;5%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;1%&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;0% Min&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0.0000&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;BR /&gt;&lt;P class="lia-align-left"&gt;Now that I have this information, how can I write code to remove or treat the outliers in my dataset? I can't seem to find that code.&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Sat, 16 Jan 2021 17:44:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711861#M80009</guid>
      <dc:creator>pearson101</dc:creator>
      <dc:date>2021-01-16T17:44:28Z</dc:date>
    </item>
    <item>
      <title>Re: How do you detect and remove or treat outliers (time series)?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711862#M80010</link>
      <description>&lt;P&gt;Many people on this board could do the programming.&amp;nbsp; But you have to do the hard part.&amp;nbsp; What makes a value an outlier?&amp;nbsp; There are many ways to answer the question, and an answer is required before programming can begin.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The most flexible way I have used defines outliers as values more than X standard deviations above the mean or more than X standard deviations below it.&amp;nbsp; "X" can be flexible: it can be a parameter fed to a macro.&amp;nbsp; But there are many other plausible definitions, and it is up to you to pick one if you want help with the programming.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Jan 2021 17:59:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711862#M80010</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-01-16T17:59:54Z</dc:date>
    </item>
    <item>
      <title>Re: How do you detect and remove or treat outliers (time series)?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711864#M80012</link>
      <description>&lt;P&gt;Suppose I want to define an outlier as a value more than 3 standard deviations above or below the mean. Could you help me with the programming for this definition?&lt;/P&gt;</description>
      <pubDate>Sat, 16 Jan 2021 18:13:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711864#M80012</guid>
      <dc:creator>pearson101</dc:creator>
      <dc:date>2021-01-16T18:13:40Z</dc:date>
    </item>
    <item>
      <title>Re: How do you detect and remove or treat outliers (time series)?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711893#M80015</link>
      <description>&lt;P&gt;Here is what I hope is working code (it's untested).&amp;nbsp; It assumes X is the name of the variable you want to cap, and HAVE is the name of the data set that contains X.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data=have;
   var x;
   output out=stats (keep=mean std) mean=mean std=std;
run;

data want;
   if _n_=1 then do;
      set stats;
      upper_limit = mean + 3*std;
      lower_limit = mean - 3*std;
      retain upper_limit lower_limit;
   end;
   set have;
   if x &amp;gt; upper_limit then capped_x = upper_limit;
   else if . &amp;lt; x &amp;lt; lower_limit then capped_x = lower_limit;
   else capped_x = x;
run;&lt;/CODE&gt;&lt;/PRE&gt;
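&lt;P&gt;The question describes the data as organized by district, so one possible extension is to compute the limits separately for each district. Here is a sketch in the same spirit (also untested).&amp;nbsp; It assumes HAVE contains a variable named DISTRICT, which is a guess based on the layout in the question, and that HAVE has no existing variables named MEAN or STD.&amp;nbsp; HAVE_SORTED, STATS_BY and WANT_BY are made-up data set names.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: 3-standard-deviation capping computed within each district */
proc sort data=have out=have_sorted;
   by district;
run;

/* One row of statistics per district */
proc summary data=have_sorted;
   by district;
   var x;
   output out=stats_by (keep=district mean std) mean=mean std=std;
run;

/* Attach each district's statistics to its rows, then cap */
data want_by;
   merge have_sorted stats_by;
   by district;
   upper_limit = mean + 3*std;
   lower_limit = mean - 3*std;
   /* A district with fewer than two non-missing values has no STD: leave x as is */
   if missing(std) then capped_x = x;
   else if x &amp;gt; upper_limit then capped_x = upper_limit;
   else if . &amp;lt; x &amp;lt; lower_limit then capped_x = lower_limit;
   else capped_x = x;
run;&lt;/CODE&gt;&lt;/PRE&gt;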
&lt;P&gt;This will at least give you something to look at and consider.&amp;nbsp; If you want to expand this to process many variables, there is a lot of work to be done.&amp;nbsp; There is one variable MEAN and one variable STD.&amp;nbsp; With many variables, you need many names to hold these statistics.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Jan 2021 22:20:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711893#M80015</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-01-16T22:20:27Z</dc:date>
    </item>
    <item>
      <title>Re: How do you detect and remove or treat outliers (time series)?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711944#M80016</link>
      <description>&lt;P&gt;This way of detecting outliers is not appropriate for TIME SERIES data.&lt;/P&gt;
&lt;P&gt;You need PROC ARIMA. Check its documentation, in particular Example 8.7: Iterative Outlier Detection:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*-- Outlier Detection --*/
proc arima data=airline;
identify var=logair( 1, 12 ) noprint;
estimate q= (1)(12) noint method= ml;
outlier maxnum=3 alpha=0.01;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 17 Jan 2021 12:30:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/711944#M80016</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-01-17T12:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: How do you detect and remove or treat outliers (time series)?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/712643#M80031</link>
      <description>Here is a SAS macro that I created for outlier detection to suit my own requirements. For more information, visit: &lt;A href="https://seleritysas.com/blog/2020/12/10/sas-custom-macros-that-make-feature-engineering-easy-for-data-scientists-data-engineers-and-machine-learning-specialists/" target="_blank"&gt;https://seleritysas.com/blog/2020/12/10/sas-custom-macros-that-make-feature-engineering-easy-for-data-scientists-data-engineers-and-machine-learning-specialists/&lt;/A&gt; &lt;BR /&gt;--------------------------------------- Macro Definition----------------------------------------------------------&lt;BR /&gt;%macro outliers(dat,var);&lt;BR /&gt;%local p_value me sd Min_cutoff Max_cutoff Min Max;&lt;BR /&gt;options nonotes;&lt;BR /&gt;proc univariate data=&amp;amp;dat normal noprint;&lt;BR /&gt;var &amp;amp;var;&lt;BR /&gt;output out=ttest normaltest=Test probn=P_Value;&lt;BR /&gt;run;&lt;BR /&gt;/* Copy the normality-test p-value into a macro variable so the macro condition can test it */&lt;BR /&gt;Data _Null_;&lt;BR /&gt;set ttest;&lt;BR /&gt;call symputx('p_value', P_Value);&lt;BR /&gt;run;&lt;BR /&gt;%if %sysevalf(&amp;amp;p_value &amp;gt; 0.05) %Then %do;&lt;BR /&gt;  options notes;&lt;BR /&gt;  %put NOTE: &amp;amp;var appears normally distributed, so the STD method is used to find outliers.;&lt;BR /&gt;  %put NOTE: You can check the statistic and p-value in the work.ttest table.;&lt;BR /&gt;  options nonotes;&lt;BR /&gt;  Proc SQL noprint;&lt;BR /&gt;  Select Mean(&amp;amp;var)&lt;BR /&gt;    into: me&lt;BR /&gt;  from &amp;amp;dat;&lt;BR /&gt;  select std(&amp;amp;var)&lt;BR /&gt;    into:sd&lt;BR /&gt;  from &amp;amp;dat;&lt;BR /&gt;  quit;&lt;BR /&gt;  /* Cutoffs at 3 standard deviations on either side of the mean */&lt;BR /&gt;  %Let Min_cutoff= %sysevalf(&amp;amp;me - (3* &amp;amp;sd));&lt;BR /&gt;  %Let Max_cutoff= %sysevalf(&amp;amp;me + (3* &amp;amp;sd));&lt;BR /&gt;  Data outliers;&lt;BR /&gt;  set &amp;amp;dat;&lt;BR /&gt;  where &amp;amp;var &amp;lt; &amp;amp;Min_cutoff or &amp;amp;var &amp;gt; &amp;amp;Max_cutoff;&lt;BR /&gt;  run;&lt;BR /&gt;%end;&lt;BR /&gt;%else %do;&lt;BR /&gt;  options notes;&lt;BR /&gt;  %put NOTE: &amp;amp;var is not normally distributed, so the percentile method is used to find outliers.;&lt;BR /&gt;  %put NOTE: You can check the statistic and p-value in the work.ttest table and the percentile values in the work.ranges table.;&lt;BR /&gt;  options nonotes;&lt;BR /&gt;  proc means data=&amp;amp;dat stackods n qrange p1 p99 ;&lt;BR /&gt;  var  &amp;amp;var;&lt;BR /&gt;  ods output summary=ranges;&lt;BR /&gt;  run;&lt;BR /&gt;  proc sql noprint;&lt;BR /&gt;  select P1 into:Min &lt;BR /&gt;  from ranges;&lt;BR /&gt;  select P99 into : Max&lt;BR /&gt;  from Ranges;&lt;BR /&gt;  quit;&lt;BR /&gt;  Data outliers;&lt;BR /&gt;  set &amp;amp;dat;&lt;BR /&gt;  Where &amp;amp;var &amp;lt; &amp;amp;Min or &amp;amp;var &amp;gt; &amp;amp;Max;&lt;BR /&gt;  run;&lt;BR /&gt;%end;&lt;BR /&gt;  options notes;&lt;BR /&gt;%mend;&lt;BR /&gt;&lt;BR /&gt;------------------------------------------- Macro Testing ---------------------------------------------------------&lt;BR /&gt;options nomprint nomlogic nosymbolgen;&lt;BR /&gt;%outliers(Lib.dataset_name, Variable_Name)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 20 Jan 2021 12:21:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-do-you-detect-and-remove-or-treat-outliers-time-series/m-p/712643#M80031</guid>
      <dc:creator>SurajSaini</dc:creator>
      <dc:date>2021-01-20T12:21:51Z</dc:date>
    </item>
  </channel>
</rss>

