How do you detect and remove or treat outliers (time series)?

pearson101 · Posted 01-16-2021 12:44 PM

Hello,

My data is a time series with multiple variables by districts. (District |date| Variable1)

I looked online to fine a solution to dealing with outliers. I found one saying to calculate the IQR (interquantile range Q3 minus Q1) then multiply by 1.5 and adding the amount to Q3 and substracting that from Q1 (lower limit). But I am not sure how to actually code it to produce a output data without outliers.

I also found one suggesting the following code:

proc univariate data=" " robustscale plot;
var  varname;
run;

I tried it and produces results like this among other graphs for this variable.

Quantiles (Definition 5)
Level	Quantile
100% Max	1714.8982
99%	300.1324
95%	117.2804
90%	75.9922
75% Q3	35.1522
50% Median	13.0514
25% Q1	0.0000
10%	0.0000
5%	0.0000
1%	0.0000
0% Min	0.0000

Now that I have this information, how can I code for it to remove or treat outliers in my dataset with a code? I can't seem to find that code.

Astounding · Posted 01-16-2021 12:59 PM

Many people on this board could do the programming. But you have to do the hard part. What makes a value an outlier? There are many ways to answer the question, and an answer is required before programming can begin.

The most flexible way I have used defines outliers as more than X standard deviations above the mean, or less than X standard deviations below the mean. "X" can actually be flexible and can be a parameter fed to a macro. But there are many other plausible definitions and it is up to you to pick one if you want help with the programming.

pearson101 · Posted 01-16-2021 01:13 PM

If I want to define outlier as more or less than 3 standard deviation from the mean. Could you help me with the programming for this definition?

Astounding · Posted 01-16-2021 05:19 PM

Here is what I hope is working code (it's untested). It assumes X is the name of the variable you want to cap, and HAVE is the name of the data set that contains X.

proc summary data=have;
   var x;
   output out=stats (keep=mean std) mean=mean std=std;
run;

data want;
   if _n_=1 then do;
      set stats;
      upper_limit = mean + 3*std;
      lower_limit = mean - 3*std;
      retain upper_limit lower_limit;
   end;
   set have;
   if x > upper_limit then capped_x = upper_limit;
   else if . < x < lower_limit then capped_x = lower_limit;
   else capped_x = x;
run;

This will at least give you something to look at and consider. If you want to expand this to process many variables, there is a lot of work to be done. There is one variable MEAN and one variable STD. With many variables, you need many names to hold these statistics.

Ksharp · Posted 01-17-2021 07:28 AM

You can't apply this outlier detecting way on a TIME SERIES data.

You need PROC ARIMMA .Check its documentation and its Example 8.7: Iterative Outlier Detection :

/*-- Outlier Detection --*/
proc arima data=airline;
identify var=logair( 1, 12 ) noprint;
estimate q= (1)(12) noint method= ml;
outlier maxnum=3 alpha=0.01;
run;

SurajSaini · Posted 01-20-2021 07:21 AM

Here is SAS Macro that I created for outlier detection as per my requirements, more information visit here: https://seleritysas.com/blog/2020/12/10/sas-custom-macros-that-make-feature-engineering-easy-for-dat...
--------------------------------------- Macro Definition----------------------------------------------------------
%macro outliers(dat,var);
options nonotes;
proc univariate data=&dat normal noprint;
var &var;
output out=ttest normaltest=Test probn=P_Value;
run;
Data _Null_;
set ttest;
%if P_value > 0.05 %Then %do;
option notes;
%put NOTE: &var is normally distributed hence it select STD method to find Outliers.;
%put NOTE: You can check statistics and pvalue in work.ttest table;
options nonotes;
Proc SQL noprint;
Select Mean(&var)
into: me
from &dat;
select std(&var)
into:sd
from &dat;
quit;
run;
Data outlier;
set &dat;
%Let Min_cutoff= %sysevalf(&me - (3* &sd));
%Let Max_cutoff= %sysevalf(&me + (3* &sd));
where &var < &Min_cutoff or &var > &Max_cutoff;
run;
%end;
%else %do;
options notes;
%put NOTE: &var is not normally distributed hence it select percentile method to find Outliers.;
%put NOTE: You can check statistics and pvalue in work.ttest table & percentile values in work.ranges table;
options nonotes;
proc means data=&dat stackods n qrange p1 p99 ;
var &var;
ods output summary=ranges;
run;
proc sql noprint;
select P1 into:Min
from ranges;
select P99 into : Max
from Ranges;
quit;
run;
Data outliers;
set &dat;
Where &var < &Min or &var > &Max;
run;
%end;
options notes;
%mend;

------------------------------------------- Macro Testing ---------------------------------------------------------
options nomprint nomlogic nosymbolgen;
%outliers(Lib.dataset_name, Variable_Name)

How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

Re: How do you detect and remove or treat outliers (time series)?

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!