<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hampel identifier for outlier detection in SAS/IML Software and Matrix Computations</title>
    <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961773#M6486</link>
    <description>&lt;P&gt;Yes. I was suggesting that you to try to implement the method yourself. You can use the following function to return the rolling median and the Hampel bands for each series:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;start HampelPred(y, k, multiplier=3);
   M = extractWindows(y, k, "REPEAT");
   med = T( median(M) );
   Nmad = T( mad(M, "NMAD") ); 
   pred = med;
   lower = pred - multiplier*Nmad;
   upper = pred + multiplier*Nmad;
   return pred || lower || upper;
finish;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now call that function inside the loop over the BY groups and include the results in the output data set.&lt;/P&gt;</description>
    <pubDate>Thu, 13 Mar 2025 10:36:39 GMT</pubDate>
    <dc:creator>Rick_SAS</dc:creator>
    <dc:date>2025-03-13T10:36:39Z</dc:date>
    <item>
      <title>Hampel identifier for outlier detection</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961584#M6479</link>
      <description>&lt;P&gt;Hello community,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to apply outlier detection using the hampel identifier, I've found this article&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/iml/2021/06/01/hampel-filter-robust-outliers.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2021/06/01/hampel-filter-robust-outliers.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;but I want to apply that to my series.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a sample dataset for example like that&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data DIFF_Q_NUM;
    format date_gas yymmdd10.; /* Formatta la data */
    do date_gas = '01JAN2023'd to '31DEC2023'd; /* Loop sulle date per un anno */
        do cod_reg_clim = 1 to 5; /* Cinque regioni climatiche */
            do i = 1 to 3; /* Tre zone climatiche per ogni regione */
                zona_clim = byte(64 + i); /* Converte 1-&amp;gt;A, 2-&amp;gt;B, 3-&amp;gt;C */
                diff = rand("Normal", 0, 10); /* Valore casuale con media 0 e deviazione 10 */
                output;
            end;
        end;
    end;
run;


&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My y target is variable 'diff', I would like to find the outliers for each combination of&amp;nbsp;&lt;CODE class=" language-sas"&gt;cod_reg_clim and zona_clim  &lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and thick them in my dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you in advance&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Mar 2025 16:24:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961584#M6479</guid>
      <dc:creator>harmonic</dc:creator>
      <dc:date>2025-03-11T16:24:53Z</dc:date>
    </item>
    <item>
      <title>Re: Hampel identifier for outlier detection</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961683#M6482</link>
      <description>&lt;P&gt;It sounds like you want to perform the same analysis for each unique combination of subset of&amp;nbsp;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="  language-sas"&gt;cod_reg_clim&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="token keyword"&gt;and&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;zona_clim&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To do this, I suggest sorting the data by the groups. I also find it useful to add an index variable that enumerates the unique combinations:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data DIFF_Q_NUM;
    array clim[3] $1 _temporary_ ('A' 'B' 'C');
    format date_gas yymmdd10.; /* Formatta la data */
    do date_gas = '01JAN2023'd to '31DEC2023'd; /* Loop sulle date per un anno */
        do cod_reg_clim = 1 to 5; /* Cinque regioni climatiche */
            do i = 1 to 3; /* Tre zone climatiche per ogni regione */
                zona_clim = clim[i]; 
                diff = rand("Normal", 0, 10); /* Valore casuale con media 0 e deviazione 10 */
                output;
            end;
        end;
    end;
run;

/* first, sort for BY-group analysis */
proc sort data=DIFF_Q_NUM out=DIFF_Q_NUM_SORT;
by cod_reg_clim zona_clim date_gas;
run;

/* add a single ID variable for each BY group */
data DIFF_Q_NUM_SORT;
set DIFF_Q_NUM_SORT;
by cod_reg_clim zona_clim;
if first.zona_clim then 
   ID + 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now the data are ready for a BY-group analysis. In PROC IML, there are several ways to do this. The easiest to explain is the unique-loc method. See&amp;nbsp;&lt;A href="https://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html" target="_blank"&gt;The UNIQUE-LOC trick: A real treat! - The DO Loop&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;But also see the article&amp;nbsp;&lt;A href="https://blogs.sas.com/content/iml/2012/04/16/by-group-processing-in-sasiml.html" target="_blank"&gt;BY-group processing in SAS/IML - The DO Loop&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The following program reads all observations into IML vectors, then extracts the time series for each combination of the BY groups. It calls the HameplID function that you want to use on each time series and forms a vector that indicates which observations are outliers. It then writes that vector to a SAS data set. You can merge the outlier vector and the sorted data.&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc iml;
/* Function from https://blogs.sas.com/content/iml/2021/05/26/running-median-smoother.html
   Extract the symmetric sequence about x[i], which is x[i-k-1:i+k+1] if
   1 &amp;lt;= i-k-1  and  i+k+1 &amp;lt;= nrow(x).
   Otherwise, pad the sequence. You can pad with a user-defined value 
   (such as missing or zero) or you can repeat the first or last value.
   This function loops over the number of columns and extracts each window.
   INPUT: y is the time series
          k is the radius of the window. The window length is 2*k+1 centered at y[i]
          padVal can be missing or a character value. If character,
                 use y[1] to pad on the left and y[n] to pad on the right.
*/
start extractWindows(y, k, padVal="REPEAT");
   n = nrow(y);
   M = j(2*k+1, n, .);
   /* create new series, z, by padding k values to the ends of y */
   if type(padVal)='N' then
      z = repeat(padVal, k) // y // repeat(padVal, k); /* pad with user value */
   else
      z = repeat(y[1], k) // y // repeat(y[n], k);     /* Tukey's padding */
   do i = 1 to n;
      range = i : i+2*k;   /* centered at k+i */
      M[,i] = z[range];
   end;
   return M;
finish;
/* Hampel identifier: Use moving median and MAD to identify outliers 
   https://blogs.sas.com/content/iml/2021/06/01/hampel-filter-robust-outliers.html */
start HampelID(y, k, multiplier=3);
   M = extractWindows(y, k, "REPEAT");
   med = T( median(M) );
   Nmad = T( mad(M, "NMAD") ); 
   idx = loc( abs(y-med) &amp;gt; multiplier*Nmad );
   return idx;
finish;

/* read in the series for each BY-group combination of cod_reg_clim and zona_clim */
use DIFF_Q_NUM_SORT;
read all var {diff ID};
close;

winLen = 6;                    /* set the window length */
Outlier = j(nrow(diff), 1, 0);  /* most obs are not outliers */
firstObs = 1;
do i = 1 to max(ID);
   groupIdx = loc(ID=i);
   y = diff[groupIdx];
   outlierIdx = HampelID(y, winLen);
   if ncol(OutlierIdx) &amp;gt; 0 then    /* there are outliers for this group */
      Outlier[firstObs - 1 + outlierIdx] = 1;   /* mark the outliers */
   firstObs = firstObs + ncol(groupIdx);
end;
create HampelOutliers var "Outlier";
append;
close HampelOutliers;
QUIT;

/* merge the Hampel outlier variable */
data DIFF_Q_HAMPEL;
merge DIFF_Q_NUM_SORT HampelOutliers;
run;
/* visualize the time series and outliers */
proc sgpanel data=DIFF_Q_HAMPEL;
panelby zona_clim cod_reg_clim / layout=lattice onepanel; 
scatter x=date_gas y=diff / group=Outlier;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/105371i5ABE58264D6ADD50/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture.PNG" alt="Capture.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 14:50:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961683#M6482</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2025-03-12T14:50:58Z</dc:date>
    </item>
    <item>
      <title>Re: Hampel identifier for outlier detection</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961687#M6483</link>
      <description>&lt;P&gt;How can I verify that the filter works? Showing the band lower and upper to check the outliers points?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 15:20:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961687#M6483</guid>
      <dc:creator>harmonic</dc:creator>
      <dc:date>2025-03-12T15:20:53Z</dc:date>
    </item>
    <item>
      <title>Re: Hampel identifier for outlier detection</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961688#M6484</link>
      <description>&lt;P&gt;That is shown in the blog post:&amp;nbsp;&lt;A href="https://blogs.sas.com/content/iml/2021/06/01/hampel-filter-robust-outliers.html" target="_blank"&gt;The Hampel identifier: Robust outlier detection in a time series - The DO Loop&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Try to figure it out and write back if you get stuck.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 15:25:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961688#M6484</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2025-03-12T15:25:37Z</dc:date>
    </item>
    <item>
      <title>Re: Hampel identifier for outlier detection</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961693#M6485</link>
      <description>&lt;P&gt;Yes but in your code I can't see the lower and upper bound. I would like to see in my dataset the bands because I see that the extreme values are not detected as outliers.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="harmonic_0-1741795013240.png" style="width: 790px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/105372iDBD2C7EAD4306A10/image-dimensions/790x285?v=v2" width="790" height="285" role="button" title="harmonic_0-1741795013240.png" alt="harmonic_0-1741795013240.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 15:57:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961693#M6485</guid>
      <dc:creator>harmonic</dc:creator>
      <dc:date>2025-03-12T15:57:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hampel identifier for outlier detection</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961773#M6486</link>
      <description>&lt;P&gt;Yes. I was suggesting that you to try to implement the method yourself. You can use the following function to return the rolling median and the Hampel bands for each series:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;start HampelPred(y, k, multiplier=3);
   M = extractWindows(y, k, "REPEAT");
   med = T( median(M) );
   Nmad = T( mad(M, "NMAD") ); 
   pred = med;
   lower = pred - multiplier*Nmad;
   upper = pred + multiplier*Nmad;
   return pred || lower || upper;
finish;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now call that function inside the loop over the BY groups and include the results in the output data set.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 10:36:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Hampel-identifier-for-outlier-detection/m-p/961773#M6486</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2025-03-13T10:36:39Z</dc:date>
    </item>
  </channel>
</rss>

