BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

Hi, if I have data that looks like below , and I want to find the first row (by date) that has a criteria in another field and then apply a formula.  So in the below example where the first Flag=Y starts, use the previous 3 Prices MEAN and the NEW_DIFF calculate as normal by itself  .. and where the FLAG NE Y use the noraml Price and the Diff   ..

As you can see I'm trying to smooth out the 3 outliers identified in the Flag=Y.

here's what i mean

HAVE:

Date        Flag        Price               Diff

Dec'99           N                $21                   .

Jan'00        N             $23             9%

Feb'00        N             $28            21%

Mar'00        N             $26           -7%

Apr'00        Y             $256               884%

May'00       Y             $274                7%

Jun'00        Y             $300                 9%

Jul'00         N              $29            -90%

Aug'00       N              $48             66%

WANT1:

Date        Flag        Price             Diff                    New_Price            New_Diff

Dec'99           N                $21                .                               $21                          .

Jan'00        N             $23             9%                     $23                   9%

Feb'00        N             $28            21%                    $28                   21%

Mar'00        N             $26            -7%                     $26                  -7%

Apr'00        Y             $256           884%                   $25.6                 -1.5%

May'00       Y             $274             7%                    $26.5                  3.5%

Jun'00        Y             $300            9%                     $26                    -1.8%

Jul'00         N              $29            -90%                   $29                    -90%

Aug'00       N              $48             66%                    $48                     66%

Udo had a great recommendation to use proc esm, and that works perfectly if there is only 1 outlier, but I have 3. And I thought using the mean of the previous 3 inputs can fix the outliers.

10 REPLIES 10
art297
Opal | Level 21

If your example reflects your actual data another possibility is that the 3 numbers were simply entered without a decimal place, i.e., that they should have been 25.6, 27.4 and 30.0.

If you try to take the mean of the 3 previous values, and you actually have a monotonic trend, you will falsely be showing the trend to now be non-monotonic.

Plus, if you change the three values, I would think that you would also want to change the diff values for the entry that follows the changes.

podarum
Quartz | Level 8

Hi Art,

Those are samples I just wrote down, but they are true outliers more like from $35 to $555 and they happened because there is a rolling 6-month average with low numbers for those months (perhaps no values or 1 or 2 to count from)..

The data is large (over 15 years in some cases) and all I need is to get rid of the spike from the outliers.. It's Ok to be non-monotonic for those 3 months as long as the values where Flag-N get picked up from original.. In the long run the data has an upward trend, but can and is mostly non-monotonic.. Having the Diff is not important either, If through the 3 mean then the new diff changes also acordingly..  Like I said, i would only want the previous 3-month trend to be applied to only the Flag=Y ..

art297
Opal | Level 21

One brute force approach might be:

data have;

  input Date $  Flag $  Price dollar8.  Diff percent5.;

  cards;

Dec'99   N   $21     .

Jan'00   N   $23     9%

Feb'00   N   $28    21%

Mar'00   N   $26    -7%

Apr'00   Y   $256  884%

May'00   Y   $274    7%

Jun'00   Y   $300    9%

Jul'00   N   $29   -90%

Aug'00   N   $48    66%

;

data want (drop=prices:);

  set have;

  array prices(3);

  retain prices:;

  format new_diff percent8.1;

  if flag eq 'Y' then price=mean(of prices(*));

  prices(1)=prices(2);

  prices(2)=prices(3);

  prices(3)=price;

  new_diff=prices(3)/prices(2)-1;

run;

podarum
Quartz | Level 8

Thanks Art,  this looks good.  But I do have one question, does it only apply if there are 3 outliers as per my example?  If in some cases I have 5 or 10 outliers (depending on the BY statement).. I do have Stores and Products series that I have to consider...  I guess I can always extend my Mean..

art297
Opal | Level 21

The code isn't limited to any number of outliers, only that they are preceded by at least three values.  You could easily initialize the array by including a by statements (e.g., by store product) and including an:

  if first.product then

statement to initialize the array for each store/product combination.  As long as you initialize the array with missing values, then the code would only require that at least one non-missing value preceded any flagged record.

podarum
Quartz | Level 8

Thanks Art, it works exactly how I envision it.. perfect.. Is there a way I can also see the original price...?  

art297
Opal | Level 21

replace:

  if flag eq 'Y' then price=mean(of prices(*));

  prices(1)=prices(2);

  prices(2)=prices(3);

  prices(3)=price;

with

  if flag eq 'Y' then new_price=mean(of prices(*));

   else new_price=price;

  prices(1)=prices(2);

  prices(2)=prices(3);

  prices(3)=new_price;

podarum
Quartz | Level 8

Thank you, Sir..

podarum
Quartz | Level 8

Hi Art, do I initialize the array with missing values.. ?  using retain ?

What I'm trying to do is.. run the data step through a set of if statements, but I want it by a BY group.

Eg.

Data Want;  set have;

by Store Product;

if lag(Price) > 35 then Score = 'Bad'; else Score = 'Good';

run;

I want this done by each Store and Product based on a time series, I want to look at the Score in previous month, but can't seem to separete the data by Store and Porducts.

art297
Opal | Level 21

I like to initialize arrays with the call missing statement.  e.g., if I have an array called rotation, I would use:

  call missing(of rotation(*));

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 829 views
  • 4 likes
  • 2 in conversation