Help using Base SAS procedures

using a first.date within groupings

Reply
Super Contributor
Posts: 401

using a first.date within groupings

Hi, if I have data that looks like below , and I want to find the first row (by date) that has a criteria in another field and then apply a formula.  So in the below example where the first Flag=Y starts, use the previous 3 Prices MEAN and the NEW_DIFF calculate as normal by itself  .. and where the FLAG NE Y use the noraml Price and the Diff   ..

As you can see I'm trying to smooth out the 3 outliers identified in the Flag=Y.

here's what i mean

HAVE:

Date        Flag        Price               Diff

Dec'99           N                $21                   .

Jan'00        N             $23             9%

Feb'00        N             $28            21%

Mar'00        N             $26           -7%

Apr'00        Y             $256               884%

May'00       Y             $274                7%

Jun'00        Y             $300                 9%

Jul'00         N              $29            -90%

Aug'00       N              $48             66%

WANT1:

Date        Flag        Price             Diff                    New_Price            New_Diff

Dec'99           N                $21                .                               $21                          .

Jan'00        N             $23             9%                     $23                   9%

Feb'00        N             $28            21%                    $28                   21%

Mar'00        N             $26            -7%                     $26                  -7%

Apr'00        Y             $256           884%                   $25.6                 -1.5%

May'00       Y             $274             7%                    $26.5                  3.5%

Jun'00        Y             $300            9%                     $26                    -1.8%

Jul'00         N              $29            -90%                   $29                    -90%

Aug'00       N              $48             66%                    $48                     66%

Udo had a great recommendation to use proc esm, and that works perfectly if there is only 1 outlier, but I have 3. And I thought using the mean of the previous 3 inputs can fix the outliers.

PROC Star
Posts: 7,468

Re: using a first.date within groupings

If your example reflects your actual data another possibility is that the 3 numbers were simply entered without a decimal place, i.e., that they should have been 25.6, 27.4 and 30.0.

If you try to take the mean of the 3 previous values, and you actually have a monotonic trend, you will falsely be showing the trend to now be non-monotonic.

Plus, if you change the three values, I would think that you would also want to change the diff values for the entry that follows the changes.

Super Contributor
Posts: 401

Re: using a first.date within groupings

Hi Art,

Those are samples I just wrote down, but they are true outliers more like from $35 to $555 and they happened because there is a rolling 6-month average with low numbers for those months (perhaps no values or 1 or 2 to count from)..

The data is large (over 15 years in some cases) and all I need is to get rid of the spike from the outliers.. It's Ok to be non-monotonic for those 3 months as long as the values where Flag-N get picked up from original.. In the long run the data has an upward trend, but can and is mostly non-monotonic.. Having the Diff is not important either, If through the 3 mean then the new diff changes also acordingly..  Like I said, i would only want the previous 3-month trend to be applied to only the Flag=Y ..

PROC Star
Posts: 7,468

Re: using a first.date within groupings

One brute force approach might be:

data have;

  input Date $  Flag $  Price dollar8.  Diff percent5.;

  cards;

Dec'99   N   $21     .

Jan'00   N   $23     9%

Feb'00   N   $28    21%

Mar'00   N   $26    -7%

Apr'00   Y   $256  884%

May'00   Y   $274    7%

Jun'00   Y   $300    9%

Jul'00   N   $29   -90%

Aug'00   N   $48    66%

;

data want (drop=pricesSmiley Happy;

  set have;

  array prices(3);

  retain prices:;

  format new_diff percent8.1;

  if flag eq 'Y' then price=mean(of prices(*));

  prices(1)=prices(2);

  prices(2)=prices(3);

  prices(3)=price;

  new_diff=prices(3)/prices(2)-1;

run;

Super Contributor
Posts: 401

Re: using a first.date within groupings

Thanks Art,  this looks good.  But I do have one question, does it only apply if there are 3 outliers as per my example?  If in some cases I have 5 or 10 outliers (depending on the BY statement).. I do have Stores and Products series that I have to consider...  I guess I can always extend my Mean..

PROC Star
Posts: 7,468

Re: using a first.date within groupings

The code isn't limited to any number of outliers, only that they are preceded by at least three values.  You could easily initialize the array by including a by statements (e.g., by store product) and including an:

  if first.product then

statement to initialize the array for each store/product combination.  As long as you initialize the array with missing values, then the code would only require that at least one non-missing value preceded any flagged record.

Super Contributor
Posts: 401

Re: using a first.date within groupings

Thanks Art, it works exactly how I envision it.. perfect.. Is there a way I can also see the original price...?  

PROC Star
Posts: 7,468

Re: using a first.date within groupings

replace:

  if flag eq 'Y' then price=mean(of prices(*));

  prices(1)=prices(2);

  prices(2)=prices(3);

  prices(3)=price;

with

  if flag eq 'Y' then new_price=mean(of prices(*));

   else new_price=price;

  prices(1)=prices(2);

  prices(2)=prices(3);

  prices(3)=new_price;

Super Contributor
Posts: 401

Re: using a first.date within groupings

Thank you, Sir..

Super Contributor
Posts: 401

Re: using a first.date within groupings

Hi Art, do I initialize the array with missing values.. ?  using retain ?

What I'm trying to do is.. run the data step through a set of if statements, but I want it by a BY group.

Eg.

Data Want;  set have;

by Store Product;

if lag(Price) > 35 then Score = 'Bad'; else Score = 'Good';

run;

I want this done by each Store and Product based on a time series, I want to look at the Score in previous month, but can't seem to separete the data by Store and Porducts.

PROC Star
Posts: 7,468

Re: using a first.date within groupings

I like to initialize arrays with the call missing statement.  e.g., if I have an array called rotation, I would use:

  call missing(of rotation(*));

Ask a Question
Discussion stats
  • 10 replies
  • 203 views
  • 4 likes
  • 2 in conversation