## using a first.date within groupings

Super Contributor
Posts: 409

# using a first.date within groupings

Hi, if I have data that looks like below , and I want to find the first row (by date) that has a criteria in another field and then apply a formula.  So in the below example where the first Flag=Y starts, use the previous 3 Prices MEAN and the NEW_DIFF calculate as normal by itself  .. and where the FLAG NE Y use the noraml Price and the Diff   ..

As you can see I'm trying to smooth out the 3 outliers identified in the Flag=Y.

here's what i mean

HAVE:

Date        Flag        Price               Diff

Dec'99           N                \$21                   .

Jan'00        N             \$23             9%

Feb'00        N             \$28            21%

Mar'00        N             \$26           -7%

Apr'00        Y             \$256               884%

May'00       Y             \$274                7%

Jun'00        Y             \$300                 9%

Jul'00         N              \$29            -90%

Aug'00       N              \$48             66%

WANT1:

Date        Flag        Price             Diff                    New_Price            New_Diff

Dec'99           N                \$21                .                               \$21                          .

Jan'00        N             \$23             9%                     \$23                   9%

Feb'00        N             \$28            21%                    \$28                   21%

Mar'00        N             \$26            -7%                     \$26                  -7%

Apr'00        Y             \$256           884%                   \$25.6                 -1.5%

May'00       Y             \$274             7%                    \$26.5                  3.5%

Jun'00        Y             \$300            9%                     \$26                    -1.8%

Jul'00         N              \$29            -90%                   \$29                    -90%

Aug'00       N              \$48             66%                    \$48                     66%

Udo had a great recommendation to use proc esm, and that works perfectly if there is only 1 outlier, but I have 3. And I thought using the mean of the previous 3 inputs can fix the outliers.

PROC Star
Posts: 8,167

## Re: using a first.date within groupings

If your example reflects your actual data another possibility is that the 3 numbers were simply entered without a decimal place, i.e., that they should have been 25.6, 27.4 and 30.0.

If you try to take the mean of the 3 previous values, and you actually have a monotonic trend, you will falsely be showing the trend to now be non-monotonic.

Plus, if you change the three values, I would think that you would also want to change the diff values for the entry that follows the changes.

Super Contributor
Posts: 409

## Re: using a first.date within groupings

Hi Art,

Those are samples I just wrote down, but they are true outliers more like from \$35 to \$555 and they happened because there is a rolling 6-month average with low numbers for those months (perhaps no values or 1 or 2 to count from)..

The data is large (over 15 years in some cases) and all I need is to get rid of the spike from the outliers.. It's Ok to be non-monotonic for those 3 months as long as the values where Flag-N get picked up from original.. In the long run the data has an upward trend, but can and is mostly non-monotonic.. Having the Diff is not important either, If through the 3 mean then the new diff changes also acordingly..  Like I said, i would only want the previous 3-month trend to be applied to only the Flag=Y ..

PROC Star
Posts: 8,167

## Re: using a first.date within groupings

One brute force approach might be:

data have;

input Date \$  Flag \$  Price dollar8.  Diff percent5.;

cards;

Dec'99   N   \$21     .

Jan'00   N   \$23     9%

Feb'00   N   \$28    21%

Mar'00   N   \$26    -7%

Apr'00   Y   \$256  884%

May'00   Y   \$274    7%

Jun'00   Y   \$300    9%

Jul'00   N   \$29   -90%

Aug'00   N   \$48    66%

;

data want (drop=prices;

set have;

array prices(3);

retain prices:;

format new_diff percent8.1;

if flag eq 'Y' then price=mean(of prices(*));

prices(1)=prices(2);

prices(2)=prices(3);

prices(3)=price;

new_diff=prices(3)/prices(2)-1;

run;

Super Contributor
Posts: 409

## Re: using a first.date within groupings

Thanks Art,  this looks good.  But I do have one question, does it only apply if there are 3 outliers as per my example?  If in some cases I have 5 or 10 outliers (depending on the BY statement).. I do have Stores and Products series that I have to consider...  I guess I can always extend my Mean..

PROC Star
Posts: 8,167

## Re: using a first.date within groupings

The code isn't limited to any number of outliers, only that they are preceded by at least three values.  You could easily initialize the array by including a by statements (e.g., by store product) and including an:

if first.product then

statement to initialize the array for each store/product combination.  As long as you initialize the array with missing values, then the code would only require that at least one non-missing value preceded any flagged record.

Super Contributor
Posts: 409

## Re: using a first.date within groupings

Thanks Art, it works exactly how I envision it.. perfect.. Is there a way I can also see the original price...?

PROC Star
Posts: 8,167

## Re: using a first.date within groupings

replace:

if flag eq 'Y' then price=mean(of prices(*));

prices(1)=prices(2);

prices(2)=prices(3);

prices(3)=price;

with

if flag eq 'Y' then new_price=mean(of prices(*));

else new_price=price;

prices(1)=prices(2);

prices(2)=prices(3);

prices(3)=new_price;

Super Contributor
Posts: 409

## Re: using a first.date within groupings

Thank you, Sir..

Super Contributor
Posts: 409

## Re: using a first.date within groupings

Hi Art, do I initialize the array with missing values.. ?  using retain ?

What I'm trying to do is.. run the data step through a set of if statements, but I want it by a BY group.

Eg.

Data Want;  set have;

by Store Product;

if lag(Price) > 35 then Score = 'Bad'; else Score = 'Good';

run;

I want this done by each Store and Product based on a time series, I want to look at the Score in previous month, but can't seem to separete the data by Store and Porducts.

PROC Star
Posts: 8,167

## Re: using a first.date within groupings

I like to initialize arrays with the call missing statement.  e.g., if I have an array called rotation, I would use:

call missing(of rotation(*));

Discussion stats
• 10 replies
• 219 views
• 4 likes
• 2 in conversation