## Help using Base SAS procedures

Solved
Occasional Contributor
Posts: 15

Hello all,

I am quite new to SAS programming therefore would appreciate any input on the possible solution of the problem I have!

I have the following data

value1 value2 value3 average1 required_result

1              9         12           7               7

8              6         9             6               7

4              7        10            2               7

6              2          1            8               7

.               .           .              .

.               .           .              .

.               .           .              .

7              9         16           24              24

.               .           .              .                24

X              Y          Z            W

I need to do the following:

- If row n has got the following trend - value1 > value2> value3, then use the first encountered average1 value for the next 4 rows;

- after 4 rows calculate if value1 > value2 > value3 is true. If it is true then use the next first encountered average1 value for another 4 rows;

- repeat.

for example, in the first row value1 > value2> value3 (1 > 9 > 12) therefore we need to use average1 = 7 the next 4 rows starting from the first one.

thank you so much for all your help!

Accepted Solutions
Solution
‎05-21-2015 08:58 AM
Super User
Posts: 6,774

OK, it's looking a little clearer.  The OR condition should look at SAVED_AVERAGE, rather than REQUIRED_RESULT.  SAVED_AVERAGE is retained, but REQUIRED_RESULT begins each observation as missing:

If ((value1 < value2 < value3) or (value1 < saved_average and value2 < saved_average and value3 < saved_average)) and (_n_ > 2) and (_n_ > max_recordnum) then do;

All Replies
Super User
Posts: 10,778

I am confused. Why 1 > 9 > 12   ?

data have;

input value1 value2 value3 average1 ;

cards;

1              9         12           7

8              6         9             6

4              7        10            2

6              2          1            8

7              9         16           24

7              9         16           24

7              9         16           24

7              9         16           24

;

run;

data have;

set have;

if mod(_n_,4)=1 then n+1;

run;

data want;

set have;

by n;

retain flag;

if first.n then do;

flag=.;

if value1 < value2 and value2 < value3 then flag=average1;

end;

run;

Xia Keshan

Occasional Contributor
Posts: 15

Apologies, Xia, quick typing!

I obviously meant value1 < value2 < value3

is your solution still going to be ok for this logic?

I will try it out and let you know.

thank you!

Super User
Posts: 6,774

I think you'll need something a little more flexible.  Here's a mildly similar approach:

data want;

set have;

if (value1 < value2 < value3) and (saved_average=.) then do;

max_recordnum = _n_ + 3;

saved_average = average1;

end;

retain max_recordnum saved_average;

if _n_ <= max_recordnum then required_result = saved_average;

if _n_ = max_recordnum then saved_average = .;

drop saved_average max_recordnum;

run;

Occasional Contributor
Posts: 15

Hello Xia and all,

i think I did not explain my problem quite well before, sorry!

I have attached now the picture of the excel file with the sample data (I have only my iPad with me therefore not able to do anything better).

ideally what I need is the following:

- I need to work out the required_result column using the following logic:

- for row 2 and 3 select average1 value (without using any other logic). Therefore required_result = 2 for row 2 and required_result = 3 for row 3;

- for all other rows use the following logic: if value1 < value2 < value3 then use the first encountered value of the average1 for at least 4 rows (7 in this case)

- continue using 7 after 4 rows if the following is FALSE - value1 < value2 < value3;

- if after 4 rows above the following is TRUE - value1 < value2 < value3, then use next first encountered average1 value (24 in this case) for at least 4 rows and then repeat the logic.

thank you!

Super User
Posts: 6,774

That's actually a simpler program.  One way:

data want;

set have;

retain saved_average;

if (value1 < value2 < value3) then saved_average = average1;

is saved_average > . then required_result = saved_average;

else required_result = average1;

drop saved_average;

run;

Good luck.

Occasional Contributor
Posts: 15

Thanks a lot for trying, Astounding!

however, in my example I have numbers instead of "." - I just did not want to write it all up, now I have and attachend a new file to explain.

- Green area - first two rows should always have required_result = average1 (without any other logic involved);

- Orange area, that's where we start comparing value1< value2 < value3 and if it is true, we need to use the first encountered average1 and use it for the next 4 rows regardless whether condition (value1< value2 < value3) is true or not (That's why it is 7 for all 4 rows).

- White area - we continue to use previous average1 value if condition (value1< value2 < value3) is FALSE (that's why it is 7 for rows 8, 9, 10).

- Blue area - if condition (value1< value2 < value3) is true then choose the first encountered average1 and use it for the next 4 rows regardless whether condition (value1< value2 < value3) is true or not (that's why it is 24 for the next 4 rows.)

- White area - we continue to use previous average1 value if condition (value1 < value2 < value3) is false (that's why it's 24 for another row. i have ended file here however I have a lot more entries down as well).

hope it makes sense.

thank you!

Super User
Posts: 6,774

Seems to make sense ... I'll go back to a variation of the first program:

data want;

set have;

if (value1 < value2 < value3) and (_n_ > 2) and (_n_ > max_recordnum) then do;

max_recordnum = _n_ + 3;

saved_average = average1;

end;

retain max_recordnum saved_average;

if saved_average = . then required_result = average1;

else required_result = saved_average;

drop saved_average max_recordnum;

run;

Occasional Contributor
Posts: 15

thank you so much, you are a genius, Astounding!

That's great for what I explained so far, however what if we had additional condition:

- In addition to what we had before, we also have another condition to check  (value1 < required_result and value2 < required_result and value3 < required result). If that's true or previous condition  (value1 < value2 < value3) is true then required result takes the value of the first encountered average1.

As a result, you can see that the RED block has got now the required_result = 5 repeated for 4 rows and then White block immediately after is still using Required_result = 5 as both of our conditions do not hold true here (value1 < required_result and value2 < required_result and value3 < required result) is FALSE and (value1 < value 2 < value3) is also FALSE.

The challenge here is that 2nd condition is using the newly created variable required_result.

It would be great if you can help!

Thank you!

Super User
Posts: 6,774

Red,

I'm not sure how much more time I can devote, but let's start at least.

Why does line 15 remain as is, instead of beginning a new set of 4 with required_result=6?  It appears that the conditions are in place:  value1 < 24 and value2 < 24 and value3 < 24.

I don't see it.  Why does line 16 should trigger a new block of 4, but line 15 does not?

Occasional Contributor
Posts: 15

Thank you for your time, Astounding, I really appreciate it!

I am sorry, I made a silly error typing numbers in on my iPad - in line 15 it should be 27 instead of 7 (please see updated numbers attached).

therefore both conditions do not hold for line 15.

I would be so grateful if you can help here!

Occasional Contributor
Posts: 15

Hi all,

I tried to develop on Astounding's code and make sure that I will include the comparison with the newly extended average, however it does not work the way it should be.

can you please advise what needs to be done so that condition (value1 < saved_average and value2 < saved_average and value3 < saved_average) always tests against not saved_average but the newly created required_result variable?

Data want;

Set have;

If (value1 < value2 < value3) and (_n_ > 2) and (_n_ > max_recordnum) then do;

max_recordnum = _n_ + 3;

Saved_average = average1;

End;

Retain max_recordnum saved_average;

If saved_average = . Then required_result = average1;

Else required_result = saved_average;

If (value1 < saved_average and value2 < saved_average and value3 < saved_average) and (_n_ > max_recordnums) and (_n_ > max_recordnum) then do;

max_recordnums = _n_ + 3;

Saved_average2 = average1;

End;

Retain max_recordnums saved_average2;

If saved_average2 = .  And  saved_average = . Then required_result = average1;

Else required_result = saved_average;

If saved_average2 <> .  Then required_result = saved_average2;

Run;

Super User
Posts: 6,774

Red,

Here's the statement from the original that determines whether to begin another block of 4 records:

if (value1 < value2 < value3) and (_n_ > 2) and (_n_ > max_recordnum) then do;

It seems you would just need to add another condition and alter just this one statement:

if ( (value1 < value2 < value3) or (value1 < required_result and value2 < required_result and value3 < required_result) )

and (_n_ > 2) and (_n_ > max_recordnum) then do;

Does that seem about right to you?

Occasional Contributor
Posts: 15

Thanks, Astounding.

yes, I have already tried and the additional condition does not work.

The code results stay unchanged as if there was not any additional condition.

Super User
Posts: 6,774

Are you sure you used OR in the right spot:

if ( (value1 < value2 < value3) or (value1 < required_result and value2 < required_result and value3 < required_result) )

and (_n_ > 2) and (_n_ > max_recordnum) then do;

It might help if you showed the exact code you tested.

🔒 This topic is solved and locked.