BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Red_Squirrel
Calcite | Level 5

Hello all,

I am quite new to SAS programming therefore would appreciate any input on the possible solution of the problem I have!

I have the following data

value1 value2 value3 average1 required_result

1              9         12           7               7

8              6         9             6               7

4              7        10            2               7

6              2          1            8               7

.               .           .              .       

.               .           .              .

.               .           .              .

7              9         16           24              24

.               .           .              .                24

X              Y          Z            W

I need to do the following:

- If row n has got the following trend - value1 > value2> value3, then use the first encountered average1 value for the next 4 rows;

- after 4 rows calculate if value1 > value2 > value3 is true. If it is true then use the next first encountered average1 value for another 4 rows;

- repeat.

for example, in the first row value1 > value2> value3 (1 > 9 > 12) therefore we need to use average1 = 7 the next 4 rows starting from the first one.

can you please advise how this can be coded in SAS?

thank you so much for all your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, it's looking a little clearer.  The OR condition should look at SAVED_AVERAGE, rather than REQUIRED_RESULT.  SAVED_AVERAGE is retained, but REQUIRED_RESULT begins each observation as missing:

If ((value1 < value2 < value3) or (value1 < saved_average and value2 < saved_average and value3 < saved_average)) and (_n_ > 2) and (_n_ > max_recordnum) then do;

View solution in original post

17 REPLIES 17
Ksharp
Super User

I am confused. Why 1 > 9 > 12   ?

data have;

input value1 value2 value3 average1 ;

cards;

1              9         12           7              

8              6         9             6              

4              7        10            2              

6              2          1            8              

7              9         16           24            

7              9         16           24            

7              9         16           24             

7              9         16           24           

;

run;

data have;

set have;

if mod(_n_,4)=1 then n+1;

run;

data want;

set have;

by n;

retain flag;

if first.n then do;

  flag=.;

  if value1 < value2 and value2 < value3 then flag=average1;

end;

run;

Xia Keshan

Red_Squirrel
Calcite | Level 5

Apologies, Xia, quick typing!

I obviously meant value1 < value2 < value3 Smiley Happy

is your solution still going to be ok for this logic?

I will try it out and let you know.

thank you!

Astounding
PROC Star

I think you'll need something a little more flexible.  Here's a mildly similar approach:

data want;

   set have;

  if (value1 < value2 < value3) and (saved_average=.) then do;

     max_recordnum = _n_ + 3;

     saved_average = average1;

  end;

  retain max_recordnum saved_average;

  if _n_ <= max_recordnum then required_result = saved_average;

  if _n_ = max_recordnum then saved_average = .;

  drop saved_average max_recordnum;

run;

Red_Squirrel
Calcite | Level 5

Hello Xia and all,

i think I did not explain my problem quite well before, sorry!

I have attached now the picture of the excel file with the sample data (I have only my iPad with me therefore not able to do anything better).

ideally what I need is the following:

- I need to work out the required_result column using the following logic:

    - for row 2 and 3 select average1 value (without using any other logic). Therefore required_result = 2 for row 2 and required_result = 3 for row 3;

    - for all other rows use the following logic: if value1 < value2 < value3 then use the first encountered value of the average1 for at least 4 rows (7 in this case)

    - continue using 7 after 4 rows if the following is FALSE - value1 < value2 < value3;

    - if after 4 rows above the following is TRUE - value1 < value2 < value3, then use next first encountered average1 value (24 in this case) for at least 4 rows and then repeat the logic.

thank you!


image.jpg
Astounding
PROC Star

That's actually a simpler program.  One way:

data want;

   set have;

   retain saved_average;

   if (value1 < value2 < value3) then saved_average = average1;

   is saved_average > . then required_result = saved_average;

   else required_result = average1;

   drop saved_average;

run;

Good luck.

Red_Squirrel
Calcite | Level 5

Thanks a lot for trying, Astounding!

however, in my example I have numbers instead of "." - I just did not want to write it all up, now I have and attachend a new file to explain.

- Green area - first two rows should always have required_result = average1 (without any other logic involved);

- Orange area, that's where we start comparing value1< value2 < value3 and if it is true, we need to use the first encountered average1 and use it for the next 4 rows regardless whether condition (value1< value2 < value3) is true or not (That's why it is 7 for all 4 rows).

- White area - we continue to use previous average1 value if condition (value1< value2 < value3) is FALSE (that's why it is 7 for rows 8, 9, 10).

- Blue area - if condition (value1< value2 < value3) is true then choose the first encountered average1 and use it for the next 4 rows regardless whether condition (value1< value2 < value3) is true or not (that's why it is 24 for the next 4 rows.)

- White area - we continue to use previous average1 value if condition (value1 < value2 < value3) is false (that's why it's 24 for another row. i have ended file here however I have a lot more entries down as well).

hope it makes sense.

thank you!


image.jpg
Astounding
PROC Star

Seems to make sense ... I'll go back to a variation of the first program:

data want;

   set have;

   if (value1 < value2 < value3) and (_n_ > 2) and (_n_ > max_recordnum) then do;

     max_recordnum = _n_ + 3;

     saved_average = average1;

  end;

  retain max_recordnum saved_average;

  if saved_average = . then required_result = average1;

  else required_result = saved_average;

  drop saved_average max_recordnum;

run;

Red_Squirrel
Calcite | Level 5

thank you so much, you are a genius, Astounding! Smiley Happy

That's great for what I explained so far, however what if we had additional condition:

- In addition to what we had before, we also have another condition to check  (value1 < required_result and value2 < required_result and value3 < required result). If that's true or previous condition  (value1 < value2 < value3) is true then required result takes the value of the first encountered average1.

As a result, you can see that the RED block has got now the required_result = 5 repeated for 4 rows and then White block immediately after is still using Required_result = 5 as both of our conditions do not hold true here (value1 < required_result and value2 < required_result and value3 < required result) is FALSE and (value1 < value 2 < value3) is also FALSE.

The challenge here is that 2nd condition is using the newly created variable required_result.

It would be great if you can help!

Thank you!


image.jpg
Astounding
PROC Star

Red,

I'm not sure how much more time I can devote, but let's start at least.

Why does line 15 remain as is, instead of beginning a new set of 4 with required_result=6?  It appears that the conditions are in place:  value1 < 24 and value2 < 24 and value3 < 24.

I don't see it.  Why does line 16 should trigger a new block of 4, but line 15 does not?

Red_Squirrel
Calcite | Level 5

Thank you for your time, Astounding, I really appreciate it!

I am sorry, I made a silly error typing numbers in on my iPad - in line 15 it should be 27 instead of 7 (please see updated numbers attached).

therefore both conditions do not hold for line 15.

I would be so grateful if you can help here!


image.jpg
Red_Squirrel
Calcite | Level 5

Hi all,

I tried to develop on Astounding's code and make sure that I will include the comparison with the newly extended average, however it does not work the way it should be.

can you please advise what needs to be done so that condition (value1 < saved_average and value2 < saved_average and value3 < saved_average) always tests against not saved_average but the newly created required_result variable?

Data want;

Set have;

If (value1 < value2 < value3) and (_n_ > 2) and (_n_ > max_recordnum) then do;

max_recordnum = _n_ + 3;

Saved_average = average1;

End;

Retain max_recordnum saved_average;

If saved_average = . Then required_result = average1;

Else required_result = saved_average;

If (value1 < saved_average and value2 < saved_average and value3 < saved_average) and (_n_ > max_recordnums) and (_n_ > max_recordnum) then do;

max_recordnums = _n_ + 3;

Saved_average2 = average1;

End;

Retain max_recordnums saved_average2;

If saved_average2 = .  And  saved_average = . Then required_result = average1;

Else required_result = saved_average;

If saved_average2 <> .  Then required_result = saved_average2;

Run;

Astounding
PROC Star

Red,

Here's the statement from the original that determines whether to begin another block of 4 records:

if (value1 < value2 < value3) and (_n_ > 2) and (_n_ > max_recordnum) then do;

It seems you would just need to add another condition and alter just this one statement:

if ( (value1 < value2 < value3) or (value1 < required_result and value2 < required_result and value3 < required_result) )

and (_n_ > 2) and (_n_ > max_recordnum) then do;

Does that seem about right to you?

Red_Squirrel
Calcite | Level 5

Thanks, Astounding.

yes, I have already tried and the additional condition does not work.

The code results stay unchanged as if there was not any additional condition.

Astounding
PROC Star

Are you sure you used OR in the right spot:

if ( (value1 < value2 < value3) or (value1 < required_result and value2 < required_result and value3 < required_result) )

and (_n_ > 2) and (_n_ > max_recordnum) then do;

It might help if you showed the exact code you tested.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1604 views
  • 7 likes
  • 3 in conversation