BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AT121
Fluorite | Level 6

I would like to generate a variable with all the local high and low temperatures for a multi-year period. I define local highs/lows by comparing 3 consecutive day temperatures.

 

If temp(t-1) > temp(t=0) and temp(t-1) > temp(t-2), generate high_i variable (where i=1,2,...,n)

If temp(t-1) < temp(t=0) and temp(t-1) < temp(t-2), generate low_i variable (where i=1,2,...,n)

 

Is there a more efficient code/method than creating a loop for SAS to continuously compare 3 consecutive day temperatures? Will the generated high_i and low_i variables contain the date and temperature values?

 

Thanks for all suggestions and input

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So you just compare the preceding and succeeding values. This is done by combining LAG with a "look-ahead":

data have;
input day temp;
datalines;
1 74
2 76
3 75
4 73
5 72
6 74
7 77
8 78
9 80
10 79
11 78
12 76
13 75
14 76
15 77
;

data want;
merge
  have
  have (
    firstobs=2
    keep=temp
    rename=(temp=_temp)
  )
  end=done
;
length flag $4;
if
  lag(temp) > temp and _temp > temp
then flag = "low";
if
  lag(temp) < temp and _temp < temp
  and lag(temp) ne . and _temp ne . /* guards against missing values at beginning and end */
then flag = "high";
drop _temp;
run;

which flags the four values 2, 5, 9 and 13.

View solution in original post

13 REPLIES 13
andreas_lds
Jade | Level 19

Difficult to understand without seeing the data you have and what you expect as result actually.

So please post data in usable form: as data step using datalines, have a look at How to convert datasets to data steps for details.

ballardw
Super User

Here is my guess of what I think you are doing. I interpret this as the local high is the largest temp for the current, previous day and the day before that and local low as the smallest of those 3.

If that is the case then maybe:

 

data example;
   input day temp;
datalines;
1 45
2 37
3 40
4 28
5 18
6 30
;

data guess;
   set example;
   high = max(temp,lag1(temp),lag2(temp));
   low  = min(temp,lag1(temp),lag2(temp));
run;

The Lag1(variablename) function returns the value of the variable from the previous iteration of the data step, lag2() from the iteration before that. Note that in this case that is equivalent to the previous record and one before. That is not always the case when conditional code is used where Lag, and its companion function Dif, can return values from the last time the condition was true.

Lag will return missing values for values that would be before the available records. On the first record of the set both Lag1 and Lag2 are missing; on the second record lag1 has a value but lag2 does not.

 

You can have a lot of LagN values though many times people using N>25 or so might be better off with another procedure or approach.

 

If this is not what you want you now have a small example data set that you can use as a template for showing what the actual values for High and Low should be.

AT121
Fluorite | Level 6

Here is an example of data:

Temperature is the average of the high and low for the day

Day        Temperature(*F)

1                     74

2                     76

3                     75

4                     73

5                     72

6                     74

7                     77

8                     78

9                     80

10                   79

11                   78

12                   76

13                   75

14                   76

15                   77

 

The output I want is:

high_1: on day 2 @76*

low_1: on day 5 @72*

high_2: on day 9  @80*

low_2: on day 13 @75* 

 

If my dataset was over 1000 days of data, I want to calculate all the local high_i and low_i using a 3 consecutive day comparison of temperatures

andreas_lds
Jade | Level 19

Sorry, but i still don't understand the logic you want to apply. You said:

I want to calculate all the local high_i and low_i using a 3 consecutive day comparison of temperatures

So why is low_1 on day 5 and not on day 1?

ballardw
Super User

@AT121 wrote:

 

The output I want is:

high_1: on day 2 @76*

low_1: on day 5 @72*

high_2: on day 9  @80*

low_2: on day 13 @75* 

 

If my dataset was over 1000 days of data, I want to calculate all the local high_i and low_i using a 3 consecutive day comparison of temperatures


Now provide, by listing the days used exactly how you get 72 on Day. It looks like you are comparing values to AFTER the given day. Which I don't think matches your original description as everything there was framed in terms of temp(t-1) which would seem to me to be "time minus 1".

AT121
Fluorite | Level 6

Sorry I will try to be more clear. I would like to compare temperature values after each day. 

 

high_1 is found by comparing 3 consecutive days (day 3,2,1). Because temp(day 2) > temp(day1) and temp(day2) > temp(day 3) this would be the first local high. A local high cannot be found at day 1 because there is not 3 days of temperatures to compare yet (I can compare temperatures after day 3). I want the high_1 data/value to have day and temperature information.

 

low_1 is found by comparing 3 consecutive days (day 6,5,4). Because temp(day5) < temp(day4) and temp(day5) < temp(day6), day 5 would be considered a local low.

 

I thought about having SAS graph the temperatures over time, then compare the change in temperature between 2 days (the slope of the change). When the slope transitions from positive to negative, that would be a local high (and local low when the slope transitions from negative to positive). Could SAS label each high (low) value (using the change in slope coefficient)?

 

Another solution I was thinking about was having SAS drop values when the following is not true:

If temp(yesterday) > temp(today) and temp(yesterday) > temp(2 days ago) or

If temp(yesterday) < temp(today) and temp(yesterday) < temp(2 days ago)

This would create a dataset with only local high and low values [along with information about which day was a high(low)]

 

However, I was hoping there was a code to generate high_i (low_i) if my dataset had a decade worth of temperature data, rather than deleting values from the original data set to only include local high(low) values

 

Thanks

ballardw
Super User

Don't understand this high_1 high_2 names and would say they are a bad idea going forward.

This creates the output. Two passes through the data, same technique just processing in different order.

 

data example;
   input day temp;
datalines;
1    74
2    76
3    75
4    73
5    72
6    74
7    77
8    78
9    80
10   79
11   78
12   76
13   75
14   76
15   77
;
data temp;
   set example;
   high = max(temp,lag1(temp),lag2(temp));
run;
proc sort data=temp;
  by descending day;
run;
data want;
   set temp;
   low  = min(temp,lag1(temp),lag2(temp));
run;
proc sort data=want;
   by day;
run;

Really really consider a high and low reflecting different periods reported on the same day as likely to cause much confusion with anyone else looking at data like this.

 

I've worked with weather data (example: https://www.ars.usda.gov/research/publications/publication/?seqNo115=45521 ) and the approach you have for high/low comparisons appears bit odd.

 

 

 

AT121
Fluorite | Level 6

Thank you for your input. I agree generating a new variable of high_i (low_i) as SAS compares the temperatures of today, yesterday, and 2 days ago is confusing.

 

I think the easiest way to achieve the local high (low) results, is to use a loop comparing temperatures of today, yesterday, and 2 days. 

If temp(yesterday) > temp(today) and temp(yesterday) > temp(2 days ago) or

If temp(yesterday) < temp(today) and temp(yesterday) < temp(2 days ago)

then SAS should keep the data, otherwise SAS should delete the data.

 

Would I need to use a loop in SAS to use this method?

 

I won't have the labels of high_i (low_i), but I will be able to compare adjacent local high and low values

Tom
Super User Tom
Super User

Let's try to translate that into English first.

 

So you consider a date as have a "local high" it if has the strictty increasing temperatures.

And a "local low" if it has three strictly decreasing temperatures.

 

Sounds like you want to track how long a run of increasing (decreasing) temps you have.

Perhaps something like:

data have;
  day +1;
  input temp @@;
cards;
74 76 75 73 72 74 77 78 80 79 78 76 75 76 77
;
data step1;
  set have;
  if temp>lag(temp)>. then high+1;
  else high=0;
  if .<temp<lag(temp) then low+1;
  else low=0;
run;

Results

Obs    day    temp    high    low

  1      1     74       0      0
  2      2     76       1      0
  3      3     75       0      1
  4      4     73       0      2
  5      5     72       0      3
  6      6     74       1      0
  7      7     77       2      0
  8      8     78       3      0
  9      9     80       4      0
 10     10     79       0      1
 11     11     78       0      2
 12     12     76       0      3
 13     13     75       0      4
 14     14     76       1      0
 15     15     77       2      0

So then any observation with HIGH > 2 would be a "local high".

AT121
Fluorite | Level 6

Thank you for your input. However, I would like to use the following definitions to distinguish a local high (low):

Local high: if temp(yesterday) > temp(today) and temp(yesterday) > temp(2 days ago) 

Local low: if temp(yesterday) < temp(today) and temp(yesterday) < temp(2 days ago)

 

your method has any observation of high > 2 as a local high. This would mean there was a local high on days 8 and 9, however, day 8 cannot be a local high because temp(day8) > temp(day7) but temp(day8) is not great than temp(day9).

 

Would it be easiest to create a loop in SAS that evaluated 3 temperatures of consecutive days?

if temp(yesterday) > temp(today) and temp(yesterday) > temp(2 days ago) keep, otherwise delete; or 

if temp(yesterday) < temp(today) and temp(yesterday) < temp(2 days ago) keep, otherwise delete;

I won't have the high_i (low_i) labels, but I will have the dates and temperature values of all the local highs (lows) according to definition I created

Tom
Super User Tom
Super User

Your points seems be contradicting each other.  You say you want to test three days. Then reject one answer by checking a forth day. 

Is it that you want to flag the END of the run of increasing days?

data have;
  day +1;
  input temp @@;
cards;
74 76 75 73 72 74 77 78 80 79 78 76 75 76 77
;
data step1;
  set have;
  sign=sign(coalesce(dif(temp),0));
run;

data step2;
  set step1;
  by sign notsorted;
  if first.sign then run=1;
  else run+1;
  if last.sign and run>2 then do;
    if sign=1 then flag='High';
    else if sign=-1 then flag='Low';
  end;
run;

Result

Obs    day    temp    sign    run    flag

  1      1     74       0      1
  2      2     76       1      1
  3      3     75      -1      1
  4      4     73      -1      2
  5      5     72      -1      3     Low
  6      6     74       1      1
  7      7     77       1      2
  8      8     78       1      3
  9      9     80       1      4     High
 10     10     79      -1      1
 11     11     78      -1      2
 12     12     76      -1      3
 13     13     75      -1      4     Low
 14     14     76       1      1
 15     15     77       1      2

Note if the dataset is large STEP1 can be a VIEW instead of an actual physical dataset.

AT121
Fluorite | Level 6

I appreciate your feedback. Creating the flag is an interesting solution. How would I adjust the code so it will flag day 2 as a high?

 

would these minor changes in step1 and step 2 work?

sign=sign(coalesce(dif(temp),1);

 

if last.sign and run>1 then do;

 

I don't think those changes will work if the next sequence of the data is below

 

I am not trying to be contradictory. I do want to flag the end of a run of increasing (decreasing) days, but I am allowing for a whipsaw of temperatures. For example, if the next sequence of temperatures was:

Day       Temp

16             78

17             80

18             79

19             81

20             80

21             82

22             81

 

the output would flag: day17- High, day18- Low, day19- High, day20- Low, day21- High

Kurt_Bremser
Super User

So you just compare the preceding and succeeding values. This is done by combining LAG with a "look-ahead":

data have;
input day temp;
datalines;
1 74
2 76
3 75
4 73
5 72
6 74
7 77
8 78
9 80
10 79
11 78
12 76
13 75
14 76
15 77
;

data want;
merge
  have
  have (
    firstobs=2
    keep=temp
    rename=(temp=_temp)
  )
  end=done
;
length flag $4;
if
  lag(temp) > temp and _temp > temp
then flag = "low";
if
  lag(temp) < temp and _temp < temp
  and lag(temp) ne . and _temp ne . /* guards against missing values at beginning and end */
then flag = "high";
drop _temp;
run;

which flags the four values 2, 5, 9 and 13.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1569 views
  • 1 like
  • 5 in conversation