@linlin87 wrote:
Dear SAS New User,
Please can you help with this problem? I am trying to detect and enumerate drops in relative humidity (RH) by geographical location. Each drop needs enumerating for later analysis.
A drop in RH is defined by these rules:
1) To be a drop in RH, there must be 3 consecutive data points with humidity below a threshold (less than or equal to 30%).
2) The drop in humidity ends when there are 3 consecutive points above the threshold (31% and above).
3) When the next drop starts (within that location), its label is incremented by 1.
I would say that you need to reconsider your rules a bit. I suspect this has more than a bit of "boundary value problem" behavior, meaning that the boundaries of the "drop" and end intervals have not been defined clearly enough. For example, you say "threshold (31% and above)", which leaves a gap above the "less than or equal to 30" of rule 1. Your shown data may not contain any such values, but relative humidity is a continuous measurement, and values like 30.5 need to be accounted for in the programming. Or you need to state explicitly that you have only integer values.
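One way to close that gap is to use a single cutoff and make "above" the exact complement of "below". The sketch below is in Python rather than SAS, purely to make the logic concrete; the names THRESHOLD, is_below and is_above are made up for illustration, and the cutoff of 30 is taken from rule 1.

```python
THRESHOLD = 30.0  # percent RH; assumed cutoff, taken from rule 1


def is_below(rh: float, threshold: float = THRESHOLD) -> bool:
    """True when a reading counts as 'below threshold' (<= threshold)."""
    return rh <= threshold


def is_above(rh: float, threshold: float = THRESHOLD) -> bool:
    """Exact complement of is_below, so no value can fall into a gap."""
    return rh > threshold


# 30.5 is classified as 'above' instead of being left undefined.
for value in (30.0, 30.5, 31.0):
    print(value, is_below(value), is_above(value))
```

With this definition every non-missing reading is in exactly one of the two classes, whether or not the data happen to be integers.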
Also you have no rules addressing missing values. What would you do if you have an hour with no value?
Or what if the gap between sequential Datetime values is excessive, such as 2 hours between measurements for the same location?
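Whatever you decide, it may help to flag those rows before counting consecutive points. Here is a rough Python sketch of the idea, assuming hourly data for a single location sorted by time. MAX_GAP, flag_breaks and the sample readings are all hypothetical, and whether a flagged row should reset a run, be skipped, or be treated some other way is exactly the rule you still need to decide.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(hours=1)  # assumed expected spacing between readings


def flag_breaks(readings, max_gap=MAX_GAP):
    """Flag rows where 'consecutive' is questionable.

    readings: list of (datetime, rh or None) for ONE location, sorted by time.
    Returns a list of booleans, True where the value is missing or the time
    since the previous row exceeds max_gap.
    """
    flags = []
    prev_time = None
    for when, rh in readings:
        missing = rh is None
        gap = prev_time is not None and (when - prev_time) > max_gap
        flags.append(missing or gap)
        prev_time = when
    return flags


# Made-up sample: one missing value and one 3-hour gap.
sample = [
    (datetime(2024, 1, 1, 0), 28.0),
    (datetime(2024, 1, 1, 1), None),
    (datetime(2024, 1, 1, 2), 27.0),
    (datetime(2024, 1, 1, 5), 29.0),
]
print(flag_breaks(sample))  # [False, True, False, True]
```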
I'm not sure that your rules clearly define what the result should look like for a sequence of values like:
31
32
31
28
27
28
31
29
33
27
37
21
36
32
21
22
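To show why, here is what one literal reading of the three rules gives for that sequence. This is a Python sketch of the logic only, not SAS and not necessarily what you intend. It assumes a single location, a cutoff of 30 with "above" meaning anything greater than 30, that the 3 readings that open a drop belong to it, and that the 3 readings that close it do not. The function name label_drops is made up.

```python
def label_drops(values, threshold=30.0, run_length=3):
    """Return one label per reading: 0 outside a drop, else the drop number."""
    labels = [0] * len(values)
    in_drop = False
    drop_id = 0
    run = 0  # length of the current run that could flip the state
    for i, rh in enumerate(values):
        below = rh <= threshold
        if not in_drop:
            run = run + 1 if below else 0
            if run == run_length:
                # Rule 1 satisfied: open a new drop (rule 3: number it +1)
                in_drop = True
                drop_id += 1
                run = 0
                # Back-fill the readings that opened the drop
                for j in range(i - run_length + 1, i + 1):
                    labels[j] = drop_id
        else:
            run = run + 1 if not below else 0
            if run == run_length:
                # Rule 2 satisfied: close the drop and un-label the
                # earlier 'above' readings of the closing run
                in_drop = False
                run = 0
                for j in range(i - run_length + 1, i):
                    labels[j] = 0
            else:
                labels[i] = drop_id
    return labels


values = [31, 32, 31, 28, 27, 28, 31, 29, 33, 27, 37, 21, 36, 32, 21, 22]
print(label_drops(values))
# [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Under this reading the drop that starts at the first 28 never ends, because the values after it never stay above the cutoff for 3 points in a row. Readings of 31, 33, 37 and 36 are all labelled as part of drop 1. If that is not the result you want, the rules need to say what happens when the values alternate around the threshold.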
BTW I would not call this a "drop", but a "below threshold". Shorthand names can create confusing statements in a narrative when it is hard to tell whether a "variable", a "data value", or an external physical process is meant. A reduction from 32 to 31 would be considered a "drop" by most people discussing numeric values, whereas a change from 28 to 29 would be considered an increase and not a "drop" in any form.
What about RH less than 0 or greater than 100? Theoretically that shouldn't occur, but I have seen instrument readings outside that range. I've also had instruments that missed reporting intervals because of failure or maintenance, so I am used to considering the time between measurements as important for any such change of values.