BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Richard_Gordon
Fluorite | Level 6

Hi,

Here's the problem I am trying to solve:

 

Suppose you have 3 variables, Time, Upper, and Lower.  Time is the moment of a reading, Upper is whether that reading is above (1) or below (0) the upper limit, and Lower is whether that reading is above (1) or below (0) the lower limit.  Using those three variables, I am trying to compute a Target binary variable with the following logic:

 

     If lag(Target) = 0 then Target = Upper else Target = Lower

 

As an example, think about tank that is stores water and needs to discharge if the level gets too high.  So if the level gets above an upper limit, a pump is turned on until the level falls below the lower limit.  There is no way to predict how long the tank is above the lower limit and no way to determine how long it takes to discharge water below the lower limit.  To me, this is row by row processing using the previous state to determine limit (upper or lower) should be used to determine whether the pump is turned on or off.

 

I have attached an example data set to illustrate the variables and Target outcome I am looking to compute.

 

Thanking you in advance,

rg

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
There are too many scenarios you need consider about.



data have;
infile cards truncover expandtabs;
input Time	Upper	Lower	Target;
cards;
1	0	0	0
2	0	1	0
3	0	1	0
4	0	1	0
5	0	1	0
6	0	1	0
7	0	1	0
8	0	1	0
9	0	1	0
10	1	1	1
11	1	1	1
12	1	1	1
13	0	1	1
14	0	1	1
15	0	1	1
16	0	1	1
17	0	0	0
18	0	0	0
19	0	0	0
20	0	0	0
21	0	1	0
22	0	1	0
23	0	1	0
24	1	1	1
25	1	1	1
26	1	1	1
27	1	1	1
28	1	1	1
29	0	1	1
30	0	1	1
31	0	0	0
32	0	0	0
33	0	0	0
34	0	0	0
35	0	0	0
;
run;
data want;
 set have;
 retain _target 0;
 if upper=1 then _target=1;
 if lower=0 then _target=0;
run;


View solution in original post

7 REPLIES 7
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

This link: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... has instructions on how to create program text for a datastep to create the data that can be posted here. Then interested readers can test behavior with your data.

 

I may think somewhat differently but if your Upper value indicates out of expected range (above the upper limit) why doesn't the Lower value have 1 when it is out of range (below the lower limit)?

 

Richard_Gordon
Fluorite | Level 6

Thanks for the link on how to format questions for the Community environment.  I will review and apply to future posts.

 

That said,  the lower limit is 0 when it is out of range and the values could be reversed-coded if you prefer.  The problem is that while the upper limit is an explicit start of the event (turning on the pump), there is that curious state where the level is above the lower and below the upper limit.  In the curious state, if the pump is NOT on, then leave it OFF (not a critical level) and if the pump is ON, the leave it ON (to expected level of the lower limit).  The two-step limit approach may help minimize costs or risks associated with turning on the pump and associated systems based upon a single limit.

 

The problem as described is a general class of event detections and the example is simply an applied story to the data.  Nonetheless, the Target variable value is determined by the conditional evaluation of the previous state (lag(Target)) in order to use the upper or lower limit values to update the current state.

 

I am new to SAS and looking for help on how to program this Target variable; either in the datastep using retain statements or in ds2 using threading (or other methods unknown at this time).

 

Does this make sense?

ballardw
Super User

Please look closely at the documentation for LAG. There can be quite a bit of complication when lag gets involved with IF.

In many cases something like:

 

LagTarget=lag(target);

If LagTarget = 0 then Target = Upper;

else Target = Lower;

 

often addresses most issues.

 

What is your rule for the very first record? The lagged value will be missing.

Ksharp
Super User
There are too many scenarios you need consider about.



data have;
infile cards truncover expandtabs;
input Time	Upper	Lower	Target;
cards;
1	0	0	0
2	0	1	0
3	0	1	0
4	0	1	0
5	0	1	0
6	0	1	0
7	0	1	0
8	0	1	0
9	0	1	0
10	1	1	1
11	1	1	1
12	1	1	1
13	0	1	1
14	0	1	1
15	0	1	1
16	0	1	1
17	0	0	0
18	0	0	0
19	0	0	0
20	0	0	0
21	0	1	0
22	0	1	0
23	0	1	0
24	1	1	1
25	1	1	1
26	1	1	1
27	1	1	1
28	1	1	1
29	0	1	1
30	0	1	1
31	0	0	0
32	0	0	0
33	0	0	0
34	0	0	0
35	0	0	0
;
run;
data want;
 set have;
 retain _target 0;
 if upper=1 then _target=1;
 if lower=0 then _target=0;
run;


Richard_Gordon
Fluorite | Level 6

Thank you!

 

The code worked for this simulated data as well as the more complex events I am modeling.

 

That said, I am not one to ignore warnings :), what are your thinking about regarding the "many scenarios you need to consider about"?

 

 

 

Ksharp
Super User
That depends on what you want. 
E.X. if Upper=1 Lower=0, what you gonna do .


Richard_Gordon
Fluorite | Level 6
Oh ... the data by design cannot have the upper = 1 without lower = 1. To your point, if there are two instruments making readings, an instrument failure could create that condition and the pump (in this example) would repeatedly turn on and off.

The data is checked for completeness prior to event detection and I would expect to develop an imputation process for missing data, if present.

Nonetheless, thanks for thinking ahead and providing such a succinct solution. I will be using it to develop a class of discretized events for my POC.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2558 views
  • 0 likes
  • 3 in conversation