@stataq wrote:
@FreelanceReinh Could you further explain
_c+(stop & (~lag(stop) | first.id));
First of all, this is a sum statement, i.e., variable _c (the "counter") is incremented by the value in the outer parentheses. As the target of a sum statement, _c is automatically retained across observations and initialized to 0. The increment is a Boolean value: either 1 or 0, depending on whether the logical expression involving the AND (&), OR (|) and NOT (~) operators is TRUE (1) or FALSE (0). Non-zero, non-missing values of variable stop (in particular the value 1) evaluate to TRUE. Zero and missing values evaluate to FALSE.
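To see these truth values directly, here is a minimal sketch that is not part of the original solution. The variable names v, as_bool and negated are made up for illustration. It writes the Boolean result of a few numeric values to the log:

```sas
/* Illustration only: v, as_bool and negated are hypothetical names. */
data _null_;
  do v = ., 0, 1, -2.5;
    as_bool = (v & 1);   /* 1 if v is non-zero and non-missing, else 0 */
    negated = ~v;        /* NOT: 1 if v is zero or missing, else 0     */
    put v= as_bool= negated=;
  end;
run;

/* Log output:
   v=. as_bool=0 negated=1
   v=0 as_bool=0 negated=1
   v=1 as_bool=1 negated=0
   v=-2.5 as_bool=1 negated=0
*/
```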
The LAG function in this DATA step is called once for each observation of dataset DS1, which means that it returns the value of stop from the previous observation (and a numeric missing value in the very first observation). The value of automatic variable first.id is 1 for the first observation of each id BY-group and 0 otherwise.
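The "called once for each observation" part matters: LAG returns the value stored at its previous execution, not necessarily the value from the previous observation. The following sketch uses made-up data and variable names (lag_demo, x, prev_always, prev_cond) to contrast an unconditional call with a conditional one:

```sas
/* Illustration only: dataset and variable names are hypothetical. */
data lag_demo;
  input x;
  prev_always = lag(x);               /* executed for every observation   */
  if x > 1 then prev_cond = lag(x);   /* executed only when x > 1         */
  datalines;
5
1
7
;
run;

/* Resulting values:
   x   prev_always   prev_cond
   5        .            .
   1        5            .
   7        1            5      <- 5, not 1: the conditional LAG was
                                   last executed in the first observation
*/
```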
So, considering that stop has only values 1 or 0, the increment equals
1 if the current observation has stop=1 AND (the previous observation has stop=0 OR the current observation is the first of the current id)
0 otherwise.
This is exactly what we need: a new "block" of consecutive observations with stop=1 within an id must (obviously) start with an observation with stop=1. The requirement "the previous observation had stop=0" prevents an increment within a block; its only exception is the first observation of an id, where the previous observation may be the last of a "stop=1 block" of the previous id. For the very first observation of dataset DS1 there is no previous observation, so lag(stop)=. (missing, i.e., FALSE) and hence ~lag(stop)=1 (TRUE). This is actually irrelevant, though, because first.id=1 makes the subexpression (~lag(stop) | first.id) TRUE anyway.
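To check the logic by hand, here is a sketch on hypothetical sample data (the real DS1 is not shown in this post). It splits the one-line sum statement into intermediate variables so that each piece can be inspected. The names prev_stop, first_id and inc are made up for this illustration, and the counter is assumed to run across ids without a reset.

```sas
/* Hypothetical sample data, sorted by id. */
data ds1;
  input id stop;
  datalines;
1 0
1 1
1 1
1 0
1 1
2 1
2 1
2 0
;
run;

data check;
  set ds1;
  by id;
  prev_stop = lag(stop);   /* unconditional call: value of the previous obs.   */
  first_id  = first.id;    /* copy, because first.id itself is not written out */
  inc       = stop & (~prev_stop | first.id);   /* Boolean increment: 1 or 0   */
  _c + inc;                /* sum statement, same effect as the one-liner      */
run;

proc print data=check noobs;
run;

/* Resulting values:
   id stop prev_stop first_id inc _c
    1   0      .        1      0   0
    1   1      0        0      1   1
    1   1      1        0      0   1
    1   0      1        0      0   1
    1   1      0        0      1   2
    2   1      1        1      1   3
    2   1      1        0      0   3
    2   0      1        0      0   3
*/
```

The sixth observation shows the first.id exception: the previous observation has stop=1, but it belongs to id 1, so a new block starts for id 2 and the counter is incremented.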