Hello,
I would like to assign seq number for each id's each "stop" activity group . if two stops are back to back, it will be counted as 1. how can I create "sseq" using example dataset ds1?
data ds1 ;
input id seq stop;
cards ;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
1 6 1
2 1 0
2 2 0
2 3 1
2 4 1
2 5 1
2 6 0
2 7 1
2 8 0
2 9 1
;
run;
The expected output:
Thanks.
data step.
data want;
set ds1;
by id;
temp1=lag(stop);
if 1=stop=temp1 then temp2=0;
else temp2=stop;
retain temp3;
if first.id then temp3 =temp2;
else temp3= temp3+temp2;
if stop=0 then sstop=.;
else sstop=temp3;
drop temp:;
proc print; run;
data step.
data want;
set ds1;
by id;
temp1=lag(stop);
if 1=stop=temp1 then temp2=0;
else temp2=stop;
retain temp3;
if first.id then temp3 =temp2;
else temp3= temp3+temp2;
if stop=0 then sstop=.;
else sstop=temp3;
drop temp:;
proc print; run;
Hello @stataq,
Try this:
data want(drop=_c);
set ds1;
by id seq;
if first.id then _c=0;
_c+(stop & (~lag(stop) | first.id));
if stop then sseq=_c;
run;
and add more test cases, e.g., id=3 beginning with stop=1 (if this is possible in your real data).
@FreelanceReinh Could you further explain
_c+(stop & (~lag(stop) | first.id));
I have some difficulty to understand this logic.
Thanks.
@stataq wrote:
@FreelanceReinh Could you further explain
_c+(stop & (~lag(stop) | first.id));
First of all, this is a sum statement, i.e., variable _c (the "counter") is incremented by the value in the outer parentheses. That increment is a Boolean value: either 1 or 0, depending on whether the logical expression involving the AND (&), OR (|) and NOT (~) operators is TRUE (1) or FALSE (0). Non-zero, non-missing values of variable stop (in particular the value 1) are evaluated to TRUE. Zero and missing values are evaluated to FALSE.
The LAG function in this DATA step is called once for each observation of dataset DS1, which means that it returns the value of stop from the previous observation (and a numeric missing value in the very first observation). The value of automatic variable first.id is 1 for the first observation of each id BY-group and 0 otherwise.
So, considering that stop has only values 1 or 0, the increment equals
This is exactly what we need: A new "block" of consecutive observations with stop=1 of an id must (obviously) start with an observation with stop=1 and the only exception to the requirement "the previous observation had stop=0" (avoiding an incrementation within a block) is that we are at the first observation of the id. In the latter case the previous observation may be the last of a "stop=1 block" of the previous id. Also, for the very first observation of dataset DS1 there is no previous observation, lag(stop)=. (missing, i.e. FALSE), hence ~lag(stop)=1 (TRUE), but this is actually irrelevant because first.id=1 makes the subexpression (~lag(stop) | first.id) TRUE anyway.
Your case is a good example for a BY with the NOTSORTED option.
data ds1 ;
input id seq stop;
cards ;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
1 6 1
2 1 0
2 2 0
2 3 1
2 4 1
2 5 1
2 6 0
2 7 1
2 8 0
2 9 1
;
run;
data ds1;
set ds1;
by id stop notsorted;
if first.id then _sseq = 0;
if first.stop and stop ne 0 then _sseq + 1;
if stop ne 0 then sseq=_sseq;
drop _sseq;
run;
proc print;
run;
To get a variable with those missing values you will need an separate variable to keep the count. So use the BY statement to find the groups. The SUM statement to increment the counter and an IF statement to assign the counter to create your wanted variable with the missing values otherwise.
data want;
set ds1;
by id stop notsorted;
if first.id then counter=0;
counter + (first.stop and stop);
if stop then sseq=counter;
run;
Result
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.