BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
stataq
Quartz | Level 8

Hello,

 

I would like to assign seq number for each id's each "stop" activity group .  if two stops are back to back, it will be counted as 1. how can I create "sseq" using example dataset ds1?

 

data ds1 ;
input id seq stop;
cards ;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
1 6 1
2 1 0
2 2 0
2 3 1
2 4 1
2 5 1
2 6 0
2 7 1
2 8 0
2 9 1
;
run;

 

The expected output:

stataq_0-1701710045105.png

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
A_Kh
Lapis Lazuli | Level 10

data step.

data want;
	set ds1;
	by id;
	temp1=lag(stop);
	if 1=stop=temp1 then temp2=0; 
	else temp2=stop; 
	retain temp3; 
	if first.id then temp3 =temp2;
	else temp3= temp3+temp2;
	if stop=0 then sstop=.;
	else sstop=temp3;  
	drop temp:;
proc print; run; 

View solution in original post

6 REPLIES 6
A_Kh
Lapis Lazuli | Level 10

data step.

data want;
	set ds1;
	by id;
	temp1=lag(stop);
	if 1=stop=temp1 then temp2=0; 
	else temp2=stop; 
	retain temp3; 
	if first.id then temp3 =temp2;
	else temp3= temp3+temp2;
	if stop=0 then sstop=.;
	else sstop=temp3;  
	drop temp:;
proc print; run; 
FreelanceReinh
Jade | Level 19

Hello @stataq,

 

Try this:

data want(drop=_c);
set ds1;
by id seq;
if first.id then _c=0;
_c+(stop & (~lag(stop) | first.id));
if stop then sseq=_c;
run;

and add more test cases, e.g., id=3 beginning with stop=1 (if this is possible in your real data).

stataq
Quartz | Level 8

@FreelanceReinh Could you further explain 

_c+(stop & (~lag(stop) | first.id));

I have some difficulty to understand this logic. 

 

Thanks.

FreelanceReinh
Jade | Level 19

@stataq wrote:

@FreelanceReinh Could you further explain 

_c+(stop & (~lag(stop) | first.id));

First of all, this is a sum statement, i.e., variable _c (the "counter") is incremented by the value in the outer parentheses. That increment is a Boolean value: either 1 or 0, depending on whether the logical expression involving the AND (&), OR (|) and NOT (~) operators is TRUE (1) or FALSE (0). Non-zero, non-missing values of variable stop (in particular the value 1) are evaluated to TRUE. Zero and missing values are evaluated to FALSE.

 

The LAG function in this DATA step is called once for each observation of dataset DS1, which means that it returns the value of stop from the previous observation (and a numeric missing value in the very first observation). The value of automatic variable first.id is 1 for the first observation of each id BY-group and 0 otherwise.

 

So, considering that stop has only values 1 or 0, the increment equals

  • 1 if the current observation has stop=1 AND (the previous observation has stop=0 OR the current observation is the first of the current id)
  • 0 otherwise.

This is exactly what we need: A new "block" of consecutive observations with stop=1 of an id must (obviously) start with an observation with stop=1 and the only exception to the requirement "the previous observation had stop=0" (avoiding an incrementation within a block) is that we are at the first observation of the id. In the latter case the previous observation may be the last of a "stop=1 block" of the previous id. Also, for the very first observation of dataset DS1 there is no previous observation, lag(stop)=. (missing, i.e. FALSE), hence ~lag(stop)=1 (TRUE), but this is actually irrelevant because first.id=1 makes the subexpression (~lag(stop) | first.id) TRUE anyway.

data_null__
Jade | Level 19

Your case is a good example for a BY with the NOTSORTED option.

 

data ds1 ;
input id seq stop;
cards ;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
1 6 1
2 1 0
2 2 0
2 3 1
2 4 1
2 5 1
2 6 0
2 7 1
2 8 0
2 9 1
;
run;

data ds1;
   set ds1;
   by id stop notsorted;
   if first.id then _sseq = 0;
   if first.stop and stop ne 0 then _sseq + 1;
   if stop ne 0 then sseq=_sseq;
   drop _sseq;
   run;
proc print;
   run;
 

Capture.PNG

Tom
Super User Tom
Super User

To get a variable with those missing values you will need an separate variable to keep the count.  So use the BY statement to find the groups. The SUM statement to increment the counter and an IF statement to assign the counter to create your wanted variable with the missing values otherwise.

data want;
  set ds1;
  by id stop notsorted;
  if first.id then counter=0;
  counter + (first.stop and stop);
  if stop then sseq=counter;
run;

Result

Tom_0-1727218109356.png

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 904 views
  • 4 likes
  • 5 in conversation