BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rodcjones
Obsidian | Level 7

I would like to generate sums of records based on a rule that says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. Not sure if I explained that well, so I am providing a have data set with raw data, an intermediate data set that shows how I'd like to count*, and the want data set that sums the counts by ID. 

Here are the data sets:

data have;
input date :mmddyy8. id $ event $ ;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes
09/02/20 1 No
09/03/20 1 Yes
09/04/20 1 No
09/05/20 1 Yes
09/06/20 1 No
09/07/20 1 Yes
09/08/20 1 No
09/09/20 1 Yes
09/10/20 1 Yes
09/11/20 1 Yes
09/12/20 1 No
09/13/20 1 Yes
09/14/20 1 No
09/15/20 1 No
09/16/20 1 Yes
09/17/20 1 No
09/18/20 1 No
09/19/20 1 No
09/20/20 1 Yes
09/21/20 1 Yes
09/22/20 1 No
09/23/20 1 No
09/24/20 1 No
09/25/20 1 No
09/26/20 1 No
09/27/20 1 No
09/28/20 1 Yes
09/29/20 1 No
09/30/20 1 No
10/01/20 1 No
10/02/20 1 No
10/03/20 1 No
10/04/20 1 No
10/05/20 1 Yes
10/06/20 1 No
10/07/20 1 No
10/08/20 1 No
10/09/20 1 Yes
10/10/20 1 No
10/11/20 1 No
10/12/20 1 No
10/13/20 1 No
10/14/20 1 No
09/01/20 2 No
09/02/20 2 No
09/03/20 2 No
09/04/20 2 Yes
09/05/20 2 Yes
09/06/20 2 Yes
09/07/20 2 Yes
09/08/20 2 Yes
09/09/20 2 Yes
09/10/20 2 Yes
09/11/20 2 Yes
09/12/20 2 Yes
09/13/20 2 Yes
09/14/20 2 Yes
09/15/20 2 Yes
09/16/20 2 Yes
09/17/20 2 No
;


data intermediate;
input date :mmddyy8. id $ event $ count ;
format date MMDDYY8. ;
datalines;
9/1/2020 1 Yes 1
9/2/2020 1 No 0
9/3/2020 1 Yes 1
9/4/2020 1 No 0
9/5/2020 1 Yes 1
9/6/2020 1 No 0
9/7/2020 1 Yes 1
9/8/2020 1 No 0
9/9/2020 1 Yes 1
9/10/2020 1 Yes 0
9/11/2020 1 Yes 0
9/12/2020 1 No 0
9/13/2020 1 Yes 1
9/14/2020 1 No 0
9/15/2020 1 No 0
9/16/2020 1 Yes 1
9/17/2020 1 No 0
9/18/2020 1 No 0
9/19/2020 1 No 0
9/20/2020 1 Yes 1
9/21/2020 1 Yes 1
9/22/2020 1 No 0
9/23/2020 1 No 0
9/24/2020 1 No 0
9/25/2020 1 No 0
9/26/2020 1 No 0
9/27/2020 1 No 0
9/28/2020 1 Yes 1
9/29/2020 1 No 0
9/30/2020 1 No 0
10/1/2020 1 No 0
10/2/2020 1 No 0
10/3/2020 1 No 0
10/4/2020 1 No 0
10/5/2020 1 Yes 1
10/6/2020 1 No 0
10/7/2020 1 No 0
10/8/2020 1 No 0
10/9/2020 1 Yes 1
10/10/2020 1 No 0
10/11/2020 1 No 0
10/12/2020 1 No 0
10/13/2020 1 No 0
10/14/2020 1 No 0
9/1/2020 2 No 0
9/2/2020 2 No 0
9/3/2020 2 No 0
9/4/2020 2 Yes 1
9/5/2020 2 Yes 0
9/6/2020 2 Yes 0
9/7/2020 2 Yes 0
9/8/2020 2 Yes 0
9/9/2020 2 Yes 1
9/10/2020 2 Yes 0
9/11/2020 2 Yes 0
9/12/2020 2 Yes 0
9/13/2020 2 Yes 0
9/14/2020 2 Yes 1
9/15/2020 2 Yes 0
9/16/2020 2 Yes 0
9/17/2020 2 No 0
;


data want;
input id $ sum ;
datalines;
1 12
2 3
;

Any ideas would be greatly appreciated!

 

*in the intermediate data set, note especially how these rows are counted. The intermediate data set is intended to illustrate, and is not necessary to generate in the solution if it's not needed.

9/9/2020 1 Yes 1
9/10/2020 1 Yes 0
9/11/2020 1 Yes 0

and

9/4/2020 2 Yes 1
9/5/2020 2 Yes 0
9/6/2020 2 Yes 0
9/7/2020 2 Yes 0
9/8/2020 2 Yes 0
9/9/2020 2 Yes 1
9/10/2020 2 Yes 0
9/11/2020 2 Yes 0
9/12/2020 2 Yes 0
9/13/2020 2 Yes 0
9/14/2020 2 Yes 1
9/15/2020 2 Yes 0
9/16/2020 2 Yes 0
1 ACCEPTED SOLUTION

Accepted Solutions
AMSAS
SAS Super FREQ

@Rodcjones 

I think I follow your example, although it appears your have  dataset isn't correct, I think your count should be 11 for ID 1 not 12. Unless I misunderstood something.

Here's the code I used, first data step is your have data step code, the second data step counts the "Yes" events by Id, ignoring the 2nd-4th consecutive "Yes" events.

I've commented the code so hopefully, that will help, along with the documentation:
By Group Processing

Retain Statement


data have;
input date :mmddyy8. id $ event $ ;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes
09/02/20 1 No
09/03/20 1 Yes
09/04/20 1 No
09/05/20 1 Yes
09/06/20 1 No
09/07/20 1 Yes
09/08/20 1 No
09/09/20 1 Yes
09/10/20 1 Yes
09/11/20 1 Yes
09/12/20 1 No
09/13/20 1 Yes
09/14/20 1 No
09/15/20 1 No
09/16/20 1 Yes
09/17/20 1 No
09/18/20 1 No
09/19/20 1 No
09/20/20 1 Yes
09/21/20 1 Yes
09/22/20 1 No
09/23/20 1 No
09/24/20 1 No
09/25/20 1 No
09/26/20 1 No
09/27/20 1 No
09/28/20 1 Yes
09/29/20 1 No
09/30/20 1 No
10/01/20 1 No
10/02/20 1 No
10/03/20 1 No
10/04/20 1 No
10/05/20 1 Yes
10/06/20 1 No
10/07/20 1 No
10/08/20 1 No
10/09/20 1 Yes
10/10/20 1 No
10/11/20 1 No
10/12/20 1 No
10/13/20 1 No
10/14/20 1 No
09/01/20 2 No
09/02/20 2 No
09/03/20 2 No
09/04/20 2 Yes
09/05/20 2 Yes
09/06/20 2 Yes
09/07/20 2 Yes
09/08/20 2 Yes
09/09/20 2 Yes
09/10/20 2 Yes
09/11/20 2 Yes
09/12/20 2 Yes
09/13/20 2 Yes
09/14/20 2 Yes
09/15/20 2 Yes
09/16/20 2 Yes
09/17/20 2 No
;
run ;

data want ;
	retain 
		/* retain various variable values */
		/* YN_flag : The last changed value of event */
		YN_flag "   " 
		/* count   : The count of yes values */
		count   0 
		/* count5  : Count of consecutive values, used to ignore 2-5 consecutive Yes events */
		count5  0 ;
	/* read have dataset */
	set have ;
	/* Set up by group processing */
	by id ;
	/* When it is the first occurance of an Id reset the counts and flag */
	if first.id then do ;
		YN_flag="   " ;
		count=0 ;
		count5=0 ;
	end ;
	/* If the event and YN_flag are different do this */
	if event ne YN_flag then do ;
		/* retain the event value */
		YN_flag=event ;
		/* if the event is "Yes" count it, and set count5 to 1 */
		if event="Yes" then do ;
			count+1 ;
			count5=1 ;
		end ;
	end ;
	/* If event=YN_Flag (from prior observation) and event="Yes" */
	/* then we need to check if we should count or ignore this   */
	else if event="Yes" then do ;
		/* Add 1 to the count5 */
		count5+1 ;
		/* If count5 is 5 then we need to count this Yes event, as it is the sixth consecutive Yes */
		/* and reset count5 to start counting the next 5 consecutive Yes events */
		if count5=5 then do ;
			count5=0 ;
			count+1 ;
		end ;
	end ;
	/* If it is the last occurance of an ID then output the number of Yes events for that Id */
	if last.id then do ;
		output ;
	end ;
run ;

View solution in original post

4 REPLIES 4
AMSAS
SAS Super FREQ

@Rodcjones 

I think I follow your example, although it appears your have  dataset isn't correct, I think your count should be 11 for ID 1 not 12. Unless I misunderstood something.

Here's the code I used, first data step is your have data step code, the second data step counts the "Yes" events by Id, ignoring the 2nd-4th consecutive "Yes" events.

I've commented the code so hopefully, that will help, along with the documentation:
By Group Processing

Retain Statement


data have;
input date :mmddyy8. id $ event $ ;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes
09/02/20 1 No
09/03/20 1 Yes
09/04/20 1 No
09/05/20 1 Yes
09/06/20 1 No
09/07/20 1 Yes
09/08/20 1 No
09/09/20 1 Yes
09/10/20 1 Yes
09/11/20 1 Yes
09/12/20 1 No
09/13/20 1 Yes
09/14/20 1 No
09/15/20 1 No
09/16/20 1 Yes
09/17/20 1 No
09/18/20 1 No
09/19/20 1 No
09/20/20 1 Yes
09/21/20 1 Yes
09/22/20 1 No
09/23/20 1 No
09/24/20 1 No
09/25/20 1 No
09/26/20 1 No
09/27/20 1 No
09/28/20 1 Yes
09/29/20 1 No
09/30/20 1 No
10/01/20 1 No
10/02/20 1 No
10/03/20 1 No
10/04/20 1 No
10/05/20 1 Yes
10/06/20 1 No
10/07/20 1 No
10/08/20 1 No
10/09/20 1 Yes
10/10/20 1 No
10/11/20 1 No
10/12/20 1 No
10/13/20 1 No
10/14/20 1 No
09/01/20 2 No
09/02/20 2 No
09/03/20 2 No
09/04/20 2 Yes
09/05/20 2 Yes
09/06/20 2 Yes
09/07/20 2 Yes
09/08/20 2 Yes
09/09/20 2 Yes
09/10/20 2 Yes
09/11/20 2 Yes
09/12/20 2 Yes
09/13/20 2 Yes
09/14/20 2 Yes
09/15/20 2 Yes
09/16/20 2 Yes
09/17/20 2 No
;
run ;

data want ;
	retain 
		/* retain various variable values */
		/* YN_flag : The last changed value of event */
		YN_flag "   " 
		/* count   : The count of yes values */
		count   0 
		/* count5  : Count of consecutive values, used to ignore 2-5 consecutive Yes events */
		count5  0 ;
	/* read have dataset */
	set have ;
	/* Set up by group processing */
	by id ;
	/* When it is the first occurance of an Id reset the counts and flag */
	if first.id then do ;
		YN_flag="   " ;
		count=0 ;
		count5=0 ;
	end ;
	/* If the event and YN_flag are different do this */
	if event ne YN_flag then do ;
		/* retain the event value */
		YN_flag=event ;
		/* if the event is "Yes" count it, and set count5 to 1 */
		if event="Yes" then do ;
			count+1 ;
			count5=1 ;
		end ;
	end ;
	/* If event=YN_Flag (from prior observation) and event="Yes" */
	/* then we need to check if we should count or ignore this   */
	else if event="Yes" then do ;
		/* Add 1 to the count5 */
		count5+1 ;
		/* If count5 is 5 then we need to count this Yes event, as it is the sixth consecutive Yes */
		/* and reset count5 to start counting the next 5 consecutive Yes events */
		if count5=5 then do ;
			count5=0 ;
			count+1 ;
		end ;
	end ;
	/* If it is the last occurance of an ID then output the number of Yes events for that Id */
	if last.id then do ;
		output ;
	end ;
run ;
PeterClemmensen
Tourmaline | Level 20

With the correct note from @AMSAS that the sum for ID=1 should be 11..

 

data want(keep=id sum);
   do until (last.id);
      set have;
      by id;
      if event = 'Yes' & c = 0 then do;
         c + 1; 
         sum = sum(sum, 1);
      end;
      else if event = 'Yes' & c then c + 1;
      if c = 5 | event = 'No' then c = 0;
   end;
run;

Result

 

id sum 
1  11 
2  3 
Rodcjones
Obsidian | Level 7

Thank you @AMSAS and @PeterClemmensen  for these excellent and prompt solutions. I tinkered a bit with customizing these to a couple different scenarios/datasets and they work great. I appreciate the links to the extra info on RETAIN as well. I use that from time to time but have a hard time applying it to new problems. 

Rodcjones
Obsidian | Level 7

In applying these great solutions to a project, I was confronted with a new but related challenge. I need a way to prevent the “counter” from resetting until I have two consecutive days of “No”.

In other words, in the original post this subset of ID 1,

 

09/09/20 1 Yes

09/10/20 1 Yes

09/11/20 1 Yes

09/12/20 1 No

09/13/20 1 Yes

09/14/20 1 No

09/15/20 1 No

09/16/20 1 Yes

 

… would be counted as 3 events, the new challenge is for it to be counted as 2 events. The 9/13 Yes needs to be considered associated with the 9/9-9/11 run; the 9/16 Yes is still a new event to be counted because two days of No have occurred.

Currently I’m toying with RETAIN as a means of looking back at the previous day’s value but haven’t solved it yet and would welcome the wisdom and skill of others.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1421 views
  • 3 likes
  • 3 in conversation