BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rodcjones
Obsidian | Level 7

This question is a new twist on a previous post that was impressively solved by  @AMSAS and @PeterClemmensen at 

https://communities.sas.com/t5/SAS-Programming/Counting-consecutively-dated-observations-of-Yes-valu...

The new scenario I'm tackling is how to perform the same counting method by ID but then 1) attribute the count to a new variable and 2) sum by that variable. In the have and want datasets below I seek to count consecutively-dated Yes values, restart the count after 5 ... and use the LOCATION of the first Yes in the run for summing, even when the location changes at some point in the run.

 

data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B

;
run ;

data want;
input LOCATION $ sum ;
datalines;
A 3
B 5
C 4
Z 2
;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

OK, I get it now.  This program uses your original data and matches your WANT dataset:

 

data have;
  input date :mmddyy8. id $ event $ location $;
  format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;

data _null_;
  set have (where=(event='Yes')) end=end_of_have;
  by id;
  if _n_=1 then do;
    declare hash h (ordered:'a');
	  h.definekey('location');
	  h.definedata('location','sum');
	  h.definedone();
  end;

  if first.id=1 or lag(date)^=date-1 then date_sequence=1;
  else date_sequence+1;
  if mod(date_sequence,5)=1 then do;
    if h.find()^=0 then sum=1;
	else sum=sum+1;
	h.replace();
  end;

  if end_of_have;
  h.output(dataset:'want');
run;

proc print data=want;run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
Rodcjones
Obsidian | Level 7
Hi @PeterClemmensen ,
Location A's total comes from these records:
09/01/20 1 Yes A
+
09/03/20 1 Yes A
+
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E

This last group of 5 is counted as 1 (see link to prior post for more on that).
ballardw
Super User

@Rodcjones wrote:
Hi @PeterClemmensen ,
Location A's total comes from these records:
09/01/20 1 Yes A
+
09/03/20 1 Yes A
+
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E

This last group of 5 is counted as 1 (see link to prior post for more on that).

That is NOT consecutive date with Yes, the 09/02/20 is NOT a Yes.

So it appears that you need to define "consecutive" in a bit more detail in relation to this data.

Rodcjones
Obsidian | Level 7
The counting rule I needed to implement says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. This counting is done at the ID level. In the case of ID 1 it's correct that the No value on 9/2 interrupts the string of Yes's, and counting restarts at the next Yes for ID 1, which is on 9/3.
ballardw
Super User

@Rodcjones wrote:
The counting rule I needed to implement says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. This counting is done at the ID level. In the case of ID 1 it's correct that the No value on 9/2 interrupts the string of Yes's, and counting restarts at the next Yes for ID 1, which is on 9/3.

So your posted want is incorrect:

data want;
input LOCATION $ sum ;
datalines;
A 3
B 5
C 4
Z 2
mkeintz
PROC Star

Shouldn't C have a count of 5?

 

---- 10 Lines Not Displayed ----

09/11/20 1 Yes C

---- 1 Line Not Displayed ----
09/13/20 1 Yes C

---- 2 Lines Not Displayed ----
09/16/20 1 Yes C

---- 3 Lines Not Displayed ----
09/20/20 1 Yes C
09/21/20 1 Yes C

---- 6 Lines Not Displayed ----
09/28/20 1 Yes C 

---- 33 Lines Not Displayed ----

Assuming your answer is Yes, then:

 

data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;

data _null_;
  if 0 then set have;
  if _n_=1 then do;
    call missing(sum);
    declare hash h (ordered:'A');
	  h.definekey('location');
	  h.definedata('location','sum');
	  h.definedone();
  end;

  do _n=1 to 5 until (last.event);
    set have end=end_of_have;
	by location event notsorted;
  end;

  if event='Yes' then do;
    if h.find()^=0 then sum=1;
	else sum+1;
	h.replace();
  end;
  if end_of_have then h.output(dataset:'want');

run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Rodcjones
Obsidian | Level 7

C doesn't have a sum of 5 because this row

09/21/20 1 Yes C

is part of a consecutive run. Once the run has gone 5 days for the same ID, the count should reset, but since this is only the 2nd day of Yes for ID 1, it is associated with the preceding record.

 

I inadvertently seem to have come up with an example and scenario that's tough to follow, so I appreciate the patience and assistance. In this HAVE dataset below I gave a value of 1 to the count_it variable for rows that should be counted. Hopefully that might explain how I came up with the sums in WANT.

 

data have;
input date :mmddyy8. id $ event $ location $ count_it;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A 1
09/02/20 1 No A 0
09/03/20 1 Yes A 1
09/04/20 1 No A 0
09/05/20 1 Yes B 1
09/06/20 1 No B 0
09/07/20 1 Yes B 1
09/08/20 1 No B 0
09/09/20 1 Yes B 1
09/10/20 1 Yes B 0
09/11/20 1 Yes C 0
09/12/20 1 No C 0
09/13/20 1 Yes C 1
09/14/20 1 No C 0
09/15/20 1 No C 0
09/16/20 1 Yes C 1
09/17/20 1 No C 0
09/18/20 1 No C 0
09/19/20 1 No C 0
09/20/20 1 Yes C 1
09/21/20 1 Yes C 0
09/22/20 1 No C 0
09/23/20 1 No C 0
09/24/20 1 No C 0
09/25/20 1 No C 0
09/26/20 1 No C 0
09/27/20 1 No C 0
09/28/20 1 Yes C 1
09/29/20 1 No C 0
09/30/20 1 No C 0
10/01/20 1 No C 0
10/02/20 1 No C 0
10/03/20 1 No C 0
10/04/20 1 No C 0
10/05/20 1 Yes Z 1
10/06/20 1 No Z 0
10/07/20 1 No Z 0
10/08/20 1 No Z 0
10/09/20 1 Yes Z 1
10/10/20 1 No Z 0
10/11/20 1 No Z 0
10/12/20 1 No Z 0
10/13/20 1 No Z 0
10/14/20 1 No D 0
09/01/20 2 No A 0
09/02/20 2 No A 0
09/03/20 2 No A 0
09/04/20 2 Yes A 1
09/05/20 2 Yes A 0
09/06/20 2 Yes E 0
09/07/20 2 Yes E 0
09/08/20 2 Yes E 0
09/09/20 2 Yes B 1
09/10/20 2 Yes B 0
09/11/20 2 Yes B 0
09/12/20 2 Yes B 0
09/13/20 2 Yes B 0
09/14/20 2 Yes B 1
09/15/20 2 Yes B 0
09/16/20 2 Yes B 0
09/17/20 2 No B 0
;

mkeintz
PROC Star

OK, I get it now.  This program uses your original data and matches your WANT dataset:

 

data have;
  input date :mmddyy8. id $ event $ location $;
  format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;

data _null_;
  set have (where=(event='Yes')) end=end_of_have;
  by id;
  if _n_=1 then do;
    declare hash h (ordered:'a');
	  h.definekey('location');
	  h.definedata('location','sum');
	  h.definedone();
  end;

  if first.id=1 or lag(date)^=date-1 then date_sequence=1;
  else date_sequence+1;
  if mod(date_sequence,5)=1 then do;
    if h.find()^=0 then sum=1;
	else sum=sum+1;
	h.replace();
  end;

  if end_of_have;
  h.output(dataset:'want');
run;

proc print data=want;run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Rodcjones
Obsidian | Level 7
This is an elegant solution, thank you @mkeintz!
For my growth, I was wondering if you have any good references for what is happening in the data _null_ step with the hash stuff. That seems powerful but I don't follow it - and don't recognize the concepts.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1242 views
  • 0 likes
  • 4 in conversation