BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rodcjones
Obsidian | Level 7

This question is a new twist on a previous post that was impressively solved by  @AMSAS and @PeterClemmensen at 

https://communities.sas.com/t5/SAS-Programming/Counting-consecutively-dated-observations-of-Yes-valu...

The new scenario I'm tackling is how to perform the same counting method by ID but then 1) attribute the count to a new variable and 2) sum by that variable. In the have and want datasets below I seek to count consecutively-dated Yes values, restart the count after 5 ... and use the LOCATION of the first Yes in the run for summing, even when the location changes at some point in the run.

 

data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B

;
run ;

data want;
input LOCATION $ sum ;
datalines;
A 3
B 5
C 4
Z 2
;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

OK, I get it now.  This program uses your original data and matches your WANT dataset:

 

data have;
  input date :mmddyy8. id $ event $ location $;
  format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;

data _null_;
  set have (where=(event='Yes')) end=end_of_have;
  by id;
  if _n_=1 then do;
    declare hash h (ordered:'a');
	  h.definekey('location');
	  h.definedata('location','sum');
	  h.definedone();
  end;

  if first.id=1 or lag(date)^=date-1 then date_sequence=1;
  else date_sequence+1;
  if mod(date_sequence,5)=1 then do;
    if h.find()^=0 then sum=1;
	else sum=sum+1;
	h.replace();
  end;

  if end_of_have;
  h.output(dataset:'want');
run;

proc print data=want;run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
Rodcjones
Obsidian | Level 7
Hi @PeterClemmensen ,
Location A's total comes from these records:
09/01/20 1 Yes A
+
09/03/20 1 Yes A
+
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E

This last group of 5 is counted as 1 (see link to prior post for more on that).
ballardw
Super User

@Rodcjones wrote:
Hi @PeterClemmensen ,
Location A's total comes from these records:
09/01/20 1 Yes A
+
09/03/20 1 Yes A
+
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E

This last group of 5 is counted as 1 (see link to prior post for more on that).

That is NOT consecutive date with Yes, the 09/02/20 is NOT a Yes.

So it appears that you need to define "consecutive" in a bit more detail in relation to this data.

Rodcjones
Obsidian | Level 7
The counting rule I needed to implement says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. This counting is done at the ID level. In the case of ID 1 it's correct that the No value on 9/2 interrupts the string of Yes's, and counting restarts at the next Yes for ID 1, which is on 9/3.
ballardw
Super User

@Rodcjones wrote:
The counting rule I needed to implement says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. This counting is done at the ID level. In the case of ID 1 it's correct that the No value on 9/2 interrupts the string of Yes's, and counting restarts at the next Yes for ID 1, which is on 9/3.

So your posted want is incorrect:

data want;
input LOCATION $ sum ;
datalines;
A 3
B 5
C 4
Z 2
mkeintz
PROC Star

Shouldn't C have a count of 5?

 

---- 10 Lines Not Displayed ----

09/11/20 1 Yes C

---- 1 Line Not Displayed ----
09/13/20 1 Yes C

---- 2 Lines Not Displayed ----
09/16/20 1 Yes C

---- 3 Lines Not Displayed ----
09/20/20 1 Yes C
09/21/20 1 Yes C

---- 6 Lines Not Displayed ----
09/28/20 1 Yes C 

---- 33 Lines Not Displayed ----

Assuming your answer is Yes, then:

 

data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;

data _null_;
  if 0 then set have;
  if _n_=1 then do;
    call missing(sum);
    declare hash h (ordered:'A');
	  h.definekey('location');
	  h.definedata('location','sum');
	  h.definedone();
  end;

  do _n=1 to 5 until (last.event);
    set have end=end_of_have;
	by location event notsorted;
  end;

  if event='Yes' then do;
    if h.find()^=0 then sum=1;
	else sum+1;
	h.replace();
  end;
  if end_of_have then h.output(dataset:'want');

run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Rodcjones
Obsidian | Level 7

C doesn't have a sum of 5 because this row

09/21/20 1 Yes C

is part of a consecutive run. Once the run has gone 5 days for the same ID, the count should reset, but since this is only the 2nd day of Yes for ID 1, it is associated with the preceding record.

 

I inadvertently seem to have come up with an example and scenario that's tough to follow, so I appreciate the patience and assistance. In this HAVE dataset below I gave a value of 1 to the count_it variable for rows that should be counted. Hopefully that might explain how I came up with the sums in WANT.

 

data have;
input date :mmddyy8. id $ event $ location $ count_it;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A 1
09/02/20 1 No A 0
09/03/20 1 Yes A 1
09/04/20 1 No A 0
09/05/20 1 Yes B 1
09/06/20 1 No B 0
09/07/20 1 Yes B 1
09/08/20 1 No B 0
09/09/20 1 Yes B 1
09/10/20 1 Yes B 0
09/11/20 1 Yes C 0
09/12/20 1 No C 0
09/13/20 1 Yes C 1
09/14/20 1 No C 0
09/15/20 1 No C 0
09/16/20 1 Yes C 1
09/17/20 1 No C 0
09/18/20 1 No C 0
09/19/20 1 No C 0
09/20/20 1 Yes C 1
09/21/20 1 Yes C 0
09/22/20 1 No C 0
09/23/20 1 No C 0
09/24/20 1 No C 0
09/25/20 1 No C 0
09/26/20 1 No C 0
09/27/20 1 No C 0
09/28/20 1 Yes C 1
09/29/20 1 No C 0
09/30/20 1 No C 0
10/01/20 1 No C 0
10/02/20 1 No C 0
10/03/20 1 No C 0
10/04/20 1 No C 0
10/05/20 1 Yes Z 1
10/06/20 1 No Z 0
10/07/20 1 No Z 0
10/08/20 1 No Z 0
10/09/20 1 Yes Z 1
10/10/20 1 No Z 0
10/11/20 1 No Z 0
10/12/20 1 No Z 0
10/13/20 1 No Z 0
10/14/20 1 No D 0
09/01/20 2 No A 0
09/02/20 2 No A 0
09/03/20 2 No A 0
09/04/20 2 Yes A 1
09/05/20 2 Yes A 0
09/06/20 2 Yes E 0
09/07/20 2 Yes E 0
09/08/20 2 Yes E 0
09/09/20 2 Yes B 1
09/10/20 2 Yes B 0
09/11/20 2 Yes B 0
09/12/20 2 Yes B 0
09/13/20 2 Yes B 0
09/14/20 2 Yes B 1
09/15/20 2 Yes B 0
09/16/20 2 Yes B 0
09/17/20 2 No B 0
;

mkeintz
PROC Star

OK, I get it now.  This program uses your original data and matches your WANT dataset:

 

data have;
  input date :mmddyy8. id $ event $ location $;
  format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;

data _null_;
  set have (where=(event='Yes')) end=end_of_have;
  by id;
  if _n_=1 then do;
    declare hash h (ordered:'a');
	  h.definekey('location');
	  h.definedata('location','sum');
	  h.definedone();
  end;

  if first.id=1 or lag(date)^=date-1 then date_sequence=1;
  else date_sequence+1;
  if mod(date_sequence,5)=1 then do;
    if h.find()^=0 then sum=1;
	else sum=sum+1;
	h.replace();
  end;

  if end_of_have;
  h.output(dataset:'want');
run;

proc print data=want;run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Rodcjones
Obsidian | Level 7
This is an elegant solution, thank you @mkeintz!
For my growth, I was wondering if you have any good references for what is happening in the data _null_ step with the hash stuff. That seems powerful but I don't follow it - and don't recognize the concepts.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2068 views
  • 0 likes
  • 4 in conversation