This question is a new twist on a previous post that was impressively solved by @AMSAS and @PeterClemmensen at
The new scenario I'm tackling is how to perform the same counting method by ID but then 1) attribute the count to a new variable and 2) sum by that variable. In the have and want datasets below I seek to count consecutively-dated Yes values, restart the count after 5 ... and use the LOCATION of the first Yes in the run for summing, even when the location changes at some point in the run.
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
;
run ;
data want;
input LOCATION $ sum ;
datalines;
A 3
B 5
C 4
Z 2
;
OK, I get it now. This program uses your original data and matches your WANT dataset:
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;
data _null_;
set have (where=(event='Yes')) end=end_of_have;
by id;
if _n_=1 then do;
declare hash h (ordered:'a');
h.definekey('location');
h.definedata('location','sum');
h.definedone();
end;
if first.id=1 or lag(date)^=date-1 then date_sequence=1;
else date_sequence+1;
if mod(date_sequence,5)=1 then do;
if h.find()^=0 then sum=1;
else sum=sum+1;
h.replace();
end;
if end_of_have;
h.output(dataset:'want');
run;
proc print data=want;run;
So walk me through this: Why is sum=3 at location='A' ?
@Rodcjones wrote:
Hi @PeterClemmensen ,
Location A's total comes from these records:
09/01/20 1 Yes A
+
09/03/20 1 Yes A
+
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
This last group of 5 is counted as 1 (see link to prior post for more on that).
That is NOT consecutive date with Yes, the 09/02/20 is NOT a Yes.
So it appears that you need to define "consecutive" in a bit more detail in relation to this data.
@Rodcjones wrote:
The counting rule I needed to implement says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. This counting is done at the ID level. In the case of ID 1 it's correct that the No value on 9/2 interrupts the string of Yes's, and counting restarts at the next Yes for ID 1, which is on 9/3.
So your posted want is incorrect:
data want; input LOCATION $ sum ; datalines; A 3 B 5 C 4 Z 2
Shouldn't C have a count of 5?
---- 10 Lines Not Displayed ----
09/11/20 1 Yes C
---- 1 Line Not Displayed ----
09/13/20 1 Yes C---- 2 Lines Not Displayed ----
09/16/20 1 Yes C---- 3 Lines Not Displayed ----
09/20/20 1 Yes C
09/21/20 1 Yes C---- 6 Lines Not Displayed ----
09/28/20 1 Yes C---- 33 Lines Not Displayed ----
Assuming your answer is Yes, then:
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;
data _null_;
if 0 then set have;
if _n_=1 then do;
call missing(sum);
declare hash h (ordered:'A');
h.definekey('location');
h.definedata('location','sum');
h.definedone();
end;
do _n=1 to 5 until (last.event);
set have end=end_of_have;
by location event notsorted;
end;
if event='Yes' then do;
if h.find()^=0 then sum=1;
else sum+1;
h.replace();
end;
if end_of_have then h.output(dataset:'want');
run;
C doesn't have a sum of 5 because this row
09/21/20 1 Yes C
is part of a consecutive run. Once the run has gone 5 days for the same ID, the count should reset, but since this is only the 2nd day of Yes for ID 1, it is associated with the preceding record.
I inadvertently seem to have come up with an example and scenario that's tough to follow, so I appreciate the patience and assistance. In this HAVE dataset below I gave a value of 1 to the count_it variable for rows that should be counted. Hopefully that might explain how I came up with the sums in WANT.
data have;
input date :mmddyy8. id $ event $ location $ count_it;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A 1
09/02/20 1 No A 0
09/03/20 1 Yes A 1
09/04/20 1 No A 0
09/05/20 1 Yes B 1
09/06/20 1 No B 0
09/07/20 1 Yes B 1
09/08/20 1 No B 0
09/09/20 1 Yes B 1
09/10/20 1 Yes B 0
09/11/20 1 Yes C 0
09/12/20 1 No C 0
09/13/20 1 Yes C 1
09/14/20 1 No C 0
09/15/20 1 No C 0
09/16/20 1 Yes C 1
09/17/20 1 No C 0
09/18/20 1 No C 0
09/19/20 1 No C 0
09/20/20 1 Yes C 1
09/21/20 1 Yes C 0
09/22/20 1 No C 0
09/23/20 1 No C 0
09/24/20 1 No C 0
09/25/20 1 No C 0
09/26/20 1 No C 0
09/27/20 1 No C 0
09/28/20 1 Yes C 1
09/29/20 1 No C 0
09/30/20 1 No C 0
10/01/20 1 No C 0
10/02/20 1 No C 0
10/03/20 1 No C 0
10/04/20 1 No C 0
10/05/20 1 Yes Z 1
10/06/20 1 No Z 0
10/07/20 1 No Z 0
10/08/20 1 No Z 0
10/09/20 1 Yes Z 1
10/10/20 1 No Z 0
10/11/20 1 No Z 0
10/12/20 1 No Z 0
10/13/20 1 No Z 0
10/14/20 1 No D 0
09/01/20 2 No A 0
09/02/20 2 No A 0
09/03/20 2 No A 0
09/04/20 2 Yes A 1
09/05/20 2 Yes A 0
09/06/20 2 Yes E 0
09/07/20 2 Yes E 0
09/08/20 2 Yes E 0
09/09/20 2 Yes B 1
09/10/20 2 Yes B 0
09/11/20 2 Yes B 0
09/12/20 2 Yes B 0
09/13/20 2 Yes B 0
09/14/20 2 Yes B 1
09/15/20 2 Yes B 0
09/16/20 2 Yes B 0
09/17/20 2 No B 0
;
OK, I get it now. This program uses your original data and matches your WANT dataset:
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;
data _null_;
set have (where=(event='Yes')) end=end_of_have;
by id;
if _n_=1 then do;
declare hash h (ordered:'a');
h.definekey('location');
h.definedata('location','sum');
h.definedone();
end;
if first.id=1 or lag(date)^=date-1 then date_sequence=1;
else date_sequence+1;
if mod(date_sequence,5)=1 then do;
if h.find()^=0 then sum=1;
else sum=sum+1;
h.replace();
end;
if end_of_have;
h.output(dataset:'want');
run;
proc print data=want;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.