This question is a new twist on a previous post that was impressively solved by @AMSAS and @PeterClemmensen at
The new scenario I'm tackling is how to perform the same counting method by ID but then 1) attribute the count to a new variable and 2) sum by that variable. In the have and want datasets below I seek to count consecutively-dated Yes values, restart the count after 5 ... and use the LOCATION of the first Yes in the run for summing, even when the location changes at some point in the run.
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
;
run ;
data want;
input LOCATION $ sum ;
datalines;
A 3
B 5
C 4
Z 2
;
OK, I get it now. This program uses your original data and matches your WANT dataset:
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;
data _null_;
set have (where=(event='Yes')) end=end_of_have;
by id;
if _n_=1 then do;
declare hash h (ordered:'a');
h.definekey('location');
h.definedata('location','sum');
h.definedone();
end;
if first.id=1 or lag(date)^=date-1 then date_sequence=1;
else date_sequence+1;
if mod(date_sequence,5)=1 then do;
if h.find()^=0 then sum=1;
else sum=sum+1;
h.replace();
end;
if end_of_have;
h.output(dataset:'want');
run;
proc print data=want;run;
So walk me through this: Why is sum=3 at location='A' ?
@Rodcjones wrote:
Hi @PeterClemmensen ,
Location A's total comes from these records:
09/01/20 1 Yes A
+
09/03/20 1 Yes A
+
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
This last group of 5 is counted as 1 (see link to prior post for more on that).
That is NOT consecutive date with Yes, the 09/02/20 is NOT a Yes.
So it appears that you need to define "consecutive" in a bit more detail in relation to this data.
@Rodcjones wrote:
The counting rule I needed to implement says to count consecutively-dated observations of a Yes value as 1 until 5 in a row or a No value, then start over. This counting is done at the ID level. In the case of ID 1 it's correct that the No value on 9/2 interrupts the string of Yes's, and counting restarts at the next Yes for ID 1, which is on 9/3.
So your posted want is incorrect:
data want; input LOCATION $ sum ; datalines; A 3 B 5 C 4 Z 2
Shouldn't C have a count of 5?
---- 10 Lines Not Displayed ----
09/11/20 1 Yes C
---- 1 Line Not Displayed ----
09/13/20 1 Yes C---- 2 Lines Not Displayed ----
09/16/20 1 Yes C---- 3 Lines Not Displayed ----
09/20/20 1 Yes C
09/21/20 1 Yes C---- 6 Lines Not Displayed ----
09/28/20 1 Yes C---- 33 Lines Not Displayed ----
Assuming your answer is Yes, then:
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;
data _null_;
if 0 then set have;
if _n_=1 then do;
call missing(sum);
declare hash h (ordered:'A');
h.definekey('location');
h.definedata('location','sum');
h.definedone();
end;
do _n=1 to 5 until (last.event);
set have end=end_of_have;
by location event notsorted;
end;
if event='Yes' then do;
if h.find()^=0 then sum=1;
else sum+1;
h.replace();
end;
if end_of_have then h.output(dataset:'want');
run;
C doesn't have a sum of 5 because this row
09/21/20 1 Yes C
is part of a consecutive run. Once the run has gone 5 days for the same ID, the count should reset, but since this is only the 2nd day of Yes for ID 1, it is associated with the preceding record.
I inadvertently seem to have come up with an example and scenario that's tough to follow, so I appreciate the patience and assistance. In this HAVE dataset below I gave a value of 1 to the count_it variable for rows that should be counted. Hopefully that might explain how I came up with the sums in WANT.
data have;
input date :mmddyy8. id $ event $ location $ count_it;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A 1
09/02/20 1 No A 0
09/03/20 1 Yes A 1
09/04/20 1 No A 0
09/05/20 1 Yes B 1
09/06/20 1 No B 0
09/07/20 1 Yes B 1
09/08/20 1 No B 0
09/09/20 1 Yes B 1
09/10/20 1 Yes B 0
09/11/20 1 Yes C 0
09/12/20 1 No C 0
09/13/20 1 Yes C 1
09/14/20 1 No C 0
09/15/20 1 No C 0
09/16/20 1 Yes C 1
09/17/20 1 No C 0
09/18/20 1 No C 0
09/19/20 1 No C 0
09/20/20 1 Yes C 1
09/21/20 1 Yes C 0
09/22/20 1 No C 0
09/23/20 1 No C 0
09/24/20 1 No C 0
09/25/20 1 No C 0
09/26/20 1 No C 0
09/27/20 1 No C 0
09/28/20 1 Yes C 1
09/29/20 1 No C 0
09/30/20 1 No C 0
10/01/20 1 No C 0
10/02/20 1 No C 0
10/03/20 1 No C 0
10/04/20 1 No C 0
10/05/20 1 Yes Z 1
10/06/20 1 No Z 0
10/07/20 1 No Z 0
10/08/20 1 No Z 0
10/09/20 1 Yes Z 1
10/10/20 1 No Z 0
10/11/20 1 No Z 0
10/12/20 1 No Z 0
10/13/20 1 No Z 0
10/14/20 1 No D 0
09/01/20 2 No A 0
09/02/20 2 No A 0
09/03/20 2 No A 0
09/04/20 2 Yes A 1
09/05/20 2 Yes A 0
09/06/20 2 Yes E 0
09/07/20 2 Yes E 0
09/08/20 2 Yes E 0
09/09/20 2 Yes B 1
09/10/20 2 Yes B 0
09/11/20 2 Yes B 0
09/12/20 2 Yes B 0
09/13/20 2 Yes B 0
09/14/20 2 Yes B 1
09/15/20 2 Yes B 0
09/16/20 2 Yes B 0
09/17/20 2 No B 0
;
OK, I get it now. This program uses your original data and matches your WANT dataset:
data have;
input date :mmddyy8. id $ event $ location $;
format date MMDDYY8. ;
datalines;
09/01/20 1 Yes A
09/02/20 1 No A
09/03/20 1 Yes A
09/04/20 1 No A
09/05/20 1 Yes B
09/06/20 1 No B
09/07/20 1 Yes B
09/08/20 1 No B
09/09/20 1 Yes B
09/10/20 1 Yes B
09/11/20 1 Yes C
09/12/20 1 No C
09/13/20 1 Yes C
09/14/20 1 No C
09/15/20 1 No C
09/16/20 1 Yes C
09/17/20 1 No C
09/18/20 1 No C
09/19/20 1 No C
09/20/20 1 Yes C
09/21/20 1 Yes C
09/22/20 1 No C
09/23/20 1 No C
09/24/20 1 No C
09/25/20 1 No C
09/26/20 1 No C
09/27/20 1 No C
09/28/20 1 Yes C
09/29/20 1 No C
09/30/20 1 No C
10/01/20 1 No C
10/02/20 1 No C
10/03/20 1 No C
10/04/20 1 No C
10/05/20 1 Yes Z
10/06/20 1 No Z
10/07/20 1 No Z
10/08/20 1 No Z
10/09/20 1 Yes Z
10/10/20 1 No Z
10/11/20 1 No Z
10/12/20 1 No Z
10/13/20 1 No Z
10/14/20 1 No D
09/01/20 2 No A
09/02/20 2 No A
09/03/20 2 No A
09/04/20 2 Yes A
09/05/20 2 Yes A
09/06/20 2 Yes E
09/07/20 2 Yes E
09/08/20 2 Yes E
09/09/20 2 Yes B
09/10/20 2 Yes B
09/11/20 2 Yes B
09/12/20 2 Yes B
09/13/20 2 Yes B
09/14/20 2 Yes B
09/15/20 2 Yes B
09/16/20 2 Yes B
09/17/20 2 No B
run ;
data _null_;
set have (where=(event='Yes')) end=end_of_have;
by id;
if _n_=1 then do;
declare hash h (ordered:'a');
h.definekey('location');
h.definedata('location','sum');
h.definedone();
end;
if first.id=1 or lag(date)^=date-1 then date_sequence=1;
else date_sequence+1;
if mod(date_sequence,5)=1 then do;
if h.find()^=0 then sum=1;
else sum=sum+1;
h.replace();
end;
if end_of_have;
h.output(dataset:'want');
run;
proc print data=want;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.