Hi everyone,
I am hoping someone can help me! I am working with data that have multiple observations per person, each with a different reason for the observation. The pattern must start with a reason of 1 or 3, have one or more observations with a 4 or 5, and end with an observation of a 1 or 3. The number of observations within a pattern can vary and this pattern may occur multiple times for one person. I have the end of the pattern I am looking for flagged but I now need to flag the start of the pattern I am looking for as well as populate that observation with the number of 4/5's that follow prior to the pattern ending 1 or 3 observation. I have put together some fake numbers in an effort to help explain what I am looking for. I am using SAS 9.4.
LocationID | PersonID | Date | Reason | Cycle_end(have) | Cycle_start(need) | Count(need) |
1 | 1 | 1/2/2014 | 01 | 1 | 3 | |
1 | 1 | 1/3/2014 | 04 | |||
1 | 1 | 1/4/2014 | 05 | |||
1 | 1 | 1/5/2014 | 04 | |||
1 | 1 | 1/6/2014 | 01 | 1 | ||
1 | 1 | 1/7/2014 | 03 | 1 | 3 | |
1 | 1 | 1/8/2014 | 04 | |||
1 | 1 | 1/9/2014 | 05 | |||
1 | 1 | 1/10/2014 | 04 | |||
1 | 1 | 1/11/2014 | 01 | 1 | ||
1 | 1 | 1/12/2014 | 09 | |||
1 | 2 | 1/13/2014 | 01 | |||
1 | 2 | 1/14/2014 | 09 | |||
1 | 2 | 1/15/2014 | 01 | 1 | 2 | |
1 | 2 | 1/16/2014 | 04 | |||
1 | 2 | 1/17/2014 | 04 | |||
1 | 2 | 1/18/2014 | 01 | 1 | ||
1 | 2 | 1/19/2014 | 01 | 1 | 2 | |
1 | 2 | 1/20/2014 | 04 | |||
1 | 2 | 1/21/2014 | 05 | |||
1 | 2 | 1/22/2014 | 03 | 1 | ||
2 | 3 | 1/23/2014 | 01 | |||
2 | 3 | 1/24/2014 | 04 | |||
2 | 3 | 1/25/2014 | 04 | |||
2 | 3 | 1/26/2014 | 09 | |||
2 | 3 | 1/27/2014 | 01 | 1 | 2 | |
2 | 3 | 1/28/2014 | 04 | |||
2 | 3 | 1/29/2014 | 05 | |||
2 | 3 | 1/30/2014 | 03 | 1 | ||
2 | 4 | 1/31/2014 | 01 | |||
2 | 4 | 2/1/2014 | 04 | |||
2 | 4 | 2/2/2014 | 04 | |||
3 | 5 | 2/3/2014 | 04 | |||
3 | 5 | 2/4/2014 | 04 | |||
3 | 5 | 2/5/2014 | 01 | 1 | 3 | |
3 | 5 | 2/6/2014 | 04 | |||
3 | 5 | 2/7/2014 | 04 | |||
3 | 5 | 2/8/2014 | 04 | |||
3 | 5 | 2/9/2014 | 03 | 1 | ||
3 | 6 | 2/10/2014 | 06 | |||
3 | 6 | 2/11/2014 | 08 |
Thanks for any help!!
Sorry so late. I don't know why I can't see your message (your message would not be able to highlight). OK. Here is. data have; infile cards expandtabs truncover; input LocationID PersonID Date : $20. Reason ; cards; 1 1 1/11/2014 01 1 1 1/12/2014 09 1 1 1/13/2014 01 1 1 1/14/2014 06 1 1 1/15/2014 03 1 1 1/16/2014 04 1 1 1/17/2014 03 1 1 1/18/2014 04 1 1 1/19/2014 04 1 1 1/20/2014 03 1 1 1/2/2014 01 1 1 1/3/2014 04 1 1 1/4/2014 05 1 1 1/5/2014 04 1 1 1/6/2014 01 1 1 1/7/2014 03 1 1 1/8/2014 04 1 1 1/9/2014 05 1 1 1/10/2014 04 1 1 1/11/2014 01 1 1 1/12/2014 09 1 2 1/13/2014 01 1 2 1/14/2014 09 1 2 1/15/2014 01 1 2 1/16/2014 04 1 2 1/17/2014 04 1 2 1/18/2014 01 1 2 1/19/2014 01 1 2 1/20/2014 04 1 2 1/21/2014 05 1 2 1/22/2014 03 2 3 1/23/2014 01 2 3 1/24/2014 04 2 3 1/25/2014 04 2 3 1/26/2014 09 2 3 1/27/2014 01 2 3 1/28/2014 04 2 3 1/29/2014 05 2 3 1/30/2014 03 2 4 1/31/2014 01 2 4 2/1/2014 04 2 4 2/2/2014 04 3 5 2/3/2014 04 3 5 2/4/2014 04 3 5 2/5/2014 01 3 5 2/6/2014 04 3 5 2/7/2014 04 3 5 2/8/2014 04 3 5 2/9/2014 03 3 6 2/10/2014 06 3 6 2/11/2014 08 ; run; data want; array x{999999} _temporary_; call missing(of x{*}); do k=1 by 1 until(last.PersonID); set have; by PersonID; x{k}=Reason; end; do j=1 by 1 until(last.PersonID); set have; by PersonID; if Reason in (1 3) and j ne _idx then do; n=0; do i=j+1 to k; n+1; if x{i} not in (4 5) then leave; end; if n ne 1 and x{i} in (1 3) then do; Cycle_start=1; Count=n-1; _idx=i; end; end; if j=_idx then Cycle_end=1; output; call missing(Cycle_start,Cycle_end,Count) ; end; drop j _idx n k i ; run;
Here's as far as I've made it. Note that it is abending due to a program statement, where your data doesn't seem to match the pattern.
Tom
proc sort data=Have;
by PersonID Date;
run;
data Cycles;
set Have;
by PersonID;
retain CycleStarted CycleCount CycleStart CycleEnd;
if first.PersonID
then do;
CycleStarted = 0;
CycleCount = 0;
CycleStart = .;
CycleEnd = .;
end;
if Reason = 9 then /* Assume we ignore 9s? */
return;
if ^CycleStarted
then do;
if Reason in(1, 3)
then do;
CycleStart = Reason;
CycleStarted = 1;
end;
else abort; /* Cycle doesn't start with 1 or 5 */
end;
else do;
/* We are in a cycle */
if Reason in(1, 3) | last.PersonID
then do;
CycleEnd = Reason;
output;
CycleStarted = 0;
CycleCount = 0;
CycleStart = .;
CycleEnd = .;
end;
else if reason in(4, 5) then CycleCount = CycleCount + 1;
else;
end;
run;
I like this kind of question. data have; infile cards expandtabs truncover; input LocationID PersonID Date : $20. Reason ; cards; 1 1 1/2/2014 01 1 1 1/3/2014 04 1 1 1/4/2014 05 1 1 1/5/2014 04 1 1 1/6/2014 01 1 1 1/7/2014 03 1 1 1/8/2014 04 1 1 1/9/2014 05 1 1 1/10/2014 04 1 1 1/11/2014 01 1 1 1/12/2014 09 1 2 1/13/2014 01 1 2 1/14/2014 09 1 2 1/15/2014 01 1 2 1/16/2014 04 1 2 1/17/2014 04 1 2 1/18/2014 01 1 2 1/19/2014 01 1 2 1/20/2014 04 1 2 1/21/2014 05 1 2 1/22/2014 03 2 3 1/23/2014 01 2 3 1/24/2014 04 2 3 1/25/2014 04 2 3 1/26/2014 09 2 3 1/27/2014 01 2 3 1/28/2014 04 2 3 1/29/2014 05 2 3 1/30/2014 03 2 4 1/31/2014 01 2 4 2/1/2014 04 2 4 2/2/2014 04 3 5 2/3/2014 04 3 5 2/4/2014 04 3 5 2/5/2014 01 3 5 2/6/2014 04 3 5 2/7/2014 04 3 5 2/8/2014 04 3 5 2/9/2014 03 3 6 2/10/2014 06 3 6 2/11/2014 08 ; run; data temp; set have; by PersonID; if first.PersonID then idx=0; idx+1; if Reason in (1 3) then New_Reason=9999999; else if Reason in (4 5) then New_Reason=8888888; else New_Reason=Reason; run; data want; array x{9999999} _temporary_; call missing(of x{*}); k=0; do until(last.PersonID); set temp; by PersonID; k+1; x{k}=New_Reason; end; do until(last.PersonID); set temp; by PersonID; if New_Reason=9999999 then do; n=0; do i=idx+1 to dim(x); n+1; if x{i} ne 8888888 then leave; end; if n ne 1 and x{i}=9999999 then do; Cycle_start=1; Count=n-1; _idx=i; end; end; if idx=_idx then Cycle_end=1; output; call missing(Cycle_start,Cycle_end,Count) ; end; drop idx _idx n k i New_Reason; run;
Thank you!! This code does work on my mini dataset but I know that my larger dataset has patterns that the small dataset doesn't capture. I am going to run this code on the large dataset (@20 million obs) tomorrow morning and I will update then.
OK.This code might give you a little fast .
data have;
infile cards expandtabs truncover;
input LocationID PersonID Date : $20. Reason ;
cards;
1 1 1/2/2014 01
1 1 1/3/2014 04
1 1 1/4/2014 05
1 1 1/5/2014 04
1 1 1/6/2014 01
1 1 1/7/2014 03
1 1 1/8/2014 04
1 1 1/9/2014 05
1 1 1/10/2014 04
1 1 1/11/2014 01
1 1 1/12/2014 09
1 2 1/13/2014 01
1 2 1/14/2014 09
1 2 1/15/2014 01
1 2 1/16/2014 04
1 2 1/17/2014 04
1 2 1/18/2014 01
1 2 1/19/2014 01
1 2 1/20/2014 04
1 2 1/21/2014 05
1 2 1/22/2014 03
2 3 1/23/2014 01
2 3 1/24/2014 04
2 3 1/25/2014 04
2 3 1/26/2014 09
2 3 1/27/2014 01
2 3 1/28/2014 04
2 3 1/29/2014 05
2 3 1/30/2014 03
2 4 1/31/2014 01
2 4 2/1/2014 04
2 4 2/2/2014 04
3 5 2/3/2014 04
3 5 2/4/2014 04
3 5 2/5/2014 01
3 5 2/6/2014 04
3 5 2/7/2014 04
3 5 2/8/2014 04
3 5 2/9/2014 03
3 6 2/10/2014 06
3 6 2/11/2014 08
;
run;
data temp;
set have;
by PersonID;
if first.PersonID then idx=0;
idx+1;
run;
data want;
array x{999999} _temporary_;
call missing(of x{*});
k=0;
do until(last.PersonID);
set temp;
by PersonID;
k+1;
x{k}=Reason;
end;
do until(last.PersonID);
set temp;
by PersonID;
if Reason in (1 3) then do;
n=0;
do i=idx+1 to k;
n+1;
if x{i} not in (4 5) then leave;
end;
if n ne 1 and x{i} in (1 3) then do;
Cycle_start=1;
Count=n-1;
_idx=i;
end;
end;
if idx=_idx then Cycle_end=1;
output;
call missing(Cycle_start,Cycle_end,Count) ;
end;
drop idx _idx n k i ;
run;
This version is very close to working! There is one pattern that is not being accounted for with this code. When a person has multiple 01/03 it is assigning two cycle starts and one end after the second cycle start instead of one start and one end. I pasted an example of what I am referring to below. The current cycle start/end is what the code is doing and the correct cycle start/end is what should be captured. The end of one cycle may not be the begining of another cycle. Any ideas on how to modify the code to address these situations? Also any other ideas to make it faster would be great, it did take a while to run.
Thanks again!!
LocationID | PersonID | Date | Reason | Current cycle start | Current cycle end | Correct cycle start | Correct cycle end |
1 | 1 | 1/11/2014 | 01 | ||||
1 | 1 | 1/12/2014 | 09 | ||||
1 | 1 | 1/13/2014 | 01 | ||||
1 | 1 | 1/14/2014 | 06 | ||||
1 | 1 | 1/15/2014 | 03 | 1 | 1 | ||
1 | 1 | 1/16/2014 | 04 | ||||
1 | 1 | 1/17/2014 | 03 | 1 | 1 | ||
1 | 1 | 1/18/2014 | 04 | ||||
1 | 1 | 1/19/2014 | 04 | ||||
1 | 1 | 1/20/2014 | 03 | 1 |
Sorry so late. I don't know why I can't see your message (your message would not be able to highlight). OK. Here is. data have; infile cards expandtabs truncover; input LocationID PersonID Date : $20. Reason ; cards; 1 1 1/11/2014 01 1 1 1/12/2014 09 1 1 1/13/2014 01 1 1 1/14/2014 06 1 1 1/15/2014 03 1 1 1/16/2014 04 1 1 1/17/2014 03 1 1 1/18/2014 04 1 1 1/19/2014 04 1 1 1/20/2014 03 1 1 1/2/2014 01 1 1 1/3/2014 04 1 1 1/4/2014 05 1 1 1/5/2014 04 1 1 1/6/2014 01 1 1 1/7/2014 03 1 1 1/8/2014 04 1 1 1/9/2014 05 1 1 1/10/2014 04 1 1 1/11/2014 01 1 1 1/12/2014 09 1 2 1/13/2014 01 1 2 1/14/2014 09 1 2 1/15/2014 01 1 2 1/16/2014 04 1 2 1/17/2014 04 1 2 1/18/2014 01 1 2 1/19/2014 01 1 2 1/20/2014 04 1 2 1/21/2014 05 1 2 1/22/2014 03 2 3 1/23/2014 01 2 3 1/24/2014 04 2 3 1/25/2014 04 2 3 1/26/2014 09 2 3 1/27/2014 01 2 3 1/28/2014 04 2 3 1/29/2014 05 2 3 1/30/2014 03 2 4 1/31/2014 01 2 4 2/1/2014 04 2 4 2/2/2014 04 3 5 2/3/2014 04 3 5 2/4/2014 04 3 5 2/5/2014 01 3 5 2/6/2014 04 3 5 2/7/2014 04 3 5 2/8/2014 04 3 5 2/9/2014 03 3 6 2/10/2014 06 3 6 2/11/2014 08 ; run; data want; array x{999999} _temporary_; call missing(of x{*}); do k=1 by 1 until(last.PersonID); set have; by PersonID; x{k}=Reason; end; do j=1 by 1 until(last.PersonID); set have; by PersonID; if Reason in (1 3) and j ne _idx then do; n=0; do i=j+1 to k; n+1; if x{i} not in (4 5) then leave; end; if n ne 1 and x{i} in (1 3) then do; Cycle_start=1; Count=n-1; _idx=i; end; end; if j=_idx then Cycle_end=1; output; call missing(Cycle_start,Cycle_end,Count) ; end; drop j _idx n k i ; run;
Thank you, this worked perfectly and took just under 4 hours to run!
Here is my try:
data temp;
set have; by LocationID PersonID;
if first.PersonId then personCycle=0;
if reason in (1, 3) then do;
personCycle + 1;
if mod(personCycle, 2) = 1 then cycleStart = 1;
else cycleEnd = 1;
end;
cycle45Count + reason in (4, 5);
run;
proc sql;
create table want as
select LocationID, PersonID, Date, Reason, cycleStart, cycleEnd,
case when cycleStart and range(cycle45count)>0
then range(cycle45count)
else . end as cycleCount
from temp
group by LocationID, PersonID, personCycle
order by LocationID, PersonID, date;
select * from want;
quit;
Here is a more efficient version for larger datasets:
data temp;
set have; by LocationID PersonID;
if first.PersonId then personCycle=0;
if reason in (1, 3) then do;
personCycle + 1;
if mod(personCycle, 2) = 1 then cycleStart = 1;
else cycleEnd = 1;
end;
cycle45Count + reason in (4, 5);
run;
data want;
do until(last.personCycle);
set temp; by LocationID PersonID personCycle;
if last.personCycle then countEnd = cycle45count;
end;
do until(last.personCycle);
set temp; by LocationID PersonID personCycle;
if cycleStart and cycle45count < countEnd
then cycleCount = countEnd - cycle45count;
output;
call missing(cycleCount);
end;
drop personCycle cycle45Count countEnd;
run;
proc print data=want noobs; run;
Thanks for the try! This code is also identifying cycles that have other reasons in between (like 1 followed by a 9 or a 3 followed by a 1) and it is not requiring a cycle end before identifying another start.
I assumed that a change in person_id would signal the end of information for the current cycle. For example, when going from person 2 to 3 on 23JAN2014, the cycle started for person 2 on 18JAN2014 hasn't been closed. Do you want to assume that the same cycle started under person 2 continues for person 3?
No, but a cycle start shouldn't be indicated if that person doesn't also have an end within this data. So if they didn't have a 1 or 3 also followed by a 4 or 5 also followed by a 1 or 3 then for this purpose they should not have a cycle start. Hopefully that makes more sense?
So why is 23JAN2014 for person 3 not the start of a cycle ending on 27JAN2014 with a count of 2?
That is not the start of a cycle because the 26JAN2014 for person 3 has a reason of 9. It should only count as a cycle if it started with 1/3, had one or more 4/5's, and ended in a 1/3 with no other reason types within the cycle. Anytime there is a different reason type the prior 1/3 and 4/5's are no longer eligible to be part of a cycle.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.