SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Flagging the start of a pattern across observations within a group

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Flagging the start of a pattern across observations within a group

Hi everyone,

 

I am hoping someone can help me! I am working with data that have multiple observations per person, each with a different reason for the observation. The pattern must start with a reason of 1 or 3, have one or more observations with a 4 or 5, and end with an observation of a 1 or 3. The number of observations within a pattern can vary and this pattern may occur multiple times for one person. I have the end of the pattern I am looking for flagged but I now need to flag the start of the pattern I am looking for as well as populate that observation with the number of 4/5's that follow prior to the pattern ending 1 or 3 observation. I have put together some fake numbers in an effort to help explain what I am looking for. I am using SAS 9.4.

 

LocationIDPersonIDDateReasonCycle_end(have)Cycle_start(need)Count(need)
111/2/201401 13
111/3/201404   
111/4/201405   
111/5/201404   
111/6/2014011  
111/7/201403 13
111/8/201404   
111/9/201405   
111/10/201404   
111/11/2014011  
111/12/201409   
121/13/201401   
121/14/201409   
121/15/201401 12
121/16/201404   
121/17/201404   
121/18/2014011  
121/19/201401 12
121/20/201404   
121/21/201405   
121/22/2014031  
231/23/201401   
231/24/201404   
231/25/201404   
231/26/201409   
231/27/201401 12
231/28/201404   
231/29/201405   
231/30/2014031  
241/31/201401   
242/1/201404   
242/2/201404   
352/3/201404   
352/4/201404   
352/5/201401 13
352/6/201404   
352/7/201404   
352/8/201404   
352/9/2014031  
362/10/201406   
362/11/201408   


Thanks for any help!! 


Accepted Solutions
Solution
‎05-27-2016 10:32 AM
Super User
Posts: 9,682

Re: Flagging the start of a pattern across observations within a group

Sorry so late. I don't know why I can't see your message (your message would not be able to highlight).
OK. Here is.




data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/11/2014	01	 	 	 	 
1	1	1/12/2014	09	 	 	 	 
1	1	1/13/2014	01	 	 	 	 
1	1	1/14/2014	06	 	 	 	 
1	1	1/15/2014	03 
1	1	1/16/2014	04	 	 	 	 
1	1	1/17/2014	03
1	1	1/18/2014	04	 	 	 	 
1	1	1/19/2014	04	 	 	 	 
1	1	1/20/2014	03
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;


data want;
array x{999999} _temporary_;
call missing(of x{*});

do k=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 x{k}=Reason;
end;

do j=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 if Reason in (1 3) and j ne _idx then do;
  n=0;
  do i=j+1 to k;
   n+1;
   if x{i} not in (4 5) then leave;
  end;
  
  if n ne 1 and  x{i} in (1 3) then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if j=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop j _idx n k i ;
run;


View solution in original post


All Replies
PROC Star
Posts: 1,095

Re: Flagging the start of a pattern across observations within a group

Here's as far as I've made it. Note that it is abending due to a program statement, where your data doesn't seem to match the pattern.

 

Tom

 

proc sort data=Have;

by PersonID Date;

run;

data Cycles;

set Have;

by PersonID;

retain CycleStarted CycleCount CycleStart CycleEnd;

if first.PersonID

then do;

CycleStarted = 0;

CycleCount = 0;

CycleStart = .;

CycleEnd = .;

end;

if Reason = 9 then /* Assume we ignore 9s? */

return;

if ^CycleStarted

then do;

if Reason in(1, 3)

then do;

CycleStart = Reason;

CycleStarted = 1;

end;

else abort; /* Cycle doesn't start with 1 or 5 */

end;

else do;

/* We are in a cycle */

if Reason in(1, 3) | last.PersonID

then do;

CycleEnd = Reason;

output;

CycleStarted = 0;

CycleCount = 0;

CycleStart = .;

CycleEnd = .;

end;

else if reason in(4, 5) then CycleCount = CycleCount + 1;

else;

end;

run;

Super User
Posts: 9,682

Re: Flagging the start of a pattern across observations within a group

I like this kind of question.

data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;

data temp;
 set have;
 by PersonID;
 if first.PersonID	then idx=0;
 idx+1;
 if Reason in (1 3) then New_Reason=9999999;
  else if Reason in (4 5) then New_Reason=8888888;
   else New_Reason=Reason;
run;
data want;
array x{9999999} _temporary_;
call missing(of x{*});

k=0;
do until(last.PersonID);
 set temp;
 by PersonID;
 k+1;
 x{k}=New_Reason;
end;

do until(last.PersonID);
 set temp;
 by PersonID;
 if New_Reason=9999999 then do;
  n=0;
  do i=idx+1 to dim(x);
   n+1;
   if x{i} ne 8888888 then leave;
  end;
  
  if n ne 1 and x{i}=9999999 then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if idx=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop idx _idx n k i New_Reason;
run;
 









Occasional Contributor
Posts: 7

Re: Flagging the start of a pattern across observations within a group

Thank you!! This code does work on my mini dataset but I know that my larger dataset has patterns that the small dataset doesn't capture. I am going to run this code on the large dataset (@20 million obs) tomorrow morning and I will update then. 

Super User
Posts: 9,682

Re: Flagging the start of a pattern across observations within a group

OK.This code might give you a little fast .

 

 


data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;

data temp;
 set have;
 by PersonID;
 if first.PersonID	then idx=0;
 idx+1;
run;
data want;
array x{999999} _temporary_;
call missing(of x{*});

k=0;
do until(last.PersonID);
 set temp;
 by PersonID;
 k+1;
 x{k}=Reason;
end;

do until(last.PersonID);
 set temp;
 by PersonID;
 if Reason in (1 3) then do;
  n=0;
  do i=idx+1 to k;
   n+1;
   if x{i} not in (4 5) then leave;
  end;
  
  if n ne 1 and  x{i} in (1 3) then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if idx=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop idx _idx n k i ;
run;
 
Occasional Contributor
Posts: 7

Re: Flagging the start of a pattern across observations within a group

This version is very close to working! There is one pattern that is not being accounted for with this code. When a person has multiple 01/03 it is assigning two cycle starts and one end after the second cycle start instead of one start and one end. I pasted an example of what I am referring to below. The current cycle start/end is what the code is doing and the correct cycle start/end is what should be captured. The end of one cycle may not be the begining of another cycle. Any ideas on how to modify the code to address these situations? Also any other ideas to make it faster would be great, it did take a while to run.

Thanks again!!

LocationIDPersonIDDateReasonCurrent cycle startCurrent cycle endCorrect cycle startCorrect cycle end
111/11/201401    
111/12/201409    
111/13/201401    
111/14/201406    
111/15/2014031 1 
111/16/201404    
111/17/2014031  1
111/18/201404    
111/19/201404    
111/20/201403 1  
Solution
‎05-27-2016 10:32 AM
Super User
Posts: 9,682

Re: Flagging the start of a pattern across observations within a group

Sorry so late. I don't know why I can't see your message (your message would not be able to highlight).
OK. Here is.




data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/11/2014	01	 	 	 	 
1	1	1/12/2014	09	 	 	 	 
1	1	1/13/2014	01	 	 	 	 
1	1	1/14/2014	06	 	 	 	 
1	1	1/15/2014	03 
1	1	1/16/2014	04	 	 	 	 
1	1	1/17/2014	03
1	1	1/18/2014	04	 	 	 	 
1	1	1/19/2014	04	 	 	 	 
1	1	1/20/2014	03
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;


data want;
array x{999999} _temporary_;
call missing(of x{*});

do k=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 x{k}=Reason;
end;

do j=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 if Reason in (1 3) and j ne _idx then do;
  n=0;
  do i=j+1 to k;
   n+1;
   if x{i} not in (4 5) then leave;
  end;
  
  if n ne 1 and  x{i} in (1 3) then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if j=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop j _idx n k i ;
run;


Occasional Contributor
Posts: 7

Re: Flagging the start of a pattern across observations within a group

Thank you, this worked perfectly and took just under 4 hours to run! 

Respected Advisor
Posts: 4,654

Re: Flagging the start of a pattern across observations within a group

Here is my try:

 


data temp;
set have; by LocationID PersonID;
if first.PersonId then personCycle=0;
if reason in (1, 3) then do;
    personCycle + 1;
    if mod(personCycle, 2) = 1 then cycleStart = 1;
    else cycleEnd = 1;
    end;
cycle45Count + reason in (4, 5);
run;

proc sql;
create table want as
select LocationID, PersonID, Date, Reason, cycleStart, cycleEnd,
    case when cycleStart and range(cycle45count)>0 
        then range(cycle45count) 
        else . end as cycleCount     
from temp
group by LocationID, PersonID, personCycle
order by LocationID, PersonID, date;
select * from want;
quit;
PG
Respected Advisor
Posts: 4,654

Re: Flagging the start of a pattern across observations within a group

Here is a more efficient version for larger datasets:

 

data temp;
set have; by LocationID PersonID;
if first.PersonId then personCycle=0;
if reason in (1, 3) then do;
    personCycle + 1;
    if mod(personCycle, 2) = 1 then cycleStart = 1;
    else cycleEnd = 1;
    end;
cycle45Count + reason in (4, 5);
run;

data want;
do until(last.personCycle);
    set temp; by LocationID PersonID personCycle;
    if last.personCycle then countEnd = cycle45count;
    end;
do until(last.personCycle);
    set temp; by LocationID PersonID personCycle;
    if cycleStart and cycle45count < countEnd 
        then cycleCount = countEnd - cycle45count;
    output;
    call missing(cycleCount);
    end;
drop personCycle cycle45Count countEnd;
run;
 
proc print data=want noobs; run;
PG
Occasional Contributor
Posts: 7

Re: Flagging the start of a pattern across observations within a group

Thanks for the try! This code is also identifying cycles that have other reasons in between (like 1 followed by a 9 or a 3 followed by a 1) and it is not requiring a cycle end before identifying another start.

Respected Advisor
Posts: 4,654

Re: Flagging the start of a pattern across observations within a group

I assumed that a change in person_id would signal the end of information for the current cycle. For example, when going from person 2 to 3  on 23JAN2014, the cycle started for person 2 on 18JAN2014 hasn't been closed. Do you want to assume that the same cycle started under person 2 continues for person 3?

PG
Occasional Contributor
Posts: 7

Re: Flagging the start of a pattern across observations within a group

No, but a cycle start shouldn't be indicated if that person doesn't also have an end within this data. So if they didn't have a 1 or 3 also followed by a 4 or 5 also followed by a 1 or 3 then for this purpose they should not have a cycle start. Hopefully that makes more sense?

Respected Advisor
Posts: 4,654

Re: Flagging the start of a pattern across observations within a group

So why is 23JAN2014 for person 3 not the start of a cycle ending on 27JAN2014 with a count of 2?

PG
Occasional Contributor
Posts: 7

Re: Flagging the start of a pattern across observations within a group

That is not the start of a cycle because the 26JAN2014 for person 3 has a reason of 9. It should only count as a cycle if it started with 1/3, had one or more 4/5's, and ended in a 1/3 with no other reason types within the cycle. Anytime there is a different reason type the prior 1/3 and 4/5's are no longer eligible to be part of a cycle.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 1420 views
  • 0 likes
  • 4 in conversation