BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
caaich
Calcite | Level 5

Hi everyone,

 

I am hoping someone can help me! I am working with data that have multiple observations per person, each with a different reason for the observation. The pattern must start with a reason of 1 or 3, have one or more observations with a 4 or 5, and end with an observation of a 1 or 3. The number of observations within a pattern can vary and this pattern may occur multiple times for one person. I have the end of the pattern I am looking for flagged but I now need to flag the start of the pattern I am looking for as well as populate that observation with the number of 4/5's that follow prior to the pattern ending 1 or 3 observation. I have put together some fake numbers in an effort to help explain what I am looking for. I am using SAS 9.4.

 

LocationIDPersonIDDateReasonCycle_end(have)Cycle_start(need)Count(need)
111/2/201401 13
111/3/201404   
111/4/201405   
111/5/201404   
111/6/2014011  
111/7/201403 13
111/8/201404   
111/9/201405   
111/10/201404   
111/11/2014011  
111/12/201409   
121/13/201401   
121/14/201409   
121/15/201401 12
121/16/201404   
121/17/201404   
121/18/2014011  
121/19/201401 12
121/20/201404   
121/21/201405   
121/22/2014031  
231/23/201401   
231/24/201404   
231/25/201404   
231/26/201409   
231/27/201401 12
231/28/201404   
231/29/201405   
231/30/2014031  
241/31/201401   
242/1/201404   
242/2/201404   
352/3/201404   
352/4/201404   
352/5/201401 13
352/6/201404   
352/7/201404   
352/8/201404   
352/9/2014031  
362/10/201406   
362/11/201408   


Thanks for any help!! 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Sorry so late. I don't know why I can't see your message (your message would not be able to highlight).
OK. Here is.




data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/11/2014	01	 	 	 	 
1	1	1/12/2014	09	 	 	 	 
1	1	1/13/2014	01	 	 	 	 
1	1	1/14/2014	06	 	 	 	 
1	1	1/15/2014	03 
1	1	1/16/2014	04	 	 	 	 
1	1	1/17/2014	03
1	1	1/18/2014	04	 	 	 	 
1	1	1/19/2014	04	 	 	 	 
1	1	1/20/2014	03
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;


data want;
array x{999999} _temporary_;
call missing(of x{*});

do k=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 x{k}=Reason;
end;

do j=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 if Reason in (1 3) and j ne _idx then do;
  n=0;
  do i=j+1 to k;
   n+1;
   if x{i} not in (4 5) then leave;
  end;
  
  if n ne 1 and  x{i} in (1 3) then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if j=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop j _idx n k i ;
run;


View solution in original post

14 REPLIES 14
TomKari
Onyx | Level 15

Here's as far as I've made it. Note that it is abending due to a program statement, where your data doesn't seem to match the pattern.

 

Tom

 

proc sort data=Have;

by PersonID Date;

run;

data Cycles;

set Have;

by PersonID;

retain CycleStarted CycleCount CycleStart CycleEnd;

if first.PersonID

then do;

CycleStarted = 0;

CycleCount = 0;

CycleStart = .;

CycleEnd = .;

end;

if Reason = 9 then /* Assume we ignore 9s? */

return;

if ^CycleStarted

then do;

if Reason in(1, 3)

then do;

CycleStart = Reason;

CycleStarted = 1;

end;

else abort; /* Cycle doesn't start with 1 or 5 */

end;

else do;

/* We are in a cycle */

if Reason in(1, 3) | last.PersonID

then do;

CycleEnd = Reason;

output;

CycleStarted = 0;

CycleCount = 0;

CycleStart = .;

CycleEnd = .;

end;

else if reason in(4, 5) then CycleCount = CycleCount + 1;

else;

end;

run;

Ksharp
Super User
I like this kind of question.

data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;

data temp;
 set have;
 by PersonID;
 if first.PersonID	then idx=0;
 idx+1;
 if Reason in (1 3) then New_Reason=9999999;
  else if Reason in (4 5) then New_Reason=8888888;
   else New_Reason=Reason;
run;
data want;
array x{9999999} _temporary_;
call missing(of x{*});

k=0;
do until(last.PersonID);
 set temp;
 by PersonID;
 k+1;
 x{k}=New_Reason;
end;

do until(last.PersonID);
 set temp;
 by PersonID;
 if New_Reason=9999999 then do;
  n=0;
  do i=idx+1 to dim(x);
   n+1;
   if x{i} ne 8888888 then leave;
  end;
  
  if n ne 1 and x{i}=9999999 then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if idx=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop idx _idx n k i New_Reason;
run;
 









caaich
Calcite | Level 5

Thank you!! This code does work on my mini dataset but I know that my larger dataset has patterns that the small dataset doesn't capture. I am going to run this code on the large dataset (@20 million obs) tomorrow morning and I will update then. 

Ksharp
Super User

OK.This code might give you a little fast .

 

 


data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;

data temp;
 set have;
 by PersonID;
 if first.PersonID	then idx=0;
 idx+1;
run;
data want;
array x{999999} _temporary_;
call missing(of x{*});

k=0;
do until(last.PersonID);
 set temp;
 by PersonID;
 k+1;
 x{k}=Reason;
end;

do until(last.PersonID);
 set temp;
 by PersonID;
 if Reason in (1 3) then do;
  n=0;
  do i=idx+1 to k;
   n+1;
   if x{i} not in (4 5) then leave;
  end;
  
  if n ne 1 and  x{i} in (1 3) then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if idx=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop idx _idx n k i ;
run;
 
caaich
Calcite | Level 5

This version is very close to working! There is one pattern that is not being accounted for with this code. When a person has multiple 01/03 it is assigning two cycle starts and one end after the second cycle start instead of one start and one end. I pasted an example of what I am referring to below. The current cycle start/end is what the code is doing and the correct cycle start/end is what should be captured. The end of one cycle may not be the begining of another cycle. Any ideas on how to modify the code to address these situations? Also any other ideas to make it faster would be great, it did take a while to run.

Thanks again!!

LocationIDPersonIDDateReasonCurrent cycle startCurrent cycle endCorrect cycle startCorrect cycle end
111/11/201401    
111/12/201409    
111/13/201401    
111/14/201406    
111/15/2014031 1 
111/16/201404    
111/17/2014031  1
111/18/201404    
111/19/201404    
111/20/201403 1  
Ksharp
Super User
Sorry so late. I don't know why I can't see your message (your message would not be able to highlight).
OK. Here is.




data have;
infile cards expandtabs truncover;
input LocationID	PersonID	Date : $20.	Reason	;
cards;
1	1	1/11/2014	01	 	 	 	 
1	1	1/12/2014	09	 	 	 	 
1	1	1/13/2014	01	 	 	 	 
1	1	1/14/2014	06	 	 	 	 
1	1	1/15/2014	03 
1	1	1/16/2014	04	 	 	 	 
1	1	1/17/2014	03
1	1	1/18/2014	04	 	 	 	 
1	1	1/19/2014	04	 	 	 	 
1	1	1/20/2014	03
1	1	1/2/2014	01 
1	1	1/3/2014	04	 	 	 
1	1	1/4/2014	05	 	 	 
1	1	1/5/2014	04	 	 	 
1	1	1/6/2014	01		 	 
1	1	1/7/2014	03	 		
1	1	1/8/2014	04	 	 	 
1	1	1/9/2014	05	 	 	 
1	1	1/10/2014	04	 	 	 
1	1	1/11/2014	01		 	 
1	1	1/12/2014	09	 	 	 
1	2	1/13/2014	01	 	 	 
1	2	1/14/2014	09	 	 	 
1	2	1/15/2014	01	 		
1	2	1/16/2014	04	 	 	 
1	2	1/17/2014	04	 	 	 
1	2	1/18/2014	01		 	 
1	2	1/19/2014	01	 		
1	2	1/20/2014	04	 	 	 
1	2	1/21/2014	05	 	 	 
1	2	1/22/2014	03		 	 
2	3	1/23/2014	01	 	 	 
2	3	1/24/2014	04	 	 	 
2	3	1/25/2014	04	 	 	 
2	3	1/26/2014	09	 	 	 
2	3	1/27/2014	01	 	
2	3	1/28/2014	04	 	 	 
2	3	1/29/2014	05	 	 	 
2	3	1/30/2014	03		 	 
2	4	1/31/2014	01	 	 	 
2	4	2/1/2014	04	 	 	 
2	4	2/2/2014	04	 	 	 
3	5	2/3/2014	04	 	 	 
3	5	2/4/2014	04	 	 	 
3	5	2/5/2014	01	 
3	5	2/6/2014	04	 	 	 
3	5	2/7/2014	04	 	 	 
3	5	2/8/2014	04	 	 	 
3	5	2/9/2014	03		 	 
3	6	2/10/2014	06	 	 	 
3	6	2/11/2014	08
;
run;


data want;
array x{999999} _temporary_;
call missing(of x{*});

do k=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 x{k}=Reason;
end;

do j=1 by 1 until(last.PersonID);
 set have;
 by PersonID;
 if Reason in (1 3) and j ne _idx then do;
  n=0;
  do i=j+1 to k;
   n+1;
   if x{i} not in (4 5) then leave;
  end;
  
  if n ne 1 and  x{i} in (1 3) then do;
   Cycle_start=1;
   Count=n-1;
   _idx=i;
  end;
 end;
 
 if j=_idx then Cycle_end=1;
 output;
 call missing(Cycle_start,Cycle_end,Count) ;
end;

drop j _idx n k i ;
run;


caaich
Calcite | Level 5

Thank you, this worked perfectly and took just under 4 hours to run! 

PGStats
Opal | Level 21

Here is my try:

 


data temp;
set have; by LocationID PersonID;
if first.PersonId then personCycle=0;
if reason in (1, 3) then do;
    personCycle + 1;
    if mod(personCycle, 2) = 1 then cycleStart = 1;
    else cycleEnd = 1;
    end;
cycle45Count + reason in (4, 5);
run;

proc sql;
create table want as
select LocationID, PersonID, Date, Reason, cycleStart, cycleEnd,
    case when cycleStart and range(cycle45count)>0 
        then range(cycle45count) 
        else . end as cycleCount     
from temp
group by LocationID, PersonID, personCycle
order by LocationID, PersonID, date;
select * from want;
quit;
PG
PGStats
Opal | Level 21

Here is a more efficient version for larger datasets:

 

data temp;
set have; by LocationID PersonID;
if first.PersonId then personCycle=0;
if reason in (1, 3) then do;
    personCycle + 1;
    if mod(personCycle, 2) = 1 then cycleStart = 1;
    else cycleEnd = 1;
    end;
cycle45Count + reason in (4, 5);
run;

data want;
do until(last.personCycle);
    set temp; by LocationID PersonID personCycle;
    if last.personCycle then countEnd = cycle45count;
    end;
do until(last.personCycle);
    set temp; by LocationID PersonID personCycle;
    if cycleStart and cycle45count < countEnd 
        then cycleCount = countEnd - cycle45count;
    output;
    call missing(cycleCount);
    end;
drop personCycle cycle45Count countEnd;
run;
 
proc print data=want noobs; run;
PG
caaich
Calcite | Level 5

Thanks for the try! This code is also identifying cycles that have other reasons in between (like 1 followed by a 9 or a 3 followed by a 1) and it is not requiring a cycle end before identifying another start.

PGStats
Opal | Level 21

I assumed that a change in person_id would signal the end of information for the current cycle. For example, when going from person 2 to 3  on 23JAN2014, the cycle started for person 2 on 18JAN2014 hasn't been closed. Do you want to assume that the same cycle started under person 2 continues for person 3?

PG
caaich
Calcite | Level 5

No, but a cycle start shouldn't be indicated if that person doesn't also have an end within this data. So if they didn't have a 1 or 3 also followed by a 4 or 5 also followed by a 1 or 3 then for this purpose they should not have a cycle start. Hopefully that makes more sense?

PGStats
Opal | Level 21

So why is 23JAN2014 for person 3 not the start of a cycle ending on 27JAN2014 with a count of 2?

PG
caaich
Calcite | Level 5

That is not the start of a cycle because the 26JAN2014 for person 3 has a reason of 9. It should only count as a cycle if it started with 1/3, had one or more 4/5's, and ended in a 1/3 with no other reason types within the cycle. Anytime there is a different reason type the prior 1/3 and 4/5's are no longer eligible to be part of a cycle.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 2391 views
  • 0 likes
  • 4 in conversation