BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
feliperczx
Calcite | Level 5

Hi guys!

So, basically what I want to do is to identify certain patterns (cycles actually) of users using a machine.

 

I need to use a IF-THEN statement between two specific transactions. 

The code needs to run from one Login to the next one and use the code of transactions between to identify the cycles.

 

TIME         CODE       TRANSACTION        

7:00:30       801          Login

7:00:35       405          Transaction X

7:00:40       404          Transaction Y

7:02:30      801           Login  

7:02:35     404           Transaction Y

7:02:40     404           Transaction Y

 

The output should be: 1 X Cycle and 1 Y Cycle.

Also, Transaction X is stronger than Transaction Y, then, if X occurs, the cycle is identified as a ''X Cycle''

If not, we have a ''Y Cycle''.

 

Any ideas? Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@feliperczx :

The peculiarity of your task basically lies in the fact that you need to flag each "end-of-group" record whose end-of-group marker (Login) is stored in the next record. It means that some look-ahead method has to be deployed, which is why the fine solutions by @seemiyah and @andreas_lds require extra steps and rather involved logic. Methinks it's simpler to address the look-ahead need by using the shifted (offset) merge tactic, for example:

data have ;                                                              
   input time: hhmmss. code: $3. trans: $1. ;                            
   format Time time8. ;                                                  
cards ;                                                                  
7:00:30 801 L                                                            
7:00:35 405 X                                                            
7:00:40 404 Y                                                            
7:02:30 801 L                                                            
7:02:35 404 Y                                                            
7:02:40 404 Y                                                            
7:03:60 801 L                                                            
7:04:15 405 X                                                            
7:04:20 404 Y                                                            
7:05:01 405 X                                                            
7:06:03 409 X                                                            
run ;                                                                    
                                                                         
data want (drop = _:) ;                                                  
  do until (_break) ;                                                    
    merge have have (firstobs=2 keep=trans rename=(trans=_t)) end = lr ;
    if trans = "X" then _X = 1 ;                                         
    _break = (_t = "L" or lr) ;                                         
    if _break then cycle = put (ifC (_X, "X", "Y"), $1.) ;               
    output ;                                                             
  end ;                                                                  
run ;                                                                    

Kind regards

Paul D.

    

View solution in original post

10 REPLIES 10
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

Why not use login auditing tools to deterring the amount of time the user is logged in?

take the logged in time and create spans until the next login.

 

feliperczx
Calcite | Level 5

The amount of time the user is logged in it is not important here. 

The LOGIN transaction is just a flag to let me know that another cycle began/ended

 

What I really need to do is to identify the transactions between those LOGIN transactions

 


not sure If i get your suggestion...could you give me an example of what you thought?

 

 

seemiyah
Fluorite | Level 6

Does the following help? It assumes that 'Login' or end-of-file mark the end of a cycle.

 

data have ;
   input Time : hhmmss. Code : $3. Transaction & $20. ;
   format Time time8. ;
cards ;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login  
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
;
run ;

proc sort data=have;
   by time ;
run ;

data want ;
   set have end=eof;
   retain _XFound 
             _YFound ;

   if _n_>1 and (Transaction='Login' or eof) then do ;  
       if        _XFound then XCycleCount+1 ;   
       else if _YFound then YCycleCount+1 ;

       _XFound=0 ;
       _YFound=0 ;
   end ;
   
   if        Transaction='Transaction X' then _XFound=1 ;      
   else if Transaction='Transaction Y' then _YFound=1 ;      

   if eof then output ;

   keep XCycleCount YCycleCount ;
run ;
andreas_lds
Jade | Level 19

Do you need the information in the log or in a dataset? If you need a dataset, please post some details on the expected structure.

Is the variable "code" numeric or alphanumeric? The value of "code" seems to be related to the value of "transaction", is this the case or just a coincidence?

 

Here is one possible solution:

data have;
   attrib 
      time length=8 format=time8. informat=time.
      code length=8
      transaction length=$30
   ;
   infile datalines dlmstr='  ';
   input time code transaction;

datalines;
7:00:30  801  Login
7:00:35  405  Transaction X
7:00:40  404  Transaction Y
7:02:30  801  Login 
7:02:35  404  Transaction Y
7:02:40  404  Transaction Y
;
run;


data _null_;
   set have end=jobDone;

   length x_cycles y_cycles current_cycle 8;
   retain x_cycles y_cycles 0 current_cycle .;

   array cycles[404:405] y_cycles x_cycles;

   if code = 801 then do;
      if not missing(current_cycle) then do;
         cycles[current_cycle] = cycles[current_cycle] + 1;
         current_cycle = .;
      end;
   end;
   else do;
      current_cycle = max(current_cycle, code);
   end;

   
   if jobDone then do;
      cycles[current_cycle] = cycles[current_cycle] + 1;
      put x_cycles "X cycle and " y_cycles "Y cycle";
   end;
run;
feliperczx
Calcite | Level 5

I need the information on a dataset.

Would be great If I had a new column/variable (CYCLE) and at the end of each cycle, a classification for the prior one. 
For example, expected structure:

 

TIME     CODE TRANSACTION       CYCLE
7:00:30 801 Login 7:00:35 405 Transaction X 7:00:40 404 Transaction Y X Cycle 7:02:30 801 Login 7:02:35 404 Transaction Y 7:02:40 404 Transaction Y     Y Cycle
7:03:60  801  Login
7:04:15  405  Transaction X
7:04:20  404  Transaction Y
7:05:01  405  Transaction X
7:06:03  409  Transaction X     X Cycle

The variable CODE is a character variable. The value of 'CODE' it is indeed related to the value of 'TRANSACTION', but I do that classification in a prior step, for example:

IF CODE = '801' THEN TRANSACTION = 'Login';
ELSE IF CODE IN ('405','409') THEN TRANSACTION = 'Transaction X';

ELSE TRANSACTION = 'Transaction Y';

 

 

seemiyah
Fluorite | Level 6

If you want a dataset like the example you have provided it may require a two pass solution. First two identify the cycle boundaries and then apply the priority rules.

 

Here's a possible solution below. You could possibly eliminate the derivation of the Transaction variable and substitute this logic into the datasteps. eg if Code='801' then CycleCount+1.

 

data have ;
   input Time : hhmmss. Code : $3. Transaction & $20. ;
   format Time time8. ;
cards ;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login 
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
7:03:60 801 Login
7:04:15 405 Transaction X
7:04:20 404 Transaction Y
7:05:01 405 Transaction X
7:06:03 409 Transaction X
run ;

proc sort data=have ;
   by Time ;
run ;

data have_cycle ;
   set have ;
   if Transaction='Login' then CycleCount+1 ;
run ;

data want ;
   set have_cycle;
   by CycleCount ;

   retain _XFound _YFound ;
   length Cycle $8 ;

   if first.CycleCount then do ;
       _XFound=0 ;
       _YFound=0 ;
   end ;

   if       Transaction='Transaction X' then _XFound=1 ;      
   else if  Transaction='Transaction Y' then _YFound=1 ;

  if last.CycleCount then do ;
       if      _XFound then Cycle='X Cycle' ;   
       else if _YFound then Cycle='Y Cycle' ;
   end ;

   drop _: CycleCount ;
run ;

 

 

hashman
Ammonite | Level 13

@feliperczx :

The peculiarity of your task basically lies in the fact that you need to flag each "end-of-group" record whose end-of-group marker (Login) is stored in the next record. It means that some look-ahead method has to be deployed, which is why the fine solutions by @seemiyah and @andreas_lds require extra steps and rather involved logic. Methinks it's simpler to address the look-ahead need by using the shifted (offset) merge tactic, for example:

data have ;                                                              
   input time: hhmmss. code: $3. trans: $1. ;                            
   format Time time8. ;                                                  
cards ;                                                                  
7:00:30 801 L                                                            
7:00:35 405 X                                                            
7:00:40 404 Y                                                            
7:02:30 801 L                                                            
7:02:35 404 Y                                                            
7:02:40 404 Y                                                            
7:03:60 801 L                                                            
7:04:15 405 X                                                            
7:04:20 404 Y                                                            
7:05:01 405 X                                                            
7:06:03 409 X                                                            
run ;                                                                    
                                                                         
data want (drop = _:) ;                                                  
  do until (_break) ;                                                    
    merge have have (firstobs=2 keep=trans rename=(trans=_t)) end = lr ;
    if trans = "X" then _X = 1 ;                                         
    _break = (_t = "L" or lr) ;                                         
    if _break then cycle = put (ifC (_X, "X", "Y"), $1.) ;               
    output ;                                                             
  end ;                                                                  
run ;                                                                    

Kind regards

Paul D.

    

feliperczx
Calcite | Level 5
Perfect, dude! Thank you so much!
feliperczx
Calcite | Level 5
hashman, how would you count the time between Logins? to determine the duration of the cycle in minutes
hashman
Ammonite | Level 13

@feliperczx :

One way would be:

data have ;                                                                            
   input time: hhmmss. code: $3. trans: $1. ;                                          
   format Time time8. ;                                                                
cards ;                                                                                
7:00:30 801 L                                                                          
7:00:35 405 X                                                                          
7:00:40 404 Y                                                                          
7:02:30 801 L                                                                          
7:02:35 404 Y                                                                          
7:02:40 404 Y                                                                          
7:03:60 801 L                                                                          
7:04:15 405 X                                                                          
7:04:20 404 Y                                                                          
7:05:01 405 X                                                                          
7:06:03 409 X                                                                          
run ;                                                                                  
                                                                                       
data want (drop = _:) ;                                                                
  do _q = 1 by 1 until (_break) ;                                                      
    merge have have (firstobs=2 keep=time trans rename=(time=_tm trans=_t)) end = lr ; 
    if trans = "X" then _X = 1 ;                                                       
    if _q = 1 then _tm1 = time ;                                                       
    _break = (_t = "L" or lr) ;                                                        
    if _break then do ;                                                                
      cycle = put (ifC (_X, "X", "Y"), $1.) ;                                          
      if lr then cycle_min = divide (time - _tm1, 60) ;                                
      else       cycle_min = divide (_tm  - _tm1, 60) ;                                
    end ;                                                                              
    output ;                                                                           
  end ;                                                                                
run ;                                                                                  

It assumes that for the last record, you want the difference between the time on this record and the time on the preceding record where trans="L". However, truth be told, I'd rather set cycle_min for the last record to a special missing value (for example, .U) to indicate that since for the last record the beginning of the next cycle is unknown, the cycle time is also undermined. If so, just replace the IF LR condition with:

      if lr then cycle_min = .U ;

Kind regards

Paul D.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1054 views
  • 1 like
  • 5 in conversation