Solved: Re: Data Set / IF-THEN statement between specific lines

feliperczx · Posted 06-24-2019 10:58 PM

Hi guys!

So, basically what I want to do is to identify certain patterns (cycles actually) of users using a machine.

I need to use a IF-THEN statement between two specific transactions.

The code needs to run from one Login to the next one and use the code of transactions between to identify the cycles.

TIME CODE TRANSACTION

7:00:30 801 Login

7:00:35 405 Transaction X

7:00:40 404 Transaction Y

7:02:30 801 Login

7:02:35 404 Transaction Y

7:02:40 404 Transaction Y

The output should be: 1 X Cycle and 1 Y Cycle.

Also, Transaction X is stronger than Transaction Y, then, if X occurs, the cycle is identified as a ''X Cycle''

If not, we have a ''Y Cycle''.

Any ideas? Thank you so much!

hashman · Posted 06-26-2019 12:40 AM

@feliperczx :

The peculiarity of your task basically lies in the fact that you need to flag each "end-of-group" record whose end-of-group marker (Login) is stored in the next record. It means that some look-ahead method has to be deployed, which is why the fine solutions by @seemiyah and @andreas_lds require extra steps and rather involved logic. Methinks it's simpler to address the look-ahead need by using the shifted (offset) merge tactic, for example:

data have ;                                                              
   input time: hhmmss. code: $3. trans: $1. ;                            
   format Time time8. ;                                                  
cards ;                                                                  
7:00:30 801 L                                                            
7:00:35 405 X                                                            
7:00:40 404 Y                                                            
7:02:30 801 L                                                            
7:02:35 404 Y                                                            
7:02:40 404 Y                                                            
7:03:60 801 L                                                            
7:04:15 405 X                                                            
7:04:20 404 Y                                                            
7:05:01 405 X                                                            
7:06:03 409 X                                                            
run ;                                                                    
                                                                         
data want (drop = _:) ;                                                  
  do until (_break) ;                                                    
    merge have have (firstobs=2 keep=trans rename=(trans=_t)) end = lr ;
    if trans = "X" then _X = 1 ;                                         
    _break = (_t = "L" or lr) ;                                         
    if _break then cycle = put (ifC (_X, "X", "Y"), $1.) ;               
    output ;                                                             
  end ;                                                                  
run ;

Kind regards

Paul D.

View solution in original post

VDD · Posted 06-24-2019 11:13 PM

Why not use login auditing tools to deterring the amount of time the user is logged in?

take the logged in time and create spans until the next login.

feliperczx · Posted 06-24-2019 11:23 PM

The amount of time the user is logged in it is not important here.

The LOGIN transaction is just a flag to let me know that another cycle began/ended

What I really need to do is to identify the transactions between those LOGIN transactions

not sure If i get your suggestion...could you give me an example of what you thought?

seemiyah · Posted 06-25-2019 01:23 AM

Does the following help? It assumes that 'Login' or end-of-file mark the end of a cycle.

data have ;
   input Time : hhmmss. Code : $3. Transaction & $20. ;
   format Time time8. ;
cards ;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login  
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
;
run ;

proc sort data=have;
   by time ;
run ;

data want ;
   set have end=eof;
   retain _XFound 
             _YFound ;

   if _n_>1 and (Transaction='Login' or eof) then do ;  
       if        _XFound then XCycleCount+1 ;   
       else if _YFound then YCycleCount+1 ;

       _XFound=0 ;
       _YFound=0 ;
   end ;
   
   if        Transaction='Transaction X' then _XFound=1 ;      
   else if Transaction='Transaction Y' then _YFound=1 ;      

   if eof then output ;

   keep XCycleCount YCycleCount ;
run ;

andreas_lds · Posted 06-25-2019 01:25 AM

Do you need the information in the log or in a dataset? If you need a dataset, please post some details on the expected structure.

Is the variable "code" numeric or alphanumeric? The value of "code" seems to be related to the value of "transaction", is this the case or just a coincidence?

Here is one possible solution:

data have;
   attrib 
      time length=8 format=time8. informat=time.
      code length=8
      transaction length=$30
   ;
   infile datalines dlmstr='  ';
   input time code transaction;

datalines;
7:00:30  801  Login
7:00:35  405  Transaction X
7:00:40  404  Transaction Y
7:02:30  801  Login 
7:02:35  404  Transaction Y
7:02:40  404  Transaction Y
;
run;


data _null_;
   set have end=jobDone;

   length x_cycles y_cycles current_cycle 8;
   retain x_cycles y_cycles 0 current_cycle .;

   array cycles[404:405] y_cycles x_cycles;

   if code = 801 then do;
      if not missing(current_cycle) then do;
         cycles[current_cycle] = cycles[current_cycle] + 1;
         current_cycle = .;
      end;
   end;
   else do;
      current_cycle = max(current_cycle, code);
   end;

   
   if jobDone then do;
      cycles[current_cycle] = cycles[current_cycle] + 1;
      put x_cycles "X cycle and " y_cycles "Y cycle";
   end;
run;

feliperczx · Posted 06-25-2019 08:28 AM

I need the information on a dataset.

Would be great If I had a new column/variable (CYCLE) and at the end of each cycle, a classification for the prior one.
For example, expected structure:

TIME     CODE TRANSACTION       CYCLE
7:00:30  801  Login
7:00:35  405  Transaction X
7:00:40  404  Transaction Y     X Cycle
7:02:30  801  Login 
7:02:35  404  Transaction Y
7:02:40  404  Transaction Y     Y Cycle
7:03:60  801  Login
7:04:15  405  Transaction X
7:04:20  404  Transaction Y
7:05:01  405  Transaction X
7:06:03  409  Transaction X     X Cycle

The variable CODE is a character variable. The value of 'CODE' it is indeed related to the value of 'TRANSACTION', but I do that classification in a prior step, for example:

IF CODE = '801' THEN TRANSACTION = 'Login'; ELSE IF CODE IN ('405','409') THEN TRANSACTION = 'Transaction X';

ELSE TRANSACTION = 'Transaction Y';

seemiyah · Posted 06-25-2019 06:40 PM

If you want a dataset like the example you have provided it may require a two pass solution. First two identify the cycle boundaries and then apply the priority rules.

Here's a possible solution below. You could possibly eliminate the derivation of the Transaction variable and substitute this logic into the datasteps. eg if Code='801' then CycleCount+1.

data have ;
   input Time : hhmmss. Code : $3. Transaction & $20. ;
   format Time time8. ;
cards ;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login 
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
7:03:60 801 Login
7:04:15 405 Transaction X
7:04:20 404 Transaction Y
7:05:01 405 Transaction X
7:06:03 409 Transaction X
run ;

proc sort data=have ;
   by Time ;
run ;

data have_cycle ;
   set have ;
   if Transaction='Login' then CycleCount+1 ;
run ;

data want ;
   set have_cycle;
   by CycleCount ;

   retain _XFound _YFound ;
   length Cycle $8 ;

   if first.CycleCount then do ;
       _XFound=0 ;
       _YFound=0 ;
   end ;

   if       Transaction='Transaction X' then _XFound=1 ;      
   else if  Transaction='Transaction Y' then _YFound=1 ;

  if last.CycleCount then do ;
       if      _XFound then Cycle='X Cycle' ;   
       else if _YFound then Cycle='Y Cycle' ;
   end ;

   drop _: CycleCount ;
run ;

hashman · Posted 06-26-2019 12:40 AM

@feliperczx :

The peculiarity of your task basically lies in the fact that you need to flag each "end-of-group" record whose end-of-group marker (Login) is stored in the next record. It means that some look-ahead method has to be deployed, which is why the fine solutions by @seemiyah and @andreas_lds require extra steps and rather involved logic. Methinks it's simpler to address the look-ahead need by using the shifted (offset) merge tactic, for example:

data have ;                                                              
   input time: hhmmss. code: $3. trans: $1. ;                            
   format Time time8. ;                                                  
cards ;                                                                  
7:00:30 801 L                                                            
7:00:35 405 X                                                            
7:00:40 404 Y                                                            
7:02:30 801 L                                                            
7:02:35 404 Y                                                            
7:02:40 404 Y                                                            
7:03:60 801 L                                                            
7:04:15 405 X                                                            
7:04:20 404 Y                                                            
7:05:01 405 X                                                            
7:06:03 409 X                                                            
run ;                                                                    
                                                                         
data want (drop = _:) ;                                                  
  do until (_break) ;                                                    
    merge have have (firstobs=2 keep=trans rename=(trans=_t)) end = lr ;
    if trans = "X" then _X = 1 ;                                         
    _break = (_t = "L" or lr) ;                                         
    if _break then cycle = put (ifC (_X, "X", "Y"), $1.) ;               
    output ;                                                             
  end ;                                                                  
run ;

Kind regards

Paul D.

feliperczx · Posted 06-26-2019 08:38 AM

Perfect, dude! Thank you so much!

feliperczx · Posted 06-28-2019 02:36 PM

hashman, how would you count the time between Logins? to determine the duration of the cycle in minutes

hashman · Posted 06-28-2019 05:10 PM

@feliperczx :

One way would be:

data have ;                                                                            
   input time: hhmmss. code: $3. trans: $1. ;                                          
   format Time time8. ;                                                                
cards ;                                                                                
7:00:30 801 L                                                                          
7:00:35 405 X                                                                          
7:00:40 404 Y                                                                          
7:02:30 801 L                                                                          
7:02:35 404 Y                                                                          
7:02:40 404 Y                                                                          
7:03:60 801 L                                                                          
7:04:15 405 X                                                                          
7:04:20 404 Y                                                                          
7:05:01 405 X                                                                          
7:06:03 409 X                                                                          
run ;                                                                                  
                                                                                       
data want (drop = _:) ;                                                                
  do _q = 1 by 1 until (_break) ;                                                      
    merge have have (firstobs=2 keep=time trans rename=(time=_tm trans=_t)) end = lr ; 
    if trans = "X" then _X = 1 ;                                                       
    if _q = 1 then _tm1 = time ;                                                       
    _break = (_t = "L" or lr) ;                                                        
    if _break then do ;                                                                
      cycle = put (ifC (_X, "X", "Y"), $1.) ;                                          
      if lr then cycle_min = divide (time - _tm1, 60) ;                                
      else       cycle_min = divide (_tm  - _tm1, 60) ;                                
    end ;                                                                              
    output ;                                                                           
  end ;                                                                                
run ;

It assumes that for the last record, you want the difference between the time on this record and the time on the preceding record where trans="L". However, truth be told, I'd rather set cycle_min for the last record to a special missing value (for example, .U) to indicate that since for the last record the beginning of the next cycle is unknown, the cycle time is also undermined. If so, just replace the IF LR condition with:

      if lr then cycle_min = .U ;

Kind regards

Paul D.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away