Hi guys!
So, basically what I want to do is to identify certain patterns (cycles actually) of users using a machine.
I need to use a IF-THEN statement between two specific transactions.
The code needs to run from one Login to the next one and use the code of transactions between to identify the cycles.
TIME CODE TRANSACTION
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
The output should be: 1 X Cycle and 1 Y Cycle.
Also, Transaction X is stronger than Transaction Y, then, if X occurs, the cycle is identified as a ''X Cycle''
If not, we have a ''Y Cycle''.
Any ideas? Thank you so much!
The peculiarity of your task basically lies in the fact that you need to flag each "end-of-group" record whose end-of-group marker (Login) is stored in the next record. It means that some look-ahead method has to be deployed, which is why the fine solutions by @seemiyah and @andreas_lds require extra steps and rather involved logic. Methinks it's simpler to address the look-ahead need by using the shifted (offset) merge tactic, for example:
data have ;
input time: hhmmss. code: $3. trans: $1. ;
format Time time8. ;
cards ;
7:00:30 801 L
7:00:35 405 X
7:00:40 404 Y
7:02:30 801 L
7:02:35 404 Y
7:02:40 404 Y
7:03:60 801 L
7:04:15 405 X
7:04:20 404 Y
7:05:01 405 X
7:06:03 409 X
run ;
data want (drop = _:) ;
do until (_break) ;
merge have have (firstobs=2 keep=trans rename=(trans=_t)) end = lr ;
if trans = "X" then _X = 1 ;
_break = (_t = "L" or lr) ;
if _break then cycle = put (ifC (_X, "X", "Y"), $1.) ;
output ;
end ;
run ;
Kind regards
Paul D.
Why not use login auditing tools to deterring the amount of time the user is logged in?
take the logged in time and create spans until the next login.
The amount of time the user is logged in it is not important here.
The LOGIN transaction is just a flag to let me know that another cycle began/ended
What I really need to do is to identify the transactions between those LOGIN transactions
not sure If i get your suggestion...could you give me an example of what you thought?
Does the following help? It assumes that 'Login' or end-of-file mark the end of a cycle.
data have ;
input Time : hhmmss. Code : $3. Transaction & $20. ;
format Time time8. ;
cards ;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
;
run ;
proc sort data=have;
by time ;
run ;
data want ;
set have end=eof;
retain _XFound
_YFound ;
if _n_>1 and (Transaction='Login' or eof) then do ;
if _XFound then XCycleCount+1 ;
else if _YFound then YCycleCount+1 ;
_XFound=0 ;
_YFound=0 ;
end ;
if Transaction='Transaction X' then _XFound=1 ;
else if Transaction='Transaction Y' then _YFound=1 ;
if eof then output ;
keep XCycleCount YCycleCount ;
run ;
Do you need the information in the log or in a dataset? If you need a dataset, please post some details on the expected structure.
Is the variable "code" numeric or alphanumeric? The value of "code" seems to be related to the value of "transaction", is this the case or just a coincidence?
Here is one possible solution:
data have;
attrib
time length=8 format=time8. informat=time.
code length=8
transaction length=$30
;
infile datalines dlmstr=' ';
input time code transaction;
datalines;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
;
run;
data _null_;
set have end=jobDone;
length x_cycles y_cycles current_cycle 8;
retain x_cycles y_cycles 0 current_cycle .;
array cycles[404:405] y_cycles x_cycles;
if code = 801 then do;
if not missing(current_cycle) then do;
cycles[current_cycle] = cycles[current_cycle] + 1;
current_cycle = .;
end;
end;
else do;
current_cycle = max(current_cycle, code);
end;
if jobDone then do;
cycles[current_cycle] = cycles[current_cycle] + 1;
put x_cycles "X cycle and " y_cycles "Y cycle";
end;
run;
I need the information on a dataset.
Would be great If I had a new column/variable (CYCLE) and at the end of each cycle, a classification for the prior one.
For example, expected structure:
TIME CODE TRANSACTION CYCLE
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y X Cycle
7:02:30 801 Login
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y Y Cycle
7:03:60 801 Login
7:04:15 405 Transaction X
7:04:20 404 Transaction Y
7:05:01 405 Transaction X
7:06:03 409 Transaction X X Cycle
The variable CODE is a character variable. The value of 'CODE' it is indeed related to the value of 'TRANSACTION', but I do that classification in a prior step, for example:
IF CODE = '801' THEN TRANSACTION = 'Login';
ELSE IF CODE IN ('405','409') THEN TRANSACTION = 'Transaction X';
ELSE TRANSACTION = 'Transaction Y';
If you want a dataset like the example you have provided it may require a two pass solution. First two identify the cycle boundaries and then apply the priority rules.
Here's a possible solution below. You could possibly eliminate the derivation of the Transaction variable and substitute this logic into the datasteps. eg if Code='801' then CycleCount+1.
data have ;
input Time : hhmmss. Code : $3. Transaction & $20. ;
format Time time8. ;
cards ;
7:00:30 801 Login
7:00:35 405 Transaction X
7:00:40 404 Transaction Y
7:02:30 801 Login
7:02:35 404 Transaction Y
7:02:40 404 Transaction Y
7:03:60 801 Login
7:04:15 405 Transaction X
7:04:20 404 Transaction Y
7:05:01 405 Transaction X
7:06:03 409 Transaction X
run ;
proc sort data=have ;
by Time ;
run ;
data have_cycle ;
set have ;
if Transaction='Login' then CycleCount+1 ;
run ;
data want ;
set have_cycle;
by CycleCount ;
retain _XFound _YFound ;
length Cycle $8 ;
if first.CycleCount then do ;
_XFound=0 ;
_YFound=0 ;
end ;
if Transaction='Transaction X' then _XFound=1 ;
else if Transaction='Transaction Y' then _YFound=1 ;
if last.CycleCount then do ;
if _XFound then Cycle='X Cycle' ;
else if _YFound then Cycle='Y Cycle' ;
end ;
drop _: CycleCount ;
run ;
The peculiarity of your task basically lies in the fact that you need to flag each "end-of-group" record whose end-of-group marker (Login) is stored in the next record. It means that some look-ahead method has to be deployed, which is why the fine solutions by @seemiyah and @andreas_lds require extra steps and rather involved logic. Methinks it's simpler to address the look-ahead need by using the shifted (offset) merge tactic, for example:
data have ;
input time: hhmmss. code: $3. trans: $1. ;
format Time time8. ;
cards ;
7:00:30 801 L
7:00:35 405 X
7:00:40 404 Y
7:02:30 801 L
7:02:35 404 Y
7:02:40 404 Y
7:03:60 801 L
7:04:15 405 X
7:04:20 404 Y
7:05:01 405 X
7:06:03 409 X
run ;
data want (drop = _:) ;
do until (_break) ;
merge have have (firstobs=2 keep=trans rename=(trans=_t)) end = lr ;
if trans = "X" then _X = 1 ;
_break = (_t = "L" or lr) ;
if _break then cycle = put (ifC (_X, "X", "Y"), $1.) ;
output ;
end ;
run ;
Kind regards
Paul D.
One way would be:
data have ;
input time: hhmmss. code: $3. trans: $1. ;
format Time time8. ;
cards ;
7:00:30 801 L
7:00:35 405 X
7:00:40 404 Y
7:02:30 801 L
7:02:35 404 Y
7:02:40 404 Y
7:03:60 801 L
7:04:15 405 X
7:04:20 404 Y
7:05:01 405 X
7:06:03 409 X
run ;
data want (drop = _:) ;
do _q = 1 by 1 until (_break) ;
merge have have (firstobs=2 keep=time trans rename=(time=_tm trans=_t)) end = lr ;
if trans = "X" then _X = 1 ;
if _q = 1 then _tm1 = time ;
_break = (_t = "L" or lr) ;
if _break then do ;
cycle = put (ifC (_X, "X", "Y"), $1.) ;
if lr then cycle_min = divide (time - _tm1, 60) ;
else cycle_min = divide (_tm - _tm1, 60) ;
end ;
output ;
end ;
run ;
It assumes that for the last record, you want the difference between the time on this record and the time on the preceding record where trans="L". However, truth be told, I'd rather set cycle_min for the last record to a special missing value (for example, .U) to indicate that since for the last record the beginning of the next cycle is unknown, the cycle time is also undermined. If so, just replace the IF LR condition with:
if lr then cycle_min = .U ;
Kind regards
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.