acfarrer
Quartz | Level 8

This could be a common scenario for Unix admins. For each user process, we have separate daily log files identified by date and PID. As on many busy systems, a PID is reused every 1-2 months. I am trying to create a unique session ID shared by the files a user process creates, plus a sequence number for each file within that session.

 

For future-proofing, I want to avoid proc sql monotonic() and data step lag(). I think retain will still be supported in Viya.

 

datepid (=(startdt * 1e5) + pid) is unique and initflag, endflag are usually reliable. We are trying to assign sessid and fileseq.
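For anyone reproducing this, a minimal sketch of how datepid can be derived. It assumes startdt is stored as a numeric yyyymmdd value and pid is numeric with at most 5 digits; the output dataset name logfiles_keyed is just a placeholder.

```sas
data logfiles_keyed ;
  set logfiles ;
  /* Assumption: startdt is a plain number such as 20201020, not a SAS date. */
  /* If startdt were a SAS date, convert it first, e.g.                      */
  /*   input(put(startdt, yymmddn8.), 8.)                                    */
  datepid = startdt * 1e5 + pid ;
  /* 13 digits still fit exactly in a SAS numeric */
  format datepid 13. ;
run ;
```

With startdt = 20201020 and pid = 58123 this gives 2020102058123, matching the first row of the sample.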

 

Below is what we want for pid = 58123. This pattern could occur every 1-2 months for any PID.
                     
datepid        startdt  pid    initflag endflag sessid           logfileseq

2020102058123  20201020 58123  Y                2020102158123    1
2020102158123  20201021 58123                   2020102158123    2
2020102258123  20201022 58123           Y       2020102158123    3
2020110558123  20201105 58123  Y        Y       2020110558123    1
2021010258123  20210102 58123  Y                2021010258123    1
2021010858123  20210108 58123           Y       2021010258123    2

If PID were unique, this would work:

 

proc sort data = logfiles ;
by pid ;
run ;
 
data logfileseq ;
set logfiles ;
by pid ;
retain sessid ;
/* first record for a pid starts a new session */
if first.pid then do ;
  logfileseq = 1 ;
  sessid = datepid ;
end ;
else logfileseq + 1 ;
run ;
 
In case InitFlag and EndFlag are missing, I would like to add a fallback rule: records for the same pid where max(startdt) - min(startdt) < 25 usually belong to the same sessid.
 
Ideally, we would use ANSI SQL but logfileseq is a challenge.
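One possible SQL-only approach is a pair of correlated subqueries, sketched below. It assumes InitFlag is reliable, that datepid sorts in date order within a pid (true for the startdt * 1e5 + pid construction), and that (pid, startdt) is unique. The table names sess and want are placeholders. Rows that precede any InitFlag = 'Y' for their pid would get a missing sessid.

```sas
proc sql ;
  /* Step 1: sessid = datepid of the most recent InitFlag='Y' row */
  /* for the same pid, at or before the current row               */
  create table sess as
  select a.*,
         (select max(b.datepid)
            from logfiles as b
           where b.pid = a.pid
             and b.initflag = 'Y'
             and b.datepid <= a.datepid) as sessid
    from logfiles as a ;

  /* Step 2: logfileseq = number of rows in the same session */
  /* up to and including the current row                     */
  create table want as
  select a.*,
         (select count(*)
            from sess as b
           where b.sessid = a.sessid
             and b.datepid <= a.datepid) as logfileseq
    from sess as a
   order by pid, startdt ;
quit ;
```

This avoids monotonic(), lag() and retain, but the correlated subqueries can be slow on large tables, so the data step is probably still the practical choice for big logs.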
 
 
 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
acfarrer
Quartz | Level 8

All my testing used first.pid which I was reluctant to abandon. After some further data validation, InitFlag looks reliable and the logic is simpler:

 

proc sort data = logfiles ;
by pid startdt ;
run ;
 
data logfileseq ;
set logfiles ;
retain sessid ;
/* InitFlag marks the first log file of a new session */
if InitFlag = 'Y' then do ;
  logfileseq = 1 ;
  sessid = datepid ;
end ;
else logfileseq + 1 ;
run ;
 
For further validation, I will create a table of datediff = max(startdt) - min(startdt) by sessid, datepid and pid.
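A rough sketch of that validation table, assuming startdt is a SAS date so that subtraction gives days (a numeric yyyymmdd value would need converting first). It groups by sessid and pid only, since datepid is unique per row and grouping by it would always give a datediff of 0. The table name sesscheck and the columns first_dt, last_dt and nfiles are made up for illustration.

```sas
proc sql ;
  create table sesscheck as
  select sessid,
         pid,
         min(startdt) as first_dt format=yymmddn8.,
         max(startdt) as last_dt  format=yymmddn8.,
         count(*)     as nfiles,
         max(startdt) - min(startdt) as datediff
    from logfileseq
   group by sessid, pid
  /* keep only sessions that span 25 days or more, i.e. suspects */
  having max(startdt) - min(startdt) >= 25
   order by pid, sessid ;
quit ;
```

An empty sesscheck would suggest the flags and the 25-day rule agree for every session.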
 
This can be marked as solved for now.


4 REPLIES
ballardw
Super User

I am not sure I see a clear rule for when the incrementing is supposed to stop/restart.

Is it supposed to be "if Endflag='Y' then reset counter"?

 

It might help to provide examples where the different cases you are concerned with occur and demonstrate the result.

 

Here is an example data step that you can add values to so we have something we can test code with.

data have;
   input datepid :$14. startdt :yymmdd8. pid :$6. initflag :$1. endflag :$1. ;
   format startdt yymmddn8.;
datalines; 
2020102058123  20201020 58123  Y  .        
2020102158123  20201021 58123  .  .         
2020102258123  20201022 58123  .  Y       
2020110558123  20201105 58123  Y  Y       
2021010258123  20210102 58123  Y  .        
2021010858123  20210108 58123  .  Y       
;

Your 25-day rule would likely involve, at the first record of each group, storing the date value in a temporary retained variable, which then gets tested along with your other end rule using the INTCK function and the current date value.
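As a rough illustration of that idea (a sketch, not tested against real data), the step below uses the have dataset from above. The names have_sorted, want, sess_start, prev_end and new_sess are invented for the example. It assumes a new session starts on a new pid, on InitFlag = 'Y', after a record that carried EndFlag = 'Y', or when the current date is 25 or more days past the session's first date.

```sas
proc sort data=have out=have_sorted ;
  by pid startdt ;
run ;

data want ;
  set have_sorted ;
  by pid ;
  length sessid $14 prev_end $1 ;
  retain sessid sess_start prev_end ;

  /* Decide whether this record opens a new session */
  if first.pid then new_sess = 1 ;
  else new_sess = ( initflag = 'Y'
                 or prev_end = 'Y'
                 or intck('day', sess_start, startdt) >= 25 ) ;

  if new_sess then do ;
    logfileseq = 1 ;
    sessid     = datepid ;
    sess_start = startdt ;   /* remember the first date of the session */
  end ;
  else logfileseq + 1 ;      /* sum statement: logfileseq is retained */

  prev_end = endflag ;       /* carry EndFlag forward to the next record */
  drop new_sess sess_start prev_end ;
run ;
```

Testing against sess_start rather than the previous record's date mirrors the max(startdt) - min(startdt) < 25 wording; comparing against the previous record instead would treat the rule as a gap between consecutive files.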

acfarrer
Quartz | Level 8
Yes, reset counter when EndFlag = 'Y'. Sorry for not specifying.
'group by pid having max(startdt) - min(startdt) < 25' works in SQL but I am not sure of the equivalent data step logic. I am trying to avoid retained values if possible.
acfarrer
Quartz | Level 8

Without supplying the real data, I am not sure how much is needed but I have added PIDs 36494 and 63423 to this sample. The natural order is usually by datepid:

datepid        startdt  pid    initflag endflag sessid           logfileseq

2020102036494  20201020 36494  Y        Y       2020102036494    1
2020102058123  20201020 58123  Y                2020102158123    1
2020102063423  20201020 63423  Y                2020102063423    1
2020102158123  20201021 58123                   2020102158123    2
2020102163423  20201021 63423           Y       2020102063423    2
2020102258123  20201022 58123           Y       2020102158123    3
2020110558123  20201105 58123  Y        Y       2020110558123    1
2021010258123  20210102 58123  Y                2021010258123    1
2021010858123  20210108 58123           Y       2021010258123    2
2021010863423  20210108 63423  Y        Y       2021010863423    1

My latest version has proc sort ; by pid startdt ; to assign logfileseq correctly.
