BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jerry898969
Pyrite | Level 9

I have a log file that i need to calculate the time span across rows.

I first have to find a row that finds the In and the out and take those two timestamps and minus them.

The data is all in one character variable at the moment in my sas table.

log_row

"01/01/2010 7:42:25:443 PM","In field:Sec1.tab1.col1"

.

.

.

"01/01/2010 7:42:36:129 PM","Out field:Sec1.tab1.col1"

I need my end result to be

Field                 Duration

Sec1.tab1.col1  0:00:10:686

I'm not sure where to start with the datetime since it is a character string. Any help would be appreciated.

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Both datetime and time values are stored as the number of seconds. So the trick of adding '12:00't to add 12 hours will work the same for a time variable as it does for a datetime variable. If time='12:00't and ampm='PM' then do NOT add the 12 hours.

if time < '12:00't and ampm='PM' then time+'12:00't ;

It is easier to calculate differences that might cross midnight if you convert your date and time variables into a datetime value.  To calculate the difference from SDATE/STIME to EDATE/ETIME you could use this formula.

diff = dhms(edate,0,0,etime) - dhms(sdate,0,0,stime) ;

View solution in original post

18 REPLIES 18
Astounding
PROC Star

jerry,

There will still be many details to take care of ... assigning intelligent lengths to variables, converting character strings to numerics, and actually making the calculations.  But here is how you can retrieve the pieces from LOG_ROW.

field = scan(log_row, ':"', -1);

sasdate_char = scan(log_row, 1, ' "');

hours_char = scan(log_row, 2, ' " :');

mins_char = scan(log_row, 3, ' ":');

secs_char = scan(log_row, 4, ' ":');

thsnds_char = scan(log_row, 5, ' ":');

am_pm = scan(log_row, 6, ' ":');

Depending on how you end up making the calculations, it may be easier to read the time into fewer then 4 variables.  Or perhaps to read in the entire date/time into a single variable:

datetime_char = scan(log_row, 1, '"');

Good luck.

jerry898969
Pyrite | Level 9

Astounding,

Thank you so much for your reply. 

Can I bring in the data with an infile statement without knowing how many variables there will be?  Some rows have 2-6 comma's.

The reason I ask is wouldn't it be better if I bring the data in with dlm=',' and that way I have the date time in it's own variable?

And then use your code to parse out the datetime.  Sorry if i'm introducing another question. 

thank you again for your help

Astounding
PROC Star

jerry,

Without question, you can use an INFILE statement.  Parsing the line is the harder part.

If you have additional commas, what is in the rest of the line?  Do you ignore it, or is it additional instances of Date + Section that you need to read in?  If you have two instances of Date + Section on the same line, how many commas are there (2 or 3)?

jerry898969
Pyrite | Level 9

Astounding,

This file seems to very per row.  Other than the datetime variable I'm not sure what each of the other fields stands for. 

I was given the file and told to find certain strings and then take those rows datetime and figure out the duration a user was in that field.

I'm not sure what types each variable would be either but the file has double quotes and then a comma around each field.  Some rows have 2-6 fields.  I'm not sure how to set the input statement up for that. 

Thanks again for your help and any other ideas would be apperciated.

Astounding
PROC Star

jerry,

I just need to make sure I understand the conditions, so feel free to correct or comment about these assumptions.

You could have up to 6 quoted strings per line.

Any one of the 6 could contain the date and time.

Any one of the 6 could contain the Field.

If a quoted string contains the date and time, no other quoted string contains a date and time.

If a quoted string contains "In field:" or "Out field:", no other quoted string contains that phrase.

These assumptions may not be correct, but you can see where I'm heading.  There has to be some feature of a quoted string that tells you that it contains part of the solution.

jerry898969
Pyrite | Level 9

Astounding,

This is correct you could have up to 6 quoted strings per line.

The date and time is always the first field. At least in the files i've looked at so far

I'm not sure what you mean "Any one of the 6 could contain the Field."

There should be no other date and time.

This should be correct as well.  If a quoted string contains "In field:" or "Out field:", no other quoted string contains that phrase.

Another quick question is there a way to convert the entire date time to be one numeric field?

ex.  1/1/2010 5:12:26:106 PM

Thank you so much

Astounding
PROC Star

OK, here's a framework with some pseudo-code:

data IN (keep= a list of appropriate variables) OUT (keep=a list of appropriate variables);

infile somedata;

input;

date_plus_time = scan(_infile_, 1, '"');

do i=2 to 11;

   quoted_string = scan(_infile_, i, '"');

   if index(quoted_string, 'In field:') then do;

      field = scan(quoted_string, ':', -1);

      * break down date and time into appropriate date_time_IN variables;

      output IN;

   end;

   else if index(quoted_string, 'Out field:') then do;

      field = scan(quoted_string, ':', -1);

      output OUT;

      * break down date and time into appropriate date_time_OUT variables;

   end;

end;

This should give you data sets that you can sort and merge by FIELD, and perform calculations.  It does rely on the date+time being in the first quoted string on each line.  If you don't feel certain that there will be a maximum of 6 quoted strings per line, increase the upper limit of the DO loop.  The current limit of 11 allows for 6 quoted strings, separated by 5 commas.

Good luck.

Haikuo
Onyx | Level 15

I know someone will come up better ideas, since I could not figure out how to let SAS take the last 3 digits in whole of your datetime strings. For me, SAS only takes in hh:mm:ss or hh:mm:ss.ss, but not hh:mm:ss.sss.

Maybe I should have broke them up.

Anyway, here is what can come up with, sorry I can't get you the precision you want:

data have;

input logtime $60.;

cards;

"01/01/2010 7:42:25:222 PM","In field:Sec1.tab1.col1"

"01/01/2010 7:42:36:222 PM","Out field:Sec1.tab1.col1"

;

data have1;

set have;

dt1=scan(compress(logtime,'"'),1,',');

substr(dt1,19,4)='';

dt2=input(dt1,mdyampm25.2);

cat1=scan(compress(logtime,'"'),2,',');

cat2=substr(cat1,find(cat1,'Sec'));

put cat2;

run;

proc sql;

create table want as

select cat2 as field , max(dt2)-min(dt2) as duration format=time8.

from have1

group by field;

quit;

proc print;run;

Haikuo

ballardw
Super User

I haven't time to write the whole code but one key to this sort of problem is the automatic variable _INFILE_ to read the input.

Data want;

     infile <your input file ref> lrecl=<large enough number> pad;

     if scan(_infile_,2,',') = "In Field" then do;

          <parse _infile_ for the parts you want>

          output;

     end;

     if scan(_infile_,2,',') = "Out Field" then do;

          <parse _infile_ for the parts you want>

          output;

     end;

run;

Tom
Super User Tom
Super User

You have mulitple issues to work through here.  First you have to parse the string.  Then you have to interpret the values in the string. And finally you need to figure out how to match the IN/OUT timestamps records so that you can calculate the duration.

You can use SCAN() function to do a lot of the parsing.

dtchar=scan(log_row,1,',','QR');

field=scan(log_row,2,',','QR');

inout=scan(field,1,' ');

field=scan(field,-1,':');

To convert the date time stamp to an actual date time you will need to define what that format means. First does the date part have the month or the day first?  Second what does that three digit number appended to the second field of the time stamp mean?  Is that milliseconds?  Because of this you will probably need to parse the time stamp into pieces to be able to convert it.

Once you have converted the time stamps into actual SAS datetime variables then you probably want to sort the data by FIELD,DT,INOUT and then you can process it to get the duration.  Are you likely to have multiple In/Out pairs for the same value of FIELD?  If so then your logic will need to keep track of this.

Ksharp
Super User

There is a problem.

If there are several IN or OUT for a same FIELD, what you are going to do?

data x;
input x & $200. ;
datalines;
"01/01/2010 7:42:25:443 PM","In field:Sec1.tab1.col1"
.
.
.
"01/01/2010 7:42:36:129 PM","Out field:Sec1.tab1.col1"
"01/01/2010 7:42:25:443 PM","In field:Sec1.tab1.col2"
.
.
.
"01/01/2010 7:42:36:129 PM","Out field:Sec1.tab1.col2"
;
run;
option datestyle=mdy;

data x;
 set x(where=(x is not missing));
 field=scan(x,-2,':"');
 index=ifn(index(x,'In field'),1,2);
 datetime=input(catx(':',scan(x,1,'" :'),scan(x,2,'" :'),scan(x,3,'" :'),scan(x,4,'" :')),anydtdtm20.);
 format datetime datetime.;
run; 
proc sort data=x nodupkey;by field index;run;
data temp;
 set x;
 by field;
 if not (first.field and last.field);
run;
data want(keep=field duration);
 set temp;
 duration=dif(datetime);
 if mod(_n_,2)=0 ;
 format duration time.;
run;



Ksharp

jerry898969
Pyrite | Level 9

Ksharp,


Thank you for your reply.  I made a mistake by putting the "." within my test data.  Those were suppoes to represent rows that had other data but not having "In" or "Out".  I apologize for making that assumption.

jerry898969
Pyrite | Level 9

Thank you everyone for your help. 

I thought I had it but ran into an issue with duration portion. 

Say I have

"01/01/2010 7:42:25:443 PM","In field:Sec1.tab1.col1"

"01/02/2010 7:42:36:129 AM","Out field:Sec1.tab1.col1"

This should say 12:00:11 but it currently says 24:0011

What is the best way to deal with AM and PM?

Thank you

Tom
Super User Tom
Super User

Here is an example.  You will probably need to adjust the PM rule to properly handle midnight and noon. 12:00 AM should be 00:00 and 12:00PM should be 12:00 (not 24:00).

Also are these dates from January 1st and 2nd or January 1st and February 1st?

891  data zz;

892    input dt anydtdtm18. +1 ms ampm $;

893    dt+ms/1000;

894    if ampm='PM' then dt+'12:00't ;

895    date=datepart(dt);

896    time=timepart(dt);

897    diff=dif(dt);

898    format dt datetime24.3 date date9. time diff time12.3;

899    put _infile_ / (_all_) (=) /;

900  *---+----10---+----20---+----30---+;

901  cards;

01/01/2010 7:42:25:443 PM

dt=01JAN2010:19:42:25.443 ms=443 ampm=PM date=01JAN2010 time=19:42:25.443 diff=.

01/02/2010 7:42:36:129 AM

dt=02JAN2010:07:42:36.129 ms=129 ampm=AM date=02JAN2010 time=7:42:36.129 diff=12:00:10.686

Updated example to include difference calculation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 2578 views
  • 6 likes
  • 6 in conversation