ajs_rdg
Fluorite | Level 6

Here's a simple program that uses pattern matching to extract fields from lines in a log file.

It gets the date and time fields OK.

Why doesn't it get the number of seconds from "startup took 204 seconds"? What am I missing?

data;
  if (_n_=1) then rxid1 = prxparse('/^INFO   \| jvm 1    \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
  input;
  if (prxmatch(rxid1, _infile_)) then do;
    date        = input(prxposn(rxid1, 1, _infile_), yymmdd10.);
    time        = input(prxposn(rxid1, 2, _infile_), time.);
    string      = prxposn(rxid1, 3, _infile_);
    seconds     = input(prxposn(rxid1, 3, _infile_), best.);
  end;
  format date yymmdd10. time time.;
  retain rxid1;
  drop   rxid1;
  datalines;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;

Output is:

Obs          date      time      string    seconds
1     2014-05-23     5:19:43                 .
2     2014-05-27    18:14:20                 .

Pattern Matching Using Perl Regular Expressions (PRX)

http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#n13as9vjfj7...

data_null__
Jade | Level 19

With CARDS input, _INFILE_ is sometimes truncated. Use PARMCARDS so that the data goes to a real file that you can read with INFILE.

filename FT15F001 temp;
data rx;
  infile FT15F001;
  if (_n_=1) then rxid1 = prxparse('/^INFO   \| jvm 1    \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
  input;
 
  if (prxmatch(rxid1, _infile_)) then do;
    date        = input(prxposn(rxid1, 1, _infile_), yymmdd10.);
    time        = input(prxposn(rxid1, 2, _infile_), time.);
    string      = prxposn(rxid1, 3, _infile_);
    seconds     = input(prxposn(rxid1, 3, _infile_), best.);
  end;
 
  format date yymmdd10. time time.;

  retain rxid1;
  drop   rxid1;
  parmcards;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;

Obs          date      time      string    seconds

1     2014-05-23     5:19:43     204        204 
2     2014-05-27    18:14:20     637        637 
ajs_rdg
Fluorite | Level 6

Thank you, data_null_; !

But hey -- NO FAIR, SAS! That is really insidious, and not even a warning in the SAS log about input being truncated.

And if the input line is truncated, I don't understand how it matched the pattern anyway. Bizarre, no?

Tried documented option NOCARDIMAGE, and it made no difference at all. Does not do what it says on the tin.

DATALINES Statement

Reading Long Data Lines

If you use NOCARDIMAGE, SAS processes data lines longer than 80 columns in their entirety.

http://support.sas.com/documentation/cdl/en/lestmtsref/67175/HTML/default/viewer.htm#p0114gachtut3nn...

Using PARMCARDS as you showed worked for me. Or simply reading the data from an external file.
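
For the external-file route, something like the sketch below should do. The fileref name applog and the path are placeholders, and the pattern is shortened to pick out only the seconds field, so treat it as an illustration rather than the exact program I ran.

```sas
/* Hypothetical path: point this at the real log file. */
filename applog '/path/to/wrapper.log';

data rx;
  infile applog lrecl=32767;   /* generous record length so long lines are not cut */
  retain rxid1;
  drop   rxid1;
  /* Shortened pattern (an assumption): capture only the startup time. */
  if _n_ = 1 then rxid1 = prxparse('/\(startup took (\d+) seconds\)\./');
  input;                       /* load the record into _infile_ */
  if prxmatch(rxid1, _infile_) then
    seconds = input(prxposn(rxid1, 1, _infile_), best12.);
run;
```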

Good thing you _showed_ how to use the PARMCARDS statement, as I can't even find any documentation on it.

The documentation on the PARMCARDS= System Option alludes to a PARMCARDS statement ("in a procedure"), and there the trail goes cold.

http://support.sas.com/documentation/cdl/en/lesysoptsref/64892/HTML/default/viewer.htm#p0bwycx4em8cr...

data_null__
Jade | Level 19

The truncation depends on what you do with _INFILE_. The following works with CARDS by assigning _INFILE_ to a new variable. The advantage of PARMCARDS is that SAS writes the data to a file, so when you INFILE it, it behaves just like any file you might have created. CARDS is unbuffered, and you have to use different options to deal with end-of-file detection. That's one difference I can think of; I'm sure there are others.


data rx;
  infile cards;
 
  if (_n_=1) then rxid1 = prxparse('/^INFO   \| jvm 1    \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
  input;

  infile = _infile_;
  if (prxmatch(rxid1, infile)) then do;
    date        = input(prxposn(rxid1, 1, infile), yymmdd10.);
    time        = input(prxposn(rxid1, 2, infile), time.);
    string      = prxposn(rxid1, 3, infile);
    seconds     = input(prxposn(rxid1, 3, infile), best.);
  end;
 
  format date yymmdd10. time time.;

  retain rxid1;
  drop   rxid1;
  cards;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;
ajs_rdg
Fluorite | Level 6

Golly. That is weird.

I think a lot of fair-minded people would call this a bug.

PGStats
Opal | Level 21

No, no, no! That's an undocumented feature! :)

One thing to note, however: a datalines section should be terminated with a semicolon in the first column, not with a RUN statement.
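
A minimal sketch of that convention; the data set and variable names are invented for illustration:

```sas
data demo;
  input id value;
  datalines;
1 10
2 20
;
run;
```

The lone semicolon in column one ends the data, and the RUN statement then ends the step as usual.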

- PG

Haikuo
Onyx | Level 15

A bit OT, and I don't have the details of your business requirements, but if your purpose is to capture those 3 buffers, then it seems to me that a generic approach could be more robust:

if (_n_=1) then rxid1 = prxparse('/\D+ (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \D+(\d*)/');
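
In case it helps, here is a sketch of that pattern in a complete step. It borrows the trick of copying _INFILE_ to a variable from earlier in the thread; the data set name rx2, the variable line, and its $200 length are my assumptions.

```sas
data rx2;
  length line $200;            /* assumed wide enough for these log lines */
  retain rxid1;
  drop   rxid1 line;
  if (_n_=1) then rxid1 = prxparse('/\D+ (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \D+(\d*)/');
  input;
  line = _infile_;             /* work on a copy rather than on _infile_ itself */
  if (prxmatch(rxid1, line)) then do;
    date    = input(prxposn(rxid1, 1, line), yymmdd10.);
    time    = input(prxposn(rxid1, 2, line), time.);
    seconds = input(prxposn(rxid1, 3, line), best.);
  end;
  format date yymmdd10. time time.;
  datalines;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
;
run;
```

Because the pattern is not anchored to the literal message text, it would also pick up other lines that have a date, a time, and a later number, which may or may not be what you want.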

Regards,

Haikuo

Ksharp
Super User

Why are you using Perl regular expressions this way? They are overkill for your situation.

data x;
  input;
    date        = input(scan(scan( _infile_,3,'|'),1,' ') , yymmdd10.);
    time        = input(scan(scan( _infile_,3,'|'),-1,' '), time.);
    seconds     = input(scan(_infile_,-2,' '), best.);
  format date yymmdd10. time time.;
  datalines;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;

Xia Keshan
