Pattern Matching Using Perl Regular Expressions (PRX)

Occasional Contributor
Posts: 10

Here's a simple program that uses pattern matching to extract fields from lines in a log file.

It gets the date and time fields OK.

Why doesn't it get the number of seconds from "startup took 204 seconds"? What am I missing?

data;
  if (_n_=1) then rxid1 = prxparse('/^INFO   \| jvm 1    \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
  input;
  if (prxmatch(rxid1, _infile_)) then do;
    date        = input(prxposn(rxid1, 1, _infile_), yymmdd10.);
    time        = input(prxposn(rxid1, 2, _infile_), time.);
    string      = prxposn(rxid1, 3, _infile_);
    seconds     = input(prxposn(rxid1, 3, _infile_), best.);
  end;
  format date yymmdd10. time time.;
  retain rxid1;
  drop   rxid1;
  datalines;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;

Output is:

Obs          date      time      string    seconds

1     2014-05-23     5:19:43                 .
2     2014-05-27    18:14:20                 .

Pattern Matching Using Perl Regular Expressions (PRX)

http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#n13as9vjfj7...

Respected Advisor
Posts: 3,799

Re: Pattern Matching Using Perl Regular Expressions (PRX)

Reading _INFILE_ from CARDS sometimes truncates _INFILE_.  Use PARMCARDS instead, so that INFILE reads from a real file.

filename FT15F001 temp;
data rx;
  infile FT15F001;
  if (_n_=1) then rxid1 = prxparse('/^INFO   \| jvm 1    \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
  input;
 
  if (prxmatch(rxid1, _infile_)) then do;
    date        = input(prxposn(rxid1, 1, _infile_), yymmdd10.);
    time        = input(prxposn(rxid1, 2, _infile_), time.);
    string      = prxposn(rxid1, 3, _infile_);
    seconds     = input(prxposn(rxid1, 3, _infile_), best.);
  end;
  format date yymmdd10. time time.;
  retain rxid1;
  drop   rxid1;
  parmcards;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;

Obs          date      time      string    seconds

1     2014-05-23     5:19:43     204        204 
2     2014-05-27    18:14:20     637        637 
Occasional Contributor
Posts: 10

Re: Pattern Matching Using Perl Regular Expressions (PRX)

Posted in reply to data_null__

Thank you, data_null_; !

But hey -- NO FAIR, SAS! That is really insidious, and not even a warning in the SAS log about input being truncated.

And if the input line is truncated, I don't understand how it matched the pattern anyway. Bizarre, no?

Tried documented option NOCARDIMAGE, and it made no difference at all. Does not do what it says on the tin.

DATALINES Statement

Reading Long Data Lines

If you use NOCARDIMAGE, SAS processes data lines longer than 80 columns in their entirety.

http://support.sas.com/documentation/cdl/en/lestmtsref/67175/HTML/default/viewer.htm#p0114gachtut3nn...

Using PARMCARDS as you showed worked for me. Or simply reading the data from an external file.

Good thing you _showed_ how to use the PARMCARDS statement, as I can't even find any documentation on it.

The documentation on the PARMCARDS= System Option alludes to a PARMCARDS statement ("in a procedure"), and there the trail goes cold.

http://support.sas.com/documentation/cdl/en/lesysoptsref/64892/HTML/default/viewer.htm#p0bwycx4em8cr...

Respected Advisor
Posts: 3,799

Re: Pattern Matching Using Perl Regular Expressions (PRX)

The truncation depends on what you do with _INFILE_.  This version works with CARDS by assigning _INFILE_ to a new variable.  The advantage of using PARMCARDS is that SAS writes the data to a file, and when you INFILE it, it behaves just like any file you might have created.  CARDS is unbuffered, and you have to use different options to deal with end-of-file detection.  That's one difference I can think of; I'm sure there are others.


data rx;
  infile cards;
 
  if (_n_=1) then rxid1 = prxparse('/^INFO   \| jvm 1    \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
  input;
  infile = _infile_;
  if (prxmatch(rxid1, infile)) then do;
    date        = input(prxposn(rxid1, 1, infile), yymmdd10.);
    time        = input(prxposn(rxid1, 2, infile), time.);
    string      = prxposn(rxid1, 3, infile);
    seconds     = input(prxposn(rxid1, 3, infile), best.);
  end;
  format date yymmdd10. time time.;
  retain rxid1;
  drop   rxid1;
  cards;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;
Occasional Contributor
Posts: 10

Re: Pattern Matching Using Perl Regular Expressions (PRX)

Posted in reply to data_null__

Golly. That is weird.

I think a lot of fair-minded people would call this a bug.

Respected Advisor
Posts: 4,926

Re: Pattern Matching Using Perl Regular Expressions (PRX)

No, no, no! That's an undocumented feature! :)

One thing to note, however: a datalines section should be terminated with a semicolon in the first column, not with a run statement.

- PG
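
As a minimal sketch of the termination PG describes (the data set name demo and the variables x and y are made up for illustration and do not come from the thread), the data lines end at a line holding only a semicolon, and the run statement follows separately:

```sas
data demo;        /* demo, x, y: hypothetical names for illustration */
  input x y;
  datalines;
1 2
3 4
;
run;
```

The same layout applies to the CARDS form used elsewhere in this thread, since CARDS is an alias for DATALINES.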

Respected Advisor
Posts: 3,156

Re: Pattern Matching Using Perl Regular Expressions (PRX)

A bit OT, and I don't have the details of your business requirements, but if your purpose is to capture those 3 buffers, then it seems to me that a generic approach could be more robust:

if (_n_=1) then rxid1 = prxparse('/\D+ (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \D+(\d*)/');

Regards,

Haikuo
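
For reference, here is a sketch of how that generic pattern could be dropped into the DATA step from earlier in the thread. It copies _INFILE_ to a character variable before matching, as data_null_ showed for CARDS. The data set name rx2, the variable name line, and the length of 200 are assumptions made for this illustration, not details from the thread. The lazy ? after each {n} is dropped because an exact count such as \d{4} matches the same text with or without it.

```sas
data rx2;                      /* rx2: placeholder data set name */
  infile cards;
  length line $200;            /* assumed long enough for these log lines */
  retain rxid1;
  /* compile the generic pattern once, on the first iteration */
  if _n_ = 1 then
    rxid1 = prxparse('/\D+ (\d{4}\/\d{2}\/\d{2}) (\d{2}:\d{2}:\d{2}) \D+(\d*)/');
  input;
  line = _infile_;             /* work on a copy of the input buffer */
  if prxmatch(rxid1, line) then do;
    date    = input(prxposn(rxid1, 1, line), yymmdd10.);
    time    = input(prxposn(rxid1, 2, line), time.);
    seconds = input(prxposn(rxid1, 3, line), best.);
  end;
  format date yymmdd10. time time.;
  drop rxid1 line;
  cards;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
;
run;
```

Because the pattern is not anchored to the fixed text of the message, it would also match other log lines that carry a date, a time, and a later number, which may or may not be what is wanted.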

Super User
Posts: 10,035

Re: Pattern Matching Using Perl Regular Expressions (PRX)

Why are you using Perl regular expressions this way? They are overkill for your situation.

data x;
  input;
    date        = input(scan(scan( _infile_,3,'|'),1,' ') , yymmdd10.);
    time        = input(scan(scan( _infile_,3,'|'),-1,' '), time.);
    seconds     = input(scan(_infile_,-2,' '), best.);
  format date yymmdd10. time time.;
  datalines;
INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO   | jvm 1    | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;

Xia Keshan
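
To see what each SCAN call in the step above picks out, here is a small sketch that runs them against one of the log lines held in a literal. The variable names (line, stamp, d, t, s) are illustrative only. A negative count makes SCAN count words from the right.

```sas
data _null_;
  line  = 'INFO   | jvm 1    | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).';
  stamp = scan(line, 3, '|');   /* third pipe-delimited field: the date and time */
  d     = scan(stamp, 1, ' ');  /* first blank-delimited word of that field: the date */
  t     = scan(stamp, -1, ' '); /* last blank-delimited word of that field: the time */
  s     = scan(line, -2, ' ');  /* second word from the right of the whole line: the count */
  put stamp= d= t= s=;          /* write the pieces to the SAS log */
run;
```

This approach depends on the number of seconds always being the second-to-last word on the line, so it is tied to this exact message layout.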
