Here's a simple program that uses pattern matching to extract fields from lines in a log file.
It gets the date and time fields OK.
Why doesn't it get the number of seconds from "startup took 204 seconds"? What am I missing?
data;
if (_n_=1) then rxid1 = prxparse('/^INFO \| jvm 1 \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
input;
if (prxmatch(rxid1, _infile_)) then do;
date = input(prxposn(rxid1, 1, _infile_), yymmdd10.);
time = input(prxposn(rxid1, 2, _infile_), time.);
string = prxposn(rxid1, 3, _infile_);
seconds = input(prxposn(rxid1, 3, _infile_), best.);
end;
format date yymmdd10. time time.;
retain rxid1;
drop rxid1;
datalines;
INFO | jvm 1 | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO | jvm 1 | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
options nocenter;
proc print;
run;
Output is:
Obs    date          time        string    seconds
1      2014-05-23     5:19:43                 .
2      2014-05-27    18:14:20                 .
Pattern Matching Using Perl Regular Expressions (PRX)
Reading CARDS data through _INFILE_ sometimes truncates _INFILE_. Use PARMCARDS instead, so that the data goes to a real file that you can read with INFILE.
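A minimal sketch of the PARMCARDS approach, not the code from the original reply. It assumes the PARMCARDS= system option is at its default fileref, FT15F001, and the data set name startup is made up for illustration. The SCAN call is borrowed from a later reply in this thread to keep the example short.

```sas
/* Assumption: PARMCARDS= is at its default, FT15F001. */
filename ft15f001 temp;

data startup;                 /* hypothetical data set name */
infile ft15f001;              /* read back the file that PARMCARDS4 writes */
input;
/* second-to-last blank-delimited word is the number of seconds */
seconds = input(scan(_infile_, -2, ' '), best.);
parmcards4;
INFO | jvm 1 | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO | jvm 1 | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
;;;;
run;
```

Because the lines are written to a temporary file first, the step reads them the same way it would read any external file.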
Thank you, data_null_; !
But hey -- NO FAIR, SAS! That is really insidious, and not even a warning in the SAS log about input being truncated.
And if the input line is truncated, I don't understand how it matched the pattern anyway. Bizarre, no?
Tried documented option NOCARDIMAGE, and it made no difference at all. Does not do what it says on the tin.
The documentation section "Reading Long Data Lines" says:
"If you use NOCARDIMAGE, SAS processes data lines longer than 80 columns in their entirety."
Using PARMCARDS as you showed worked for me. Or simply reading the data from an external file.
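For the external-file route, something along these lines should do. The path and the data set name are hypothetical, and LRECL= and TRUNCOVER are added here only as a precaution for long or short lines; the regular expression is the one from the original program.

```sas
data startup;                                   /* hypothetical data set name */
retain rxid1;
if _n_ = 1 then rxid1 = prxparse('/^INFO \| jvm 1 \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
/* Hypothetical path: point this at the real log file. */
infile 'C:\logs\wrapper.log' lrecl=32767 truncover;
input;
if prxmatch(rxid1, _infile_) then do;
  date = input(prxposn(rxid1, 1, _infile_), yymmdd10.);
  time = input(prxposn(rxid1, 2, _infile_), time.);
  seconds = input(prxposn(rxid1, 3, _infile_), best.);
end;
format date yymmdd10. time time.;
drop rxid1;
run;
```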
Good thing you _showed_ how to use the PARMCARDS statement, as I can't even find any documentation on it.
The documentation on the PARMCARDS= System Option alludes to a PARMCARDS statement ("in a procedure"), and there the trail goes cold.
Whether the truncation happens depends on what you do with _INFILE_. It works with CARDS if you assign _INFILE_ to a new variable and use that variable instead. The advantage of PARMCARDS is that SAS writes the data to a file, so when you INFILE it, it behaves just like any file you might have created. CARDS is unbuffered, and you have to use different options to deal with end-of-file detection. That's one difference I can think of; I'm sure there are others.
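A sketch of the assign-to-a-variable workaround, based only on the description above. The variable name line and its length of 200 are assumptions; any length that covers the longest data line should do.

```sas
data;
retain rxid1;
length line $ 200;            /* assumed length, long enough for these lines */
if _n_ = 1 then rxid1 = prxparse('/^INFO \| jvm 1 \| (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \| ==> ready to accept requests \(startup took (\d*) seconds\)\./');
input;
line = _infile_;              /* copy the input buffer into an ordinary variable */
if prxmatch(rxid1, line) then do;
  date = input(prxposn(rxid1, 1, line), yymmdd10.);
  time = input(prxposn(rxid1, 2, line), time.);
  seconds = input(prxposn(rxid1, 3, line), best.);
end;
format date yymmdd10. time time.;
drop rxid1 line;
datalines;
INFO | jvm 1 | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO | jvm 1 | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
;
run;
```

The only change from the original program is that PRXMATCH and PRXPOSN are given line rather than _INFILE_.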
Golly. That is weird.
I think a lot of fair-minded people would call this a bug.
No, no, no! That's an undocumented feature!
One thing to note, however: a datalines section should be terminated with a semicolon in the first column, not with a RUN statement.
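A minimal illustration of that convention; the data set and variable names are made up.

```sas
data demo;        /* hypothetical data set */
input x;
datalines;
1
2
;
run;
```

The lone semicolon in column 1 ends the data, and the RUN statement that follows is then an ordinary statement rather than the data terminator.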
- PG
A bit OT, and I don't have the details of your business requirements, but if your purpose is to capture those 3 buffers, then it seems to me that a generic approach could be more robust:
if (_n_=1) then rxid1 = prxparse('/\D+ (\d{4}?\/\d{2}?\/\d{2}?) (\d{2}?:\d{2}?:\d{2}?) \D+(\d*)/');
Regards,
Haikuo
Why are you using Perl regular expressions in such a bad way? They are overkill for your situation.
data x;
input;
date = input(scan(scan(_infile_, 3, '|'), 1, ' '), yymmdd10.);
time = input(scan(scan(_infile_, 3, '|'), -1, ' '), time.);
seconds = input(scan(_infile_, -2, ' '), best.);
format date yymmdd10. time time.;
datalines;
INFO | jvm 1 | 2014/05/23 05:19:43 | ==> ready to accept requests (startup took 204 seconds).
INFO | jvm 1 | 2014/05/27 18:14:20 | ==> ready to accept requests (startup took 637 seconds).
run;
Xia Keshan