gyambqt
Obsidian | Level 7

Hello Experts

  • Dataset "a", created below, is extracted from a log file.
  • In data step "b", I want to use a regular expression to pull out the table names that follow t_table_list=. How do I write the regular expression when the table names span multiple lines?
  • There are thousands of log files, and each can have more or fewer tables than the example below. I want one general regular expression that extracts the tables from every log file. For example, log file 1 might have only 1 table after t_table_list, while log file 2 might have 100, so the regular expression needs to be as flexible as possible.

data a;
length line $500;
infile cards dlm="*" truncover;
input line $150.;
cards;
87         %check (t_table_list=CUTP_TPCR_PIMS
88                 CUTP_CTP_CUTG
89                 CUTP_TPCR_PIM
90                 CUTP_TPCR_PIMQ
91                 CUTP_CTP_CUT
92                 CUTP_CTP_CUT_C
93                 CUTP_CTP_CUTN
94                 CUTP_CTP_CUAD
95                 CUTP_CTP_CUTC
96                 CUTP_TPCR_PIML,
97                l_file_list=,
98                s_file_list=
99         )
;
run;

data b;
  set a;
  array rearr  {*} regex1;        /* compiled regex IDs, one per pattern */
  array colarr {*} $200 check;    /* captured text; the $ makes CHECK character */
  retain regex1;
  keep line check;
  if _n_ = 1 then do;
    rearr{1} = prxparse("/.*check\s*.*t_table_list=(.*)/i");
  end;
  do i = 1 to dim(rearr);
    if prxmatch(rearr{i}, line) then do;
      wf = prxparen(rearr{i});                /* highest capture buffer that matched */
      call prxposn(rearr{i}, wf, pos, len);   /* its start position and length */
      if len > 0 then colarr{i} = substr(line, pos, len);
    end;
  end;
  output;
run;

Thanks

RW9
Diamond | Level 26

Totally agree. Put the parameters in a dataset and generate the macro call from them, rather than trying to ascertain them after the event. It seems a bit of a dubious macro setup in the first place. Just pass in a dataset of tables rather than an ever-expanding text string. That makes the call simpler and neater:

%check (table_list=work.table_list...);
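A minimal sketch of the "parameters in a dataset" idea: the table names live in a dataset, and PROC SQL SELECT ... INTO builds the blank-separated list for the macro call. The %check stub and the dataset work.table_list are hypothetical stand-ins, not the real macro's interface.

```sas
/* Hypothetical stub standing in for the real %check macro */
%macro check(t_table_list=, l_file_list=, s_file_list=);
  %put CHECK received t_table_list=&t_table_list;
%mend check;

/* Hypothetical parameter dataset: one row per table to pass in */
data work.table_list;
  length memname $32;
  input memname $;
datalines;
CUTP_TPCR_PIMS
CUTP_CTP_CUTG
CUTP_TPCR_PIM
;
run;

/* Build the blank-separated list from the dataset */
proc sql noprint;
  select memname into :tlist separated by ' '
    from work.table_list;
quit;

/* Generate the macro call from the dataset contents */
%check(t_table_list=&tlist, l_file_list=, s_file_list=)
```

Because work.table_list already records which tables were passed, nothing has to be recovered from the log afterwards.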

gyambqt
Obsidian | Level 7

Hi Kurt, the table names are unknown, and there could be more than 1,000 log files out there, each containing the same or different tables. I have to use regular expressions because other information also needs to be extracted from the log files. Data step b is only part of the program, for demonstration.

Actually, I want to know how to use a regular expression to select information that spans multiple lines.

I have tried \n to match a new line, but it doesn't work.
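A note on why \n does not match: dataset a holds one log line per observation, so there is no newline character inside LINE, and each PRX call only sees the current observation. One workaround is to glue the lines of the %check call into a single retained string and run the regular expression on that. This sketch assumes every log line starts with a line number, that the call ends on the first line containing ")", and that the names inside t_table_list= are blank-separated; the names tables, buffer and collecting are arbitrary.

```sas
data tables(keep=table);
  length text $500 buffer $32767 list $32767 table $32;
  retain buffer ' ' collecting 0 re;
  set a;
  if _n_ = 1 then re = prxparse('/t_table_list\s*=\s*([^,)]*)/i');

  /* drop the log line number at the start of each line */
  text = prxchange('s/^\s*\d+\s+//', 1, line);

  /* start collecting at the %check call, appending each line */
  if prxmatch('/%check\s*\(/i', text) then collecting = 1;
  if collecting then buffer = catx(' ', buffer, text);

  /* at the closing parenthesis, parse the whole call in one go */
  if collecting and index(text, ')') then do;
    if prxmatch(re, buffer) then do;
      list = prxposn(re, 1, buffer);   /* text up to the first comma */
      do i = 1 to countw(list, ' ');
        table = scan(list, i, ' ');
        output;
      end;
    end;
    buffer = ' ';
    collecting = 0;
  end;
run;
```

Run against dataset a, this should give one row per name in t_table_list=, whether the list has 1 entry or 100.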

RW9
Diamond | Level 26

So you're not actually running the programs yourself, then? The reason I ask is that your life will be easier if you set up the run side of things to capture any information you need. If special information is required, write it to the log in a fixed format, e.g.:

[FOR REPORT] Parameters passed to XYZ macro are: X Y Z

Then it is simple to extract this information. You can also change your running system: instead of hard-coding things like parameters into a program, collect that information as metadata in datasets, then generate the code from it.

Otherwise you could be searching for quite complex text to find what you are looking for; e.g. macro-resolved text can have all kinds of spacing, line wraps and other quirks, quoting becomes difficult, etc.

Most companies I have seen have a log checker, but these are built to pick up the specific items written to the log, e.g. ERROR:, WARNING:, NOTE:, and so are generally quite simple. Personally, I would follow that approach.
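A minimal sketch of the "[FOR REPORT]" tagging idea, assuming you control the macro being run. The %check definition is a hypothetical stand-in, and PROC PRINTTO routes the log to a temporary file only so the example is self-contained; in practice you would read the saved log instead.

```sas
/* Hypothetical macro that writes its parameters to the log with a fixed tag */
%macro check(t_table_list=, l_file_list=, s_file_list=);
  %put [FOR REPORT] Parameters passed to CHECK macro are: &t_table_list;
%mend check;

filename runlog temp;                /* temporary file to hold the log */
proc printto log=runlog new; run;    /* route the log to that file */
%check(t_table_list=CUTP_TPCR_PIMS CUTP_CTP_CUTG, l_file_list=, s_file_list=)
proc printto; run;                   /* restore the default log */

/* The tagged lines are now trivial to find */
data report;
  infile runlog truncover;
  input line $char500.;
  if index(line, '[FOR REPORT]') then output;
run;
```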

gyambqt
Obsidian | Level 7

Hi RW9,

I have no control over the %check macro in the original SAS program.

But I need to pull out information about which tables were included, from the log file produced after the original SAS program has run.

What I need is to modify the regular expression in data step b above (the PRXPARSE call, highlighted in my original post) so that it captures this information from dataset a.

Peter_L
Quartz | Level 8

As a general approach, I would read the data one line at a time. Write one regex to recognise the line containing %check and another to recognise the line with the closing parenthesis. Set a state variable when you see %check to indicate that you are in the list of table names. Within the list, use additional regexes and tests of the state variable to distinguish the table names (all-caps words with _) from the other parts containing = characters. When your regexes recognise a table name, use the OUTPUT statement to write a record to your output dataset.

The pseudo-code (not SAS) structure looks like this:

retain inlist

if (regex-function-start) then do
  inlist = 1
  tablename = substring()   /* get the name from after t_table_list= */
  output tablename
end

if (inlist and regex-table-name) then do
  tablename = substring()   /* get the name, e.g. without leading spaces */
  output tablename
end

if (regex-function-end) then inlist = 0

As I mentioned in another post, awk is easier to use than SAS for this type of problem. This case is fairly simple, though, so SAS can handle it.
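One possible SAS translation of the pseudo-code above, assuming one table name per continuation line as in the sample and a log line number at the start of every line. One deviation from the pseudo-code: the list is also closed at the first comma, because in this sample the comma marks the end of t_table_list= and stops the later parameters from being read as names. Dataset and variable names are arbitrary.

```sas
data tables(keep=tablename);
  length text $500 tablename $32;
  retain inlist 0 re_start re_name;
  set a;
  if _n_ = 1 then do;
    /* the %check line: capture the first name after t_table_list= */
    re_start = prxparse('/%check\s*\(\s*t_table_list\s*=\s*(\w+)/i');
    /* a continuation line holding only a name, with an optional comma */
    re_name  = prxparse('/^(\w+)\s*,?\s*$/');
  end;

  /* drop the log line number at the start of each line */
  text = prxchange('s/^\s*\d+\s+//', 1, line);

  if prxmatch(re_start, text) then do;
    inlist = 1;                              /* entering the list */
    tablename = prxposn(re_start, 1, text);
    output;
  end;
  else if inlist and prxmatch(re_name, text) then do;
    tablename = prxposn(re_name, 1, text);
    output;
  end;

  /* the list ends at the first comma (end of t_table_list=) or at ")" */
  if inlist and (index(text, ',') or index(text, ')')) then inlist = 0;
run;
```

Because the state variable carries over between observations, the same step works for a list of 1 table or 100.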

Ksharp
Super User

You can use different Perl regular expressions to pull these out.

Tom
Super User

Can you explain the problem again, without the assumed solution?

I see that you have these questions:

It sounds like you want to scan SAS logs. 

You could write a utility to do this using grep/awk, etc., or just use a SAS DATA step.

It sounds like you want to get something from those logs, perhaps the notes that SAS writes when it generates a dataset?

You might do this with regular expression, but the log messages are regular enough that a simple IF statement is normally all that is needed.
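For instance, to list the datasets a program created, a "starts with" comparison (=:) on the standard "NOTE: The data set ... has N observations" message is enough, with no regex at all. The two lines after CARDS are illustrative samples, and dsname is an arbitrary variable name.

```sas
data notes(keep=dsname);
  length line $256 dsname $41;
  infile cards truncover;
  input line $char256.;
  /* =: compares only the leading characters, i.e. a "starts with" test */
  if line =: 'NOTE: The data set ' then do;
    dsname = scan(line, 5, ' ');    /* 5th blank-separated word, e.g. WORK.A */
    output;
  end;
cards;
NOTE: The data set WORK.A has 13 observations and 1 variables.
NOTE: DATA statement used (Total process time):
;
run;
```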

It sounds like you want to scan multiple SAS logs.

My suggestion is to get it working for one log file and then loop over the list of files to scan.

It might be possible to do it all in one step by using wildcards in the INFILE statement, so that a single scanning DATA step reads multiple files. You can use the FILENAME= option to get the name of the log file being read.

Or you could use a dataset of file names to scan and use the FILEVAR= option on the INFILE statement to tell it which file to read.
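Both options sketched, with hypothetical folder and file paths. Each step produces one observation per log line, tagged with the file it came from, which a parsing step like the ones in this thread could then read in place of dataset a.

```sas
/* Option 1: one INFILE with a wildcard (folder path is hypothetical) */
data loglines1;
  length fname logname $256 line $500;
  infile "/path/to/logs/*.log" filename=fname truncover;
  input line $char500.;
  logname = fname;        /* the FILENAME= variable itself is not saved */
run;

/* Option 2: a dataset of file names plus FILEVAR= (paths are hypothetical) */
data logfiles;
  length path $256;
  input path $char256.;
datalines;
/path/to/logs/run1.log
/path/to/logs/run2.log
;
run;

data loglines2;
  set logfiles;
  length logname $256 line $500;
  logname = path;         /* keep a copy of the file being read */
  /* anylog is only a placeholder; FILEVAR= supplies the real file name */
  infile anylog filevar=path end=done truncover;
  do while (not done);
    input line $char500.;
    output;
  end;
run;
```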

Ksharp
Super User

data a;
  length line $500;
  infile cards dlm="*" truncover;
  input line $150.;
cards;
87         %check (t_table_list=CUTP_TPCR_PIMS
88                 CUTP_CTP_CUTG
89                 CUTP_TPCR_PIM
90                 CUTP_TPCR_PIMQ
91                 CUTP_CTP_CUT
92                 CUTP_CTP_CUT_C
93                 CUTP_CTP_CUTN
94                 CUTP_CTP_CUAD
95                 CUTP_CTP_CUTC
96                 CUTP_TPCR_PIML,
97                l_file_list=,
98                s_file_list=
99         )
;
run;

data want;
  set a;
  length tables $ 40;
  if prxmatch('/^\d+\s+%check/',strip(line)) then tables=scan(line,-1,'=');
    else if prxmatch('/^\d+\s+\w+\,?$/',strip(line)) then tables=scan(line,-1);
run;
