SAS txt file extraction problem

Regular Contributor
Posts: 152

Hello Experts

  • Dataset "a", created below, is extracted from a log file.
  • In data step "b", I want to pull out the table names that follow t_table_list using a regular expression. How do I write the regular expression in data step "b" when the table names are spread over multiple lines?
  • There are thousands of log files, and each can contain more or fewer tables than the example below. I want a single regular expression that extracts the tables from all of the log files. For example, log file 1 might have only 1 table after t_table_list, while log file 2 could have 100, so the regular expression needs to be as flexible as possible.

data a;
length line $500;
infile cards dlm="*" truncover;
input line $150.;
cards;
87         %check (t_table_list=CUTP_TPCR_PIMS
88                 CUTP_CTP_CUTG
89                 CUTP_TPCR_PIM
90                 CUTP_TPCR_PIMQ
91                 CUTP_CTP_CUT
92                 CUTP_CTP_CUT_C
93                 CUTP_CTP_CUTN
94                 CUTP_CTP_CUAD
95                 CUTP_CTP_CUTC
96                 CUTP_TPCR_PIML,
97                l_file_list=,
98                s_file_list=
99         )
;
run;

data b;
  set a;
  array rearr {*} regex1;
  array posarr {*} pos1;
  array colarr {*} $200 check;  /* character, so substr() output is not converted to numeric */
  keep line check;
  dimm=1;
  if _n_=1 then do;
    rearr{1}=prxparse("/.*check\s*.*t_table_list=(.*)/i");
  end;
  retain regex1;
  do i=1 to dimm;
    posarr{i}=prxmatch(rearr{i},line);
    if prxmatch(rearr{i}, line) then do;
      wf=prxparen(rearr{i});
      call prxposn(rearr{i},wf,posarr{i},len);
      colarr{i}=substr(line,posarr{i},len);
    end;
    output;
  end;
run;

Thanks

Super User
Posts: 6,963

Re: SAS txt file extraction problem

Why so complicated? Create a dataset with your table names once and then use that to generate code whenever needed.

Super User
Posts: 7,413

Re: SAS txt file extraction problem

Totally agree. Put the parameters in a dataset and generate the macro call from them, rather than trying to ascertain them after the event. It seems a dubious macro setup in the first place. Just pass in a dataset of tables rather than an ever-expanding text string. That makes the call simpler and neater:

%check (table_list=work.table_list...);

Regular Contributor
Posts: 152

Re: SAS txt file extraction problem

Hi Kurt, the table names are unknown, and there could be more than 1000 log files, each containing the same or different tables. I have to use regular expressions because other information needs to be extracted from the log files as well. Data step b is only part of the program, shown for demonstration.

Actually, I want to know how to use a regular expression to select information that spans multiple lines.

I have tried \n to match a new line, but it doesn't work.

Super User
Posts: 7,413

Re: SAS txt file extraction problem

So you're not actually running the programs yourself then? I ask because your life is easier if you set up the run side of things to capture any information you need. If special information is required, write it to the log in a fixed format, e.g.

[FOR REPORT] Parameters passed to XYZ macro are: X Y Z

Then it's simple to get this information. Also, at run time you can change your running system: instead of putting things like parameters into a program, collect the information as metadata in datasets, then generate the code from that.

Otherwise you could be searching through quite a bit of complex text to find what you are looking for, e.g. macro-resolved text can have all kinds of spacing, line overlaps and other things, quotes become difficult, etc.

Most companies I have seen have a log checker, but these are built to pick up specific items written to the log, e.g. ERROR:, WARNING:, NOTE:, and thus are generally quite simple. Personally I would follow that approach.
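To illustrate the kind of simple log checker described above, here is a minimal sketch. The path is hypothetical, and the [FOR REPORT] marker only exists if the program that produced the log was written to emit it, so treat this as an assumption rather than something that applies to every log.

```sas
/* Sketch of a simple log checker. The path below is hypothetical. */
data log_flags;
  infile "/path/to/program.log" truncover lrecl=500;
  input line $char500.;
  length type $12;
  /* =: compares only the leading characters of line */
  if      line =: 'ERROR'   then type = 'ERROR';
  else if line =: 'WARNING' then type = 'WARNING';
  /* assumed marker, written on purpose by the program being logged */
  else if index(line, '[FOR REPORT]') then type = 'REPORT';
  /* keep only the flagged lines */
  if not missing(type);
run;
```

Because the messages sit in a fixed position, plain comparisons are enough here and no regular expression is needed.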

Regular Contributor
Posts: 152

Re: SAS txt file extraction problem

Hi RW9,

I have no control over the check macro in the original SAS program.

But I need to pull out the information about the tables that appear in the log file after the original SAS program has run.

What I need is to modify the regular expression I used in the second data step (b) above so that it captures this information from dataset a.

Contributor
Posts: 32

Re: SAS txt file extraction problem

As a general approach, I would read the data one line at a time. Write a regex to recognise the line containing %check and another to recognise the line with the closing parenthesis. Set a state variable when you see the %check to indicate that you are in the list of table names. Use additional regexes and tests of the state variable within the list to distinguish the table names (all-caps words with _) from the other parts containing = characters. When your regexes recognise a table name, use the output statement to write a record to your output dataset.

The pseudo-code (not SAS) structure looks like this:

retain inlist

if (regex-function-start) then do
  inlist = 1
  tablename = substring() /* get the name from after the ( */
  output tablename
end

if (inlist and regex-table-name) then do
  tablename = substring() /* get the name, e.g. without leading spaces */
  output tablename
end

if (regex-function-end) then inlist = 0

As I mentioned in another post, the awk program is easier to use than SAS for this type of problem. This case is fairly simple, so SAS can do it.
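For what it's worth, the pseudo-code above could be turned into a data step roughly like this. It is only a sketch under some assumptions: it reads dataset a (variable line) from the first post, each continuation line holds exactly one table name after the log line number, names contain only letters, digits and underscores (no libref prefix), and t_table_list is followed by a comma or the call ends with a ")" on its own line. The names tables_found, tablename, inlist, re_start, re_name and re_end are made up for the example.

```sas
data tables_found (keep=tablename);
  set a;
  length tablename $32;
  retain inlist 0 re_start re_name re_end;
  if _n_ = 1 then do;
    /* start of the call: capture the first name after t_table_list= */
    re_start = prxparse('/%check\s*\(\s*t_table_list\s*=\s*(\w+)/i');
    /* continuation line: log line number, one name, optional comma */
    re_name  = prxparse('/^\s*\d+\s+(\w+)\s*,?\s*$/');
    /* a trailing comma ends the list; a lone ")" ends the whole call */
    re_end   = prxparse('/,\s*$|^\s*\d+\s*\)/');
  end;
  if prxmatch(re_start, line) then do;
    inlist = 1;
    tablename = prxposn(re_start, 1, line);
    output;
  end;
  else if inlist and prxmatch(re_name, line) then do;
    tablename = prxposn(re_name, 1, line);
    output;
  end;
  /* leave the list once its terminator has been seen */
  if inlist and prxmatch(re_end, line) then inlist = 0;
run;
```

Each record is matched on its own, so no \n is needed in any pattern: the retained inlist flag carries the "inside the list" state from one line to the next, which is why the same step works for one table or a hundred.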

Super User
Posts: 9,687

Re: SAS txt file extraction problem

You can use different Perl regular expressions to pull these out.

Super User
Posts: 6,502

Re: SAS txt file extraction problem

Can you explain the problem again?  Without the assumed solution?

I see that you have these questions:

It sounds like you want to scan SAS logs. 

You could write a utility to do this using grep/awk etc or just use a SAS Data step.

It sounds like you want to get something from those logs, perhaps the notes that SAS writes when it generates a data set?

You might do this with regular expression, but the log messages are regular enough that a simple IF statement is normally all that is needed.

It sounds like you want to scan multiple SAS logs.

My suggestion is get it working for one log file and then just loop over the list of files to scan.

It might be possible to do it all in one step using wildcards in the INFILE statement so that the data step that is scanning reads multiple files. You can use the FILENAME option to get the name of the log file being read.

Or you could use a data set of file names to scan and use the FILEVAR option on the INFILE statement to tell it which file to read.
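Both ways of reading several logs might look something like the sketch below. The directory, the dataset log_list and its variable path are hypothetical, and an empty log file may need extra handling in the second step.

```sas
/* Option 1: wildcard in INFILE, with FILENAME= giving the current file. */
data all_lines;
  length fname logname $260 line $500;
  infile "/path/to/logs/*.log" filename=fname truncover lrecl=500;
  input line $char500.;
  logname = fname;   /* the FILENAME= variable is not written out, so copy it */
run;

/* Option 2: a dataset of file names drives the read through FILEVAR=. */
data all_lines2;
  set log_list;      /* hypothetical: one observation per log, variable path */
  length logname $260 line $500;
  logname = path;    /* the FILEVAR= variable is not written out either */
  infile dummy filevar=path end=eof truncover lrecl=500;
  do while (not eof);
    input line $char500.;
    output;
  end;
run;
```

Either step leaves one observation per log line together with the name of the log it came from, so whatever scanning logic works for one file can then be applied to all of them.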

Super User
Posts: 9,687

Re: SAS txt file extraction problem

data a;
length line $500;
infile cards dlm="*" truncover;
input line $150.;
cards;
87         %check (t_table_list=CUTP_TPCR_PIMS
88                 CUTP_CTP_CUTG
89                 CUTP_TPCR_PIM
90                 CUTP_TPCR_PIMQ
91                 CUTP_CTP_CUT
92                 CUTP_CTP_CUT_C
93                 CUTP_CTP_CUTN
94                 CUTP_CTP_CUAD
95                 CUTP_CTP_CUTC
96                 CUTP_TPCR_PIML,
97                l_file_list=,
98                s_file_list=
99         )
;
run;

data want;
  set a;
  length tables $ 40;
  if prxmatch('/^\d+\s+%check/',strip(line)) then tables=scan(line,-1,'=');
  else if prxmatch('/^\d+\s+\w+\,?$/',strip(line)) then tables=scan(line,-1);
run;
