DATA Step, Macro, Functions and more

Read the file names and create a dataset

Reply
Regular Contributor
Posts: 168

Read the file names and create a dataset

My requirement is to read the file names from dataset and create a master dataset by appending all the files. I 've tried this via follow code, but could not succeed.

 

I would request someone to guide me to complete this task.

 

Reading the filenames as below.

data dirlist;
  infile cards truncover;
  input file_names $300.;
  cards;
'/usr/sas/sas_config/Lev1/SASApp/StoredProcessServ​er/Logs/SASApp_STPServer_2015-08-30_tmptcmlva2_191​42.log'
'/usr/sas/sas_config/Lev1/SASApp/StoredProcessServ​er/Logs/SASApp_STPServer_2015-08-29_tmptcmlva2_191​42.log'
'/usr/sas/sas_config/Lev1/SASApp/StoredProcessServ​er/Logs/SASApp_STPServer_2015-08-28_tmptcmlva2_191​42.log'
;
run;

 

Now I need to read and append all the files to create one master dataset. So I wrote a code as below. However, I could  not succeeed.

 

It seems there was some error in my either infile statement or do loop which I'm unable to figure out.

 

DATA output_data_set;
set dirlist;
/* read the file references in variable called file_names */
INFILE  IN dsd truncover FILEVAR = file_names END = end_of_file LRECL=32000;
     DO WHILE (end_of_file = 0);
        input var : $ 3000.;
/*filename=file_names;*/
var1 = _infile_;
if var1 = :'201';
Date_TimeStamp= scan(var1,1," ");
Status = scan(var1,2," ");
Processid = scan(var1,3," ");
userid = scan(var1,4," ");
Details = scan(var1,-1,'-');
drop var var1;
        OUTPUT;
     END;
RUN;

 

 There are records in my input files,but my code could not read it for some reasons. Please guide me.

 

Log:

 

4          data dirlist;
5            infile cards truncover;
6            input file_names $300.;
7            cards;

NOTE: The data set WORK.DIRLIST has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds
      

7        !         
11         ;
12         run;
2                                                          The SAS System                           08:36 Friday, September 11, 2015

13         
14         DATA output_data_set;
15         set dirlist;
16         /* read the file references in variable called file_names */
17         INFILE  IN dsd truncover FILEVAR = file_names END = end_of_file LRECL=32000;
18              DO WHILE (end_of_file = 0);
19                 input var : $ 3000.;
20         /*filename=file_names;*/
21         var1 = _infile_;
22         if var1 = :'201';
23         Date_TimeStamp= scan(var1,1," ");
24         Status = scan(var1,2," ");
25         Processid = scan(var1,3," ");
26         userid = scan(var1,4," ");
27         Details = scan(var1,-1,'-');
28         drop var var1;
29                 OUTPUT;
30              END;
31         RUN;

NOTE: The variable file_names exists on an input data set, but was also specified in an I/O statement option.  The variable will 
      not be included on any output data set.
NOTE: The infile IN is:
      Filename=/usr/sas/sas_config/Lev1/SASApp/StoredPro​cessServer/Logs/SASApp_STPServer_2015-08-30_tmptcm​lva2_19142.log,
      
      File List=('/usr/sas/sas_config/Lev1/SASApp/StoredProce​ssServer/Logs/SASApp_STPServer_2015-08-30_tmptcmlv​a2_19142.log' 
      '/apps/tir/test/loganalysis/
'),
      Owner Name=sassrv,Group Name=sas,
      Access Permission=rwxrwxr-x,
      Last Modified=Mon Aug 31 00:01:56 2015,
      File Size (bytes)=720559

NOTE: The infile IN is:
      Filename=/usr/sas/sas_config/Lev1/SASApp/StoredPro​cessServer/Logs/SASApp_STPServer_2015-08-29_tmptcm​lva2_19142.log,
      
      File List=('/usr/sas/sas_config/Lev1/SASApp/StoredProce​ssServer/Logs/SASApp_STPServer_2015-08-29_tmptcmlv​a2_19142.log' 
      '/apps/tir/test/loganalysis/
'),
      Owner Name=sassrv,Group Name=sas,
      Access Permission=rwxrwxr-x,
      Last Modified=Sun Aug 30 00:01:46 2015,
      File Size (bytes)=720559

NOTE: The infile IN is:
      Filename=/usr/sas/sas_config/Lev1/SASApp/StoredPro​cessServer/Logs/SASApp_STPServer_2015-08-28_tmptcm​lva2_19142.log,
      
      File List=('/usr/sas/sas_config/Lev1/SASApp/StoredProce​ssServer/Logs/SASApp_STPServer_2015-08-28_tmptcmlv​a2_19142.log' 
      '/apps/tir/test/loganalysis/
'),
      Owner Name=sassrv,Group Name=sas,
      Access Permission=rwxrwxr-x,
      Last Modified=Sat Aug 29 00:01:36 2015,
      File Size (bytes)=2815408

NOTE: 1 record was read from the infile IN.
      The minimum record length was 572.
      The maximum record length was 572.
NOTE: 1 record was read from the infile IN.
      The minimum record length was 572.
      The maximum record length was 572.
3                                                          The SAS System                           08:36 Friday, September 11, 2015

NOTE: 1 record was read from the infile IN.
      The minimum record length was 572.
      The maximum record length was 572.
NOTE: There were 3 observations read from the data set WORK.DIRLIST.
NOTE: The data set WORK.OUTPUT_DATA_SET has 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.00 seconds
Super User
Super User
Posts: 7,401

Re: Read the file names and create a dataset

Could you clarify your request.  Do you want to create a dataset with the contents of the three files, if so why not use a wildcard approach:

data want;

  infile "/usr/sas/sas_config/Lev1/SASApp/StoredProcessServ​er/Logs/*.log";

...

run;

 

If you have to be specific then maybe:

data _null_;

  set dirlist;

  call execute('data inter; infile "'||strip(file_names)||'"; ....otherbits of code...; run;');

  if _n_=1 then call execute('data base; set inter; run;');

  else call execute('proc datasets library=work nolist; base=your_base_file data=inter force; run;');

run;

Super User
Super User
Posts: 7,401

Re: Read the file names and create a dataset

Hi,

 

Having some difficulties using this new forum.  For your question, is there a reason why you do not use the wildcard:

data xyz;

  infile "/usr/sas/sas_config/Lev1/SASApp/StoredProcessServ​er/Logs/*.log";

...

run;

 

Or maybe if you need to be specific:

data _null_;

  set dir_list;

  call execute('data inter; infile "'||strip(file_names)||'"; ...other code...; run;');

  if _n_=1 then call execute('data base; set inter; run;');

  else call execute('proc datasets library=work nolist; base=base data=inter force; run;');

run;

Super User
Posts: 17,836

Re: Read the file names and create a dataset

Does your code work for one file? What is your program for a single file that works?

Respected Advisor
Posts: 3,777

Re: Read the file names and create a dataset

Change

DO WHILE (end_of_file = 0);
to
DO WHILE(NOT end_of_file);
Super User
Posts: 6,939

Re: Read the file names and create a dataset

This might be the culprit causing no output:

if var1 = :'201';
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,777

Re: Read the file names and create a dataset

[ Edited ]

Yes that is it.  A subsetting IF is breaking out of the DO WHILE(NOT EOF) loop and sending the program back to read another file and on and on.

Super Contributor
Posts: 426

Re: Read the file names and create a dataset

After applying your advise I slightly tweaked my code with macro, but again I end up with other error.

 

In this case, I've the file names in output_file.txt and I reading it to create a dataset. Please be informed I need to create a dataset only if file name (part of file name) matches. My management want me to develop with this code via macro and hence I used macro in my code.

 

output_file.txt has

 

fname   /*header*/

SASApp_STPServer_2015-09-15_tmptcmsaslva2_18208.log
SASApp_STPServer_2015-09-15_tmptcmsaslva2_19142.log

 

libname log '/usr/sas/tir/test/loganalysis/';

proc import datafile='/usr/sas/tir/test/loganalysis/output_file.txt'
            out=log.output_file
            dbms=dlm
                        replace;
	    delimiter='09'x;
run;

Options symbolgen mlogic mprint mfile; 
%macro log_analysis;

%let filedate=%sysfunc(putn("&sysdate9"d-2,yymmdd10.));
%put &filedate;

data log.log_analysis;
length fname filename $200;
infile "/usr/sas/sas_config/Lev1/SASApp/StoredProcessServer/Logs/SASApp_STPServer_&filedate._tmptcmsaslva2_*.log" dsd truncover FILEVAR = file_names END = end_of_file LRECL=32000;
DO WHILE (not end_of_file);
input var : $ 3000.;
/*filename=fname;*/
var1 = _infile_;
if var1 = :'201';
Date_TimeStamp= scan(var1,1," ");
Status = scan(var1,2," ");
Processid = scan(var1,3," ");
userid = scan(var1,4," ");
Details = scan(var1,-1,'-');
drop var var1;
output;
end;
run;

mend log_analysis;


data _null_;
  set log.output_file;
  if fname =: 'SASApp_STPServer' then call execute ('%log_analysis;');
else put 'no log files';
run;

 

Piece of log

 

1         + data log.log_analysis;  length fname filename $200;  infile 
"/usr/sas/sas_config/Lev1/SASApp/StoredProcessServer/Logs/SASApp_STPServer_2015-09-15_tmptcmsaslva2_*.log" dsd truncover FILEVAR = 
file_names END = end_of_file LRECL=32000;  DO WHILE (not
ERROR: A Physical file reference (i.e. "PHYSICAL FILE REFERENCE" ) or an aggregate file storage reference (i.e. AGGREGATE(MEMBER) ) 
       reference cannot be used with the FILEVAR= option.

 

 

I request someone to help me to come past this error.

Super User
Posts: 6,939

Re: Read the file names and create a dataset

In your data step within the macro, you specify

filevar = file_names

This means that file_names must contain the physical name of the infile to be read. Since you do not assign a value to file_names anywhere in the data step, this fails.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 6,500

Re: Read the file names and create a dataset

The error message is pretty clear. You cannot code INFILE 'myfile' FILEVAR=myfilenamevar ... since you are giving it the physical name in two places. Replace the quoted filename with a dummy fileref. INFILE dummy FILEVAR=myfilenamevar ... ;
Super Contributor
Posts: 426

Re: Read the file names and create a dataset

 

 

I replaced my infile statement as follows.

 

infile dummy dsd truncover FILEVAR = file_names END = end_of_file LRECL=32000;

 

 

Got error as follows.

 

NOTE: Variable fname is uninitialized.
NOTE: Variable filename is uninitialized.
ERROR: Invalid physical name.

 

My question,  how SAS will recongnize that the file reference 'dummy' is the file names to be read from the dataset log.output_file. Please be informed that there is only the filename in the dataset log.output_file. There is no path in the dataset or anywhere in the program.

 

Kindly guide me where should I mention the path in the program which makes life easier. Just I need to know whether the  file output_fil​e.txt should have the path along with file names or I should mention in infile statement which is under the macro.

 

I would request you to see my previous post to understand the files and other stuffs.

 

Thanks  for the inputs so far.

Super User
Posts: 6,939

Re: Read the file names and create a dataset

When you use the filevar=any_variable_name construct in a INFILE statement, you MUST provide a valid physical filename in the variable any_variable_name before you can read from the infile.

 

If you want to read data from a certain set of similarly structured external files in one data step, you best create a SAS dataset with the filenames and do something like that

 

data have;
* code that reads filenames into the variable file_names;
run;

data want;
infile dummy filevar=file_names end=end_of_file /* other options */;
set have;
do while (not end_of_file);
  input /* list of variables */;
  * further processing;
  output;
end;
run;

 

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 426

Re: Read the file names and create a dataset

Still I had trouble reading the files. I did tried as you said, but I got error like 'ERROR: Invalid physical name'

 

Output_file.txt has records like, (fname is a header)

 

fname  
/usr/sas/sas_config/Lev1/SASApp/StoredProcessServer/Logs/SASApp_STPServer_2015-09-15_tmptcmsaslva2_18208.log
/usr/sas/sas_config/Lev1/SASApp/StoredProcessServer/Logs/SASApp_STPServer_2015-09-15_tmptcmsaslva2_19142.log

'

 

My code is,

 

libname log '/usr/sas/tir/test/loganalysis/';

proc import datafile='/usr/sas/tir/test/loganalysis/output_file.txt'
            out=log.output_file
            dbms=dlm
                        replace;
	    delimiter='09'x;
run;


Options symbolgen mlogic mprint mfile; 

%macro log_analysis;

%let filedate=%sysfunc(putn("&sysdate9"d-3,yymmdd10.));
%put &filedate;

data log.log_analysis;
length fname $200;
infile dummy dsd truncover FILEVAR = fname END = end_of_file LRECL=32000;
DO WHILE (not end_of_file);
input var : $ 3000.;
/*filename=fname;*/
var1 = _infile_;
if var1 = :'201';
Date_TimeStamp= scan(var1,1," ");
Status = scan(var1,2," ");
Processid = scan(var1,3," ");
userid = scan(var1,4," ");
Details = scan(var1,-1,'-');
drop var var1;
output;
end;
run;

%mend log_analysis;

data _null_;
  set log.output_file;
  if fname =: '/usr/sas/sas_config/Lev1/SASApp/StoredProcessServer/Logs/SASApp_STPServer' then call execute ('%log_analysis;');
else put 'no log files';
run;

 

Super User
Posts: 6,939

Re: Read the file names and create a dataset

Babloo, take a good look at this part of you code:

 

data log.log_analysis;
length fname $200;
infile dummy dsd truncover FILEVAR = fname END = end_of_file LRECL=32000;
DO WHILE (not end_of_file);
input var : $ 3000.;
/*filename=fname;*/
var1 = _infile_;
if var1 = :'201';
Date_TimeStamp= scan(var1,1," ");
Status = scan(var1,2," ");
Processid = scan(var1,3," ");
userid = scan(var1,4," ");
Details = scan(var1,-1,'-');
drop var var1;
output;
end;
run;

and then tell me: WHERE DO YOU SET fname???

Answer: nowhere. That's why it does not work.

The fact that fname is present in the data step that creates code through call execute does in no way imply that the variable somehow magically appears in the generated code (which is executed AFTER the data step doing the call execute has ended). You have to provide for some mechanism to get the filenames into your main step.

 

And I feel you are thinking way too complicated here.

 

My immediate solution would look like this:

 

- create a data step to read one single log file into an intermediate file

- wrap that in a macro definition that accepts the input filename as a parameter, and has a second step appending the data to a base file

- call that macro in a data step that reads the input filenames; every iteration will add another log output to your base file-

 

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 6,500

Re: Read the file names and create a dataset

[ Edited ]

You are making it much too complicated. As someone said above the main error is that you cannot use a subsetting IF statement because that will break out of the DO WHILE loop and terminate the processing of the current file.  You need to use IF/THEN/DO/END instead to conditionally execute the OUTPUT statement.

 

You probably should add some LENGTH or ATTRIB statements to define the variables that you are scanning from the lines in the log file.  As the code is currently written they will all be character length 3000 since they are derived from string operations on VAR1 which is defaulting to the same because of the format used on the INPUT statement.

 

%let path=/usr/sas/tir/test/loganalysis ;

libname log "&path/";

data log.output_file ;
  infile "&path/output_file.txt" firstobs=2 truncover ;
  input fname $200. ;
run;

data log.log_analysis;
  set log.output_file ;
  filename=fname;
  infile dummy dsd truncover filevar = fname end = end_of_file lrecl=32000;
  do while (not end_of_file);
    input var1 $3000.;
    if var1 = :'201' then do;
      Date_TimeStamp= scan(var1,1," ");
      Status = scan(var1,2," ");
      Processid = scan(var1,3," ");
      userid = scan(var1,4," ");
      Details = scan(var1,-1,'-');
      output;
    end;
  end;
  drop var1;
run;

 

Ask a Question
Discussion stats
  • 26 replies
  • 419 views
  • 7 likes
  • 7 in conversation