Solved: Retrieve and clean all files in a directory one by one

daradanye · Posted 08-04-2022 02:44 PM

Hi,

I am running a SAS program in the cloud (which I believe is a Unix/Linux system). Basically, I want to import and clean every csv file in a directory. All the file names start with log. I found that this support note works on my local computer (http://support.sas.com/kb/41/880.html). However, when I run it in the cloud, it does not work. My code in the cloud is as follows:

filename DIRLIST pipe 'dir "/scratch/dg/log/SAS/log*.csv" /b ';

data dirlist ;
   infile dirlist lrecl=200 truncover;
   input file_name $100.;
run;

data edgar.dirlist;set dirlist;run;

data _null_;
   set dirlist end=end;
   count+1;
   call symputx('read'||put(count,4.-l),cats('/scratch/dg/log/SAS/',file_name));
   call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
   if end then call symputx('max',count);
run;

options mprint symbolgen;
%macro readin;
   %do i=1 %to &max;

      data seclog;
         infile "&&read&i" delimiter = ',' MISSOVER
            DSD lrecl=32767 firstobs=2 ;
         informat ip $15. ;
         informat date yymmdd10. ;
         informat time anydtdtm40. ;

         format ip $15. ;
         format date yymmdd10. ;
         format time datetime. ;

         input ip $ date time ;
      run;

   %end;
%mend readin;

%readin;
run;

The error message is as follows:

MPRINT(READIN):   infile "/scratch/dg/log/SAS/dir: cannot access '/scratch/dg/log/SAS/log*.csv': No such file 
or directory" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

When I opened dirlist, I found that there are two variables. The variable name is file_name, but its value is:

dir: cannot access '/scratch/dg/log/SAS/log*.csv': No such file or directory

I would appreciate it very much if someone could help.

Reeza · Posted 08-04-2022 03:20 PM · Accepted solution

DIR is a Windows command, not Unix.

Do you have pipe access?

filename filelist pipe "ls /scratch/dg/log/SAS/log*.csv";

data dirList;
   infile filelist truncover;
   input filename $100.;
run;
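A minimal sketch of the next step, assuming the pipe works and the listing is stored in dirList as above: because ls with a wildcard prints each match as a full path, the CALL SYMPUTX step can store the value as-is instead of prepending /scratch/dg/log/SAS/ a second time. The macro variable names read1, read2 and so on, dset1, dset2 and so on, and max follow the question's code.

```sas
data _null_;
   set dirList end=last;
   count + 1;
   /* ls already returns the full path, so no directory prefix is added */
   call symputx(cats('read', count), filename);
   /* data set name: last path component without the .csv extension */
   call symputx(cats('dset', count), scan(scan(filename, -1, '/'), 1, '.'));
   /* total number of files, used as the upper bound of the %do loop */
   if last then call symputx('max', count);
run;
```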

However, if all the files have the same layout and you want a single data set at the end, this is much easier:

data sec_log;

   /* make sure variables to store file name are long enough */
   length filename txt_file_name $256;

   informat ip $15. ;
   informat date yymmdd10. ;
   informat time anydtdtm40. ;

   format ip $15. ;
   format date yymmdd10. ;
   format time datetime. ;

   /* keep file name from record to record */
   retain txt_file_name;

   /* use wildcard in input; DSD and DLM= read the comma-separated fields */
   infile '/scratch/dg/log/SAS/log*.csv' dsd dlm=',' eov=eov filename=filename truncover;

   /* input first record and hold line */
   input @;

   /* check if this is the first record or the first record in a new file; */
   /* if it is, store the new file name and skip that header row */
   if _n_ eq 1 or eov then do;
      txt_file_name = scan(filename, -1, "/");
      eov = 0;
      delete;
   end;
   /* otherwise, read the data fields */
   else input ip $ date time;

run;
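To confirm that every file was picked up, one option (a sketch, assuming the sec_log data set created above) is to count the records read from each file using the retained txt_file_name variable:

```sas
/* one row per input file, with the number of data records read from it */
proc freq data=sec_log;
   tables txt_file_name / nocum nopercent;
run;
```

A file that is missing from this table was not matched by the wildcard.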


ballardw · Posted 08-04-2022 02:53 PM

If /scratch/ is not the root of the path, you need to provide the full path.

The path has to be written as the computer running the DIR command sees it.

Instead of this DATA _NULL_ step with a bunch of CALL SYMPUTX calls:

data _null_;
   set dirlist end=end;
   count+1;
   call symputx('read'||put(count,4.-l),cats('/scratch/dg/log/SAS/',file_name));
   call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
   if end then call symputx('max',count);
run;

I would suggest a regular DATA step that assigns the values of put(count,4.-l), etc., to actual variables. Then look at the values of those variables. You might find some odd values, depending on which version of DIR is involved.
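A minimal sketch of that check, assuming the dirlist data set from the question; the data set name check and the variables read_name and dset_name are hypothetical names chosen for illustration:

```sas
/* same logic as the DATA _NULL_ step, but kept as variables you can inspect */
data check;
   set dirlist;
   length read_name $300 dset_name $100;
   count + 1;
   read_name = cats('/scratch/dg/log/SAS/', file_name);
   dset_name = scan(file_name, 1, '.');
run;

proc print data=check;
   var count file_name read_name dset_name;
run;
```

With the failing dir command, file_name holds the error text, and read_name shows exactly the path that ends up in the INFILE statement.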

daradanye · Posted 08-04-2022 03:02 PM

/scratch/ is the root of the path. Several earlier steps in my program import data from a folder in /scratch/, and they work.

Kurt_Bremser · Posted 08-04-2022 03:19 PM

Run this for a test:

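data files;
length dref $8 name $200;
/* dref is blank, so FILENAME generates a fileref for the directory */
rc = filename(dref,"/scratch/dg/log/SAS");
/* open the directory; did is 0 if it cannot be opened */
did = dopen(dref);
if did
then do;
  /* DNUM returns the number of members, DREAD the name of each one */
  do i = 1 to dnum(did);
    name = dread(did,i);
    output;
  end;
  rc = dclose(did);
end;
else putlog "Directory can't be opened";
/* clear the generated fileref */
rc = filename(dref);
keep name;
run;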

See which names, if any, you get, or whether the error message appears in the log.
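If the test lists the files, its output could replace the pipe entirely, which helps when pipe access is not available. A sketch, assuming the files data set created above: keep only the log*.csv names and create the same read, dset and max macro variables that the question's %readin macro uses.

```sas
data _null_;
   set files end=last;
   /* keep only log*.csv; LIKE is case-sensitive, as are Unix file names */
   where name like 'log%.csv';
   count + 1;
   /* DREAD returns bare names, so the directory is prepended here */
   call symputx(cats('read', count), cats('/scratch/dg/log/SAS/', name));
   call symputx(cats('dset', count), scan(name, 1, '.'));
   /* if no file matches, max is never created */
   if last then call symputx('max', count);
run;
```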

Tom · Posted 08-04-2022 03:48 PM

Sounds like the directory you are trying to search is not on the machine where your SAS code is running.

Also, DIR is not really a Unix command. Many Unix versions have implemented something like it, but it does not support the Windows-style /b option.

Example:

>dir test/*log /b
dir: cannot access /b: No such file or directory
test/aaabatch_test1.log                test/endsas.log              test/m6.log
test/sasver.log        test/where_in.log

Here is what you get if the directory does not exist (or you cannot read from it).

>dir /scratch/nosuchdir/*.log /b
dir: cannot access /scratch/nosuchdir/*.log: No such file or directory
dir: cannot access /b: No such file or directory

So take two immediate steps.

1) Remove the /b.

2) Figure out whether the directory you are trying to read the files from is actually available on the machine where SAS is running, and if it is, what its actual name is.
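Both steps can be sketched in a few lines, assuming the same directory as in the question. FILEEXIST accepts a directory path as well as a file path, and SYSHOSTNAME is an automatic macro variable in SAS 9.4; treat this as a starting point rather than a finished check.

```sas
/* which host and operating system is this SAS session running on? */
%put NOTE: Host=&syshostname OS=&sysscp;

/* 1 = the directory is visible to this SAS session, 0 = it is not */
%put NOTE: Directory exists: %sysfunc(fileexist(/scratch/dg/log/SAS));

/* the same listing without the Windows-only /b option */
filename dirlist pipe 'ls /scratch/dg/log/SAS/log*.csv';

data dirlist;
   infile dirlist lrecl=200 truncover;
   /* ls prints full paths here, so file_name already includes the directory */
   input file_name $200.;
run;
```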
