BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
lags
Calcite | Level 5

Moved from https://communities.sas.com/t5/SAS-Programming/import-multiple-files-in-multiple-folders-with-the-sa...

 

 

Thanks a lot for this @Reeza 

Hello Reeza,

I'm trying to use the code above and have adapted as below but cannot figure out why is not working. All files read have zero records and I thought this was related to name of the variables, may be due to variable 'filename' is in both data steps?

I get "ERROR: Invalid physical name."

I greatly appreciate your help to fix this.

 

Filename filelist pipe "dir /b /s C:\Documents\*.dat";
Data filelist;
Infile filelist truncover;
Input filename $150.;
Put filename=;
Run;

Proc Print data=filelist (obs=2);

run;

filename
C:\Documents\2020\01\20\20200120-064931-DataLog_User.dat
C:\Documents\2020\01\22\20200122-031029-DataLog_User.dat


data import_all;
SET filelist;
length filename txt_file_name $256;
retain txt_file_name;

infile my_file eov=eov filevar=myfile filename=filename truncover;

input@;

file my_file 'myfile.txt';

if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;

delete;
end;

else input X Y Z;
run;

1 ACCEPTED SOLUTION
13 REPLIES 13
Kurt_Bremser
Super User

In your dataset filelist, you create a variable called filename, but in your INFILE statement, you use a variable myfile (which does not exist and should create a NOTE in the log that it is uninitialized) in your FILEVAR= option.

Instead, you use filename in the FILENAME= option, but this option is used to have a variable where the current filename will be stored by the INFILE statement (useful when you read multiple file with a wildcard).

 

Try this instead:

data import_all;
set filelist;
length txt_file_name $256;
retain txt_file_name;
infile my_file eov=eov filevar=filename truncover;
input@;
if _n_ eq 1 or eov then do;
  txt_file_name = scan(filename, -1, "\");
  eov=0;
  delete;
end;
else input X Y Z;
run;

 

 

lags
Calcite | Level 5

Thanks @Kurt_Bremser 

The log below, still having that error as in the log below

 

29 data import_all;
30 set filelist;
31 length txt_file_name $256;
32 retain txt_file_name;
33 infile my_file eov=eov filevar=filename truncover;
34 input@;
35 if _n_ eq 1 or eov then do;
36 txt_file_name = scan(filename, -1, "\");
37 eov=0;
38 delete;
39 end;
40 else input X Y Z;
42 run;

NOTE: The variable filename exists on an input data set, but was also specified in an I/O
statement option. The variable will not be included on any output data set.
ERROR: Invalid physical name.
filename= txt_file_name= eov=0 X=. Y=. Z=.  _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.FILELIST.
WARNING: The data set WORK.IMPORT_ALL may be incomplete. When this step was stopped there were 0
observations and 39 variables.
WARNING: Data set WORK.IMPORT_ALL was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

Tom
Super User Tom
Super User

So the value of FILENAME is empty, no wonder SAS could not find the file.  Look carefully at the dataset you are using to drive this step to make sure it actually has names of files.

 

You could perhaps trap some of the errors by using the FILEEXISTS() function.

 

The rest of that step is also not really going to work either.  The way that step is coded it will only try to read one line from each file in your dataset.

lags
Calcite | Level 5

Thanks @Kurt_Bremser  @Tom @Reeza 

I have large datasets so I have to cut and paste bits and pieces instead of copying and pasting everything.

The code below is from Kurt I believe and the dataset is empty, only has the rows belonging to the filename (nro rows = nro files). Hoever, all variables except filename and are missing.

 

12   * GET THE FILENAMES FOR ALL FILES IN THE FOLDER NAMED BELOW WITH AN EXTENSION .DAT;
13   Filename filelist pipe "dir /b /s C:\2020\*.dat";
14   Data filelist;
15   *USE THE LINE BELOW IF WANT ONLY THE LIST OF FILES IN THE LOG
16   Data _null_;
17   Infile filelist truncover;
18   Input filename $256.;
19   Put filename=;
20   Run;

NOTE: The infile FILELIST is:
      Unnamed Pipe Access Device,
      PROCESS=dir /b /s C:\2020\*.dat,
      RECFM=V,LRECL=32767

filename=C:\2020\01\20\20200120-064931-DataLog_User.dat
filename=C:\2020\01\20\20200122-031029-DataLog_User.dat
....
....
filename=C:\2020\08\22\20201124-140120-DataLog_User.dat
filename=C:\2020\08\22\20201124-150129-DataLog_User.dat
NOTE: 4198 records were read from the infile FILELIST.
      The minimum record length was 97.
      The maximum record length was 97.
NOTE: The data set WORK.FILELIST has 4198 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           9.85 seconds
      cpu time            2.65 seconds


23   data import_all2;
24   set filelist;
25   length txt_file_name $256;
26   retain txt_file_name;
27   infile my_file eov=eov filevar=filename truncover;
28   input@;
29   if _n_ eq 1 or eov then do;
30     txt_file_name = scan(filename, -1, "\");
31     eov=0;
32     delete;
33   end;
34   else input X Y Z;
35   run;

NOTE: The variable filename exists on an input data set, but was also specified in an I/O statement option.  The variable will not be included on any output data set.
NOTE: The infile MY_FILE is:
      Filename=C:\2020\01\20\20200120-064931-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=1560240,
      Last Modified=20 January 2020 18:15:08,
      Create Time=26 July 2020 22:32:22

NOTE: The infile MY_FILE is:

      Filename=C:\2020\01\22\JFAADS2172-20200122-031029-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=175230,
      Last Modified=22 January 2020 14:13:20,
      Create Time=26 July 2020 22:32:22

NOTE: Invalid data for X in line 1 1-4.
NOTE: Invalid data for Y in line 1 27-30.
NOTE: Invalid data for Z in line 1 53-72.
NOTE: Invalid data errors for file MY_FILE occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
filename=C:\2020\01\22\20200122-071450-DataLog_U
ser.dat txt_file_name=JFAADS2172-20200120-064931-DataLog_User.dat eov=0 X=. Y=. Z=.  
_ERROR_=1 _N_=7
NOTE: The infile MY_FILE is:
...
...
WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
NOTE: Invalid data errors for file MY_FILE occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
filename=C:\2020\01\22\20200122-211701-DataLog_User.dat txt_file_name=20200120-064931-DataLog_User.dat eov=0 X=. Y=.
Z=. 
_ERROR_=1 _N_=21
NOTE: The infile MY_FILE is:
...
...
NOTE: The infile MY_FILE is:

      Filename=C:2020\11\24\20201124-150129-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=2852470,
      Last Modified=24 November 2020 15:36:56,
      Create Time=24 November 2020 15:36:56

NOTE: 1 record was read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 1 record was read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 1 record was read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 1 record was read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 1 record was read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
...
...
NOTE: There were 4198 observations read from the data set WORK.FILELIST.
NOTE: The data set WORK.IMPORT_ALL2 has 4197 observations and 39 variables.
NOTE: DATA statement used (Total process time):
      real time           6:36.55
      cpu time            50.64 seconds




The code below is from Tom I believe and it results in the correct number of rows being read (nro rows = nro files * nro rows/file). However, all rows have missing data except for the variable filename.

 

3    Filename filelist pipe "dir /b /s C:\2020\05\*.dat";
4    Data filelist;
5    *USE THE LINE BELOW IF WANT ONLY THE LIST OF FILES IN THE LOG
6    Data _null_;
7    Infile filelist truncover;
8    Input filename $256.;
9    Put filename=;
10   Run;

NOTE: The infile FILELIST is:
      Unnamed Pipe Access Device,
      PROCESS=dir /b /s C:\2020\05\*.dat,
      RECFM=V,LRECL=32767

filename=C:\2020\05\01\20200501-105705-DataLog_U
ser.dat
filename=C:\2020\05\01\20200501-115714-DataLog_U
ser.dat
filename=C:\2020\05\01\20200501-125724-DataLog_U
ser.dat
...
...
filename=C:\2020\05\31\20200601-095358-DataLog_U
ser.dat
NOTE: 761 records were read from the infile FILELIST.
      The minimum record length was 97.
      The maximum record length was 97.
NOTE: The data set WORK.FILELIST has 761 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           1.84 seconds
      cpu time            0.62 seconds



17   data import_all;
18     set filelist;
19     length txt_file_name $256;
20     filevar=filename ;
21     txt_file_name = scan(filename, -1, "\");
22     infile my_file dsd firstobs=2 filevar=filevar truncover end=eof;
23     do while(not eof);
24       input X Y Z;
25       output;
26     end;
27   run;

NOTE: The infile MY_FILE is:

      Filename=C:\2020\05\30\JFAADS2172-20200530-104633-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=4882680,
      Last Modified=30 May 2020 11:46:44,
      Create Time=26 July 2020 22:32:17

NOTE: Invalid data for X in line 2 1-978.
NOTE: Invalid data for X in line 3 1-978.
NOTE: Invalid data for X in line 4 1-978.
NOTE: Invalid data for X in line 5 1-978.
NOTE: Invalid data for X in line 6 1-978.
...
...

NOTE: The infile MY_FILE is:

      Filename=C:\2020\05\31\JFAADS2172-20200601-075337-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=4868820,
      Last Modified=01 June 2020 08:53:48,
      Create Time=26 July 2020 22:32:31

NOTE: The infile MY_FILE is:

      Filename=C:\2020\05\31\JFAADS2172-20200601-085348-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=4896540,
      Last Modified=01 June 2020 09:53:58,
      Create Time=26 July 2020 22:32:31

NOTE: The infile MY_FILE is:

      Filename=C:\2020\05\31\JFAADS2172-20200601-095358-Dat
      aLog_User.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=4890600,
      Last Modified=01 June 2020 10:54:08,
      Create Time=26 July 2020 22:32:31

NOTE: 4939 records were read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
...
...
NOTE: 4917 records were read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 4945 records were read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 4939 records were read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: There were 761 observations read from the data set WORK.FILELIST.
NOTE: The data set WORK.IMPORT_ALL has 3661162 observations and 40 variables.
NOTE: DATA statement used (Total process time):
      real time           14:08.26
      cpu time            36.32 seconds
Tom
Super User Tom
Super User

As I said on the other threads that first method is the wrong way to process a series of files when you are starting with a list of filenames.  You need to have the DO WHILE (NOT EOF) loop around the INPUT statement.  Otherwise SAS will only read one line from the file.

 

What goes inside the DO loop is the code that reads the lines from those .DAT files.

What types of data is in the .DAT files? Are they delimited text files? If so what is the delimiter character?

They seem to have fixed length records and a varying number of records.

NOTE: 4917 records were read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: 4945 records were read from the infile MY_FILE.
      The minimum record length was 988.
      The maximum record length was 988.

So figure out how to read ONE of the files and then put that code into the your program that reads all of the files.

You might start by looking at what the lines in one of those files looks like.  So code like this will dump the first 5 lines from the file to the log so you can look at it.

data _null_ ;
  infile 'C:\2020\05\31\JFAADS2172-20200601-075337-DataLog_User.dat' obs=5;
  input;
  list;
run;
Peter_C
Rhodochrosite | Level 12
That is a great resource. Thank you Tom.
Having looked at the sample data to see how it could be read into a SAS data step (not using the word "import" to lower any hope for that proc)
.....
It could be done, but the resource hints that datalog structures might not be consistent between implementations.
I have used SAS to read the underlying info from sources like these datalog files. It always seemed like an experiment
Reeza
Super User

Your INPUT seems wrong. You have 8 columns in your data but you have only three that you're trying to read in. 

 

else input X Y Z;

Do you have code that reads in a single file correctly? If so, please post it. 

Tom
Super User Tom
Super User

So you have a list of files and you want to read each file.

You need to use the value of FILENAME in your list as the value to pass to the INFILE statement for the FILEVAR= option.  You were passing it an undefined variable named MYFILE.

 

Since you are reading each file independently you can just use FIRSTSOBS=2 to skip the header line. But you need to use DO loop to read all of the lines from the file.  Which means you need to have an explicit OUTPUT statement to write each observation read in the loop.  Use the END= infile option to set a variable that will let you know when you have reached the end of this particular file.

data import_all;
  set filelist;
  length txt_file_name $256;
  filevar=filename ;
  txt_file_name = scan(filename, -1, "\");
  infile my_file dsd firstobs=2 filevar=filevar truncover end=eof;
  do while(not eof);
    input X Y Z;
    output;
  end;
run;

 

lags
Calcite | Level 5

Thanks @Tom 

I think your code is working but need formatting the data in the input statement. The attached is a snip of the data saved as .txt although the original is .dat.

Perhaps this makes easier for you to understand what's going on.

Thanks a lot

Tom
Super User Tom
Super User

Not sure why that file is such a problem to read. It looks really simple.

39    %let path=c:\downloads;
40    %let fname=20200504-130842-DataLog_User.txt;
41
42    data test;
43      infile "&path/&fname" truncover;
44      input date :yymmdd. time :time. var1-var36 ;
45      format date yymmdd10. time time12.3;
46      if _n_=1 then put (_all_) (=/);
47    run;

NOTE: The infile "c:\downloads/20200504-130842-DataLog_User.txt" is:
      Filename=c:\downloads\20200504-130842-DataLog_User.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=7918,
      Last Modified=09Mar2022:15:21:02,
      Create Time=09Mar2022:15:21:02


date=2020-05-04
time=13:08:34.980
var1=124.58929375
var2=2990.14305
var3=125.58929375
var4=1588561715
var5=0
var6=963
var7=140.00274422
var8=45.001243665
var9=43.625
var10=44.961509705
var11=45.003929138
var12=47
var13=4
var14=52146.125832
var15=0
var16=0.3617498916
var17=0.3589238228
var18=0.3586009095
var19=0.3491758251
var20=0.3717189497
var21=0.3688441229
var22=0.3685270161
var23=0.3587589883
var24=398.48587115
var25=409.23071232
var26=1.9158060364
var27=1.9536861719
var28=1.5265752787
var29=7.2809554684
var30=0
var31=0.7449547299
var32=53.858484392
var33=1.8026994928
var34=89.466766157
var35=415.53408769
var36=7.4182678721
NOTE: 8 records were read from the infile "c:\downloads/20200504-130842-DataLog_User.txt".
      The minimum record length was 988.
      The maximum record length was 988.
NOTE: The data set WORK.TEST has 8 observations and 38 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1561 views
  • 3 likes
  • 5 in conversation