Thanks a lot for this @Reeza
Hello Reeza,
I'm trying to use the code above and have adapted as below but cannot figure out why is not working. All files read have zero records and I thought this was related to name of the variables, may be due to variable 'filename' is in both data steps?
I get "ERROR: Invalid physical name."
I greatly appreciate your help to fix this.
Filename filelist pipe "dir /b /s C:\Documents\*.dat";
Data filelist;
Infile filelist truncover;
Input filename $150.;
Put filename=;
Run;
Proc Print data=filelist (obs=2);
run;
C:\Documents\2020\01\20\20200120-064931-DataLog_User.dat |
C:\Documents\2020\01\22\20200122-031029-DataLog_User.dat |
data import_all;
SET filelist;
length filename txt_file_name $256;
retain txt_file_name;
infile my_file eov=eov filevar=myfile filename=filename truncover;
input@;
file my_file 'myfile.txt';
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
delete;
end;
else input X Y Z;
run;
If those are DATALOG binary files it might take some detective work to figure out how to read them.
https://comp.lang.labview.narkive.com/lMvq284E/what-is-the-format-of-datalog-files#post8
In your dataset filelist, you create a variable called filename, but in your INFILE statement, you use a variable myfile (which does not exist and should create a NOTE in the log that it is uninitialized) in your FILEVAR= option.
Instead, you use filename in the FILENAME= option, but this option is used to have a variable where the current filename will be stored by the INFILE statement (useful when you read multiple file with a wildcard).
Try this instead:
data import_all;
set filelist;
length txt_file_name $256;
retain txt_file_name;
infile my_file eov=eov filevar=filename truncover;
input@;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
delete;
end;
else input X Y Z;
run;
Thanks @Kurt_Bremser
The log below, still having that error as in the log below
29 data import_all;
30 set filelist;
31 length txt_file_name $256;
32 retain txt_file_name;
33 infile my_file eov=eov filevar=filename truncover;
34 input@;
35 if _n_ eq 1 or eov then do;
36 txt_file_name = scan(filename, -1, "\");
37 eov=0;
38 delete;
39 end;
40 else input X Y Z;
42 run;
NOTE: The variable filename exists on an input data set, but was also specified in an I/O
statement option. The variable will not be included on any output data set.
ERROR: Invalid physical name.
filename= txt_file_name= eov=0 X=. Y=. Z=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.FILELIST.
WARNING: The data set WORK.IMPORT_ALL may be incomplete. When this step was stopped there were 0
observations and 39 variables.
WARNING: Data set WORK.IMPORT_ALL was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
So the value of FILENAME is empty, no wonder SAS could not find the file. Look carefully at the dataset you are using to drive this step to make sure it actually has names of files.
You could perhaps trap some of the errors by using the FILEEXISTS() function.
The rest of that step is also not really going to work either. The way that step is coded it will only try to read one line from each file in your dataset.
Run the step that creates filelist, and the step to read the files in direct succession and post the complete log using this button:
Thanks @Kurt_Bremser @Tom @Reeza
I have large datasets so I have to cut and paste bits and pieces instead of copying and pasting everything.
The code below is from Kurt I believe and the dataset is empty, only has the rows belonging to the filename (nro rows = nro files). Hoever, all variables except filename and are missing.
12 * GET THE FILENAMES FOR ALL FILES IN THE FOLDER NAMED BELOW WITH AN EXTENSION .DAT; 13 Filename filelist pipe "dir /b /s C:\2020\*.dat"; 14 Data filelist; 15 *USE THE LINE BELOW IF WANT ONLY THE LIST OF FILES IN THE LOG 16 Data _null_; 17 Infile filelist truncover; 18 Input filename $256.; 19 Put filename=; 20 Run; NOTE: The infile FILELIST is: Unnamed Pipe Access Device, PROCESS=dir /b /s C:\2020\*.dat, RECFM=V,LRECL=32767 filename=C:\2020\01\20\20200120-064931-DataLog_User.dat filename=C:\2020\01\20\20200122-031029-DataLog_User.dat .... .... filename=C:\2020\08\22\20201124-140120-DataLog_User.dat filename=C:\2020\08\22\20201124-150129-DataLog_User.dat NOTE: 4198 records were read from the infile FILELIST. The minimum record length was 97. The maximum record length was 97. NOTE: The data set WORK.FILELIST has 4198 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 9.85 seconds cpu time 2.65 seconds 23 data import_all2; 24 set filelist; 25 length txt_file_name $256; 26 retain txt_file_name; 27 infile my_file eov=eov filevar=filename truncover; 28 input@; 29 if _n_ eq 1 or eov then do; 30 txt_file_name = scan(filename, -1, "\"); 31 eov=0; 32 delete; 33 end; 34 else input X Y Z; 35 run; NOTE: The variable filename exists on an input data set, but was also specified in an I/O statement option. The variable will not be included on any output data set. NOTE: The infile MY_FILE is: Filename=C:\2020\01\20\20200120-064931-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=1560240, Last Modified=20 January 2020 18:15:08, Create Time=26 July 2020 22:32:22 NOTE: The infile MY_FILE is: Filename=C:\2020\01\22\JFAADS2172-20200122-031029-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=175230, Last Modified=22 January 2020 14:13:20, Create Time=26 July 2020 22:32:22 NOTE: Invalid data for X in line 1 1-4. NOTE: Invalid data for Y in line 1 27-30. NOTE: Invalid data for Z in line 1 53-72. NOTE: Invalid data errors for file MY_FILE occurred outside the printed range. NOTE: Increase available buffer lines with the INFILE n= option. filename=C:\2020\01\22\20200122-071450-DataLog_U ser.dat txt_file_name=JFAADS2172-20200120-064931-DataLog_User.dat eov=0 X=. Y=. Z=. _ERROR_=1 _N_=7 NOTE: The infile MY_FILE is: ... ... WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed. NOTE: Invalid data errors for file MY_FILE occurred outside the printed range. NOTE: Increase available buffer lines with the INFILE n= option. filename=C:\2020\01\22\20200122-211701-DataLog_User.dat txt_file_name=20200120-064931-DataLog_User.dat eov=0 X=. Y=. Z=. _ERROR_=1 _N_=21 NOTE: The infile MY_FILE is: ... ... NOTE: The infile MY_FILE is: Filename=C:2020\11\24\20201124-150129-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=2852470, Last Modified=24 November 2020 15:36:56, Create Time=24 November 2020 15:36:56 NOTE: 1 record was read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 1 record was read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 1 record was read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 1 record was read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 1 record was read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. ... ... NOTE: There were 4198 observations read from the data set WORK.FILELIST. NOTE: The data set WORK.IMPORT_ALL2 has 4197 observations and 39 variables. NOTE: DATA statement used (Total process time): real time 6:36.55 cpu time 50.64 seconds
The code below is from Tom I believe and it results in the correct number of rows being read (nro rows = nro files * nro rows/file). However, all rows have missing data except for the variable filename.
3 Filename filelist pipe "dir /b /s C:\2020\05\*.dat"; 4 Data filelist; 5 *USE THE LINE BELOW IF WANT ONLY THE LIST OF FILES IN THE LOG 6 Data _null_; 7 Infile filelist truncover; 8 Input filename $256.; 9 Put filename=; 10 Run; NOTE: The infile FILELIST is: Unnamed Pipe Access Device, PROCESS=dir /b /s C:\2020\05\*.dat, RECFM=V,LRECL=32767 filename=C:\2020\05\01\20200501-105705-DataLog_U ser.dat filename=C:\2020\05\01\20200501-115714-DataLog_U ser.dat filename=C:\2020\05\01\20200501-125724-DataLog_U ser.dat ... ... filename=C:\2020\05\31\20200601-095358-DataLog_U ser.dat NOTE: 761 records were read from the infile FILELIST. The minimum record length was 97. The maximum record length was 97. NOTE: The data set WORK.FILELIST has 761 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 1.84 seconds cpu time 0.62 seconds 17 data import_all; 18 set filelist; 19 length txt_file_name $256; 20 filevar=filename ; 21 txt_file_name = scan(filename, -1, "\"); 22 infile my_file dsd firstobs=2 filevar=filevar truncover end=eof; 23 do while(not eof); 24 input X Y Z; 25 output; 26 end; 27 run; NOTE: The infile MY_FILE is: Filename=C:\2020\05\30\JFAADS2172-20200530-104633-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=4882680, Last Modified=30 May 2020 11:46:44, Create Time=26 July 2020 22:32:17 NOTE: Invalid data for X in line 2 1-978. NOTE: Invalid data for X in line 3 1-978. NOTE: Invalid data for X in line 4 1-978. NOTE: Invalid data for X in line 5 1-978. NOTE: Invalid data for X in line 6 1-978. ... ... NOTE: The infile MY_FILE is: Filename=C:\2020\05\31\JFAADS2172-20200601-075337-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=4868820, Last Modified=01 June 2020 08:53:48, Create Time=26 July 2020 22:32:31 NOTE: The infile MY_FILE is: Filename=C:\2020\05\31\JFAADS2172-20200601-085348-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=4896540, Last Modified=01 June 2020 09:53:58, Create Time=26 July 2020 22:32:31 NOTE: The infile MY_FILE is: Filename=C:\2020\05\31\JFAADS2172-20200601-095358-Dat aLog_User.dat, RECFM=V,LRECL=32767,File Size (bytes)=4890600, Last Modified=01 June 2020 10:54:08, Create Time=26 July 2020 22:32:31 NOTE: 4939 records were read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. ... ... NOTE: 4917 records were read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 4945 records were read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 4939 records were read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: There were 761 observations read from the data set WORK.FILELIST. NOTE: The data set WORK.IMPORT_ALL has 3661162 observations and 40 variables. NOTE: DATA statement used (Total process time): real time 14:08.26 cpu time 36.32 seconds
As I said on the other threads that first method is the wrong way to process a series of files when you are starting with a list of filenames. You need to have the DO WHILE (NOT EOF) loop around the INPUT statement. Otherwise SAS will only read one line from the file.
What goes inside the DO loop is the code that reads the lines from those .DAT files.
What types of data is in the .DAT files? Are they delimited text files? If so what is the delimiter character?
They seem to have fixed length records and a varying number of records.
NOTE: 4917 records were read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988. NOTE: 4945 records were read from the infile MY_FILE. The minimum record length was 988. The maximum record length was 988.
So figure out how to read ONE of the files and then put that code into the your program that reads all of the files.
You might start by looking at what the lines in one of those files looks like. So code like this will dump the first 5 lines from the file to the log so you can look at it.
data _null_ ;
infile 'C:\2020\05\31\JFAADS2172-20200601-075337-DataLog_User.dat' obs=5;
input;
list;
run;
If those are DATALOG binary files it might take some detective work to figure out how to read them.
https://comp.lang.labview.narkive.com/lMvq284E/what-is-the-format-of-datalog-files#post8
Your INPUT seems wrong. You have 8 columns in your data but you have only three that you're trying to read in.
else input X Y Z;
Do you have code that reads in a single file correctly? If so, please post it.
So you have a list of files and you want to read each file.
You need to use the value of FILENAME in your list as the value to pass to the INFILE statement for the FILEVAR= option. You were passing it an undefined variable named MYFILE.
Since you are reading each file independently you can just use FIRSTSOBS=2 to skip the header line. But you need to use DO loop to read all of the lines from the file. Which means you need to have an explicit OUTPUT statement to write each observation read in the loop. Use the END= infile option to set a variable that will let you know when you have reached the end of this particular file.
data import_all;
set filelist;
length txt_file_name $256;
filevar=filename ;
txt_file_name = scan(filename, -1, "\");
infile my_file dsd firstobs=2 filevar=filevar truncover end=eof;
do while(not eof);
input X Y Z;
output;
end;
run;
Thanks @Tom
I think your code is working but need formatting the data in the input statement. The attached is a snip of the data saved as .txt although the original is .dat.
Perhaps this makes easier for you to understand what's going on.
Thanks a lot
Not sure why that file is such a problem to read. It looks really simple.
39 %let path=c:\downloads; 40 %let fname=20200504-130842-DataLog_User.txt; 41 42 data test; 43 infile "&path/&fname" truncover; 44 input date :yymmdd. time :time. var1-var36 ; 45 format date yymmdd10. time time12.3; 46 if _n_=1 then put (_all_) (=/); 47 run; NOTE: The infile "c:\downloads/20200504-130842-DataLog_User.txt" is: Filename=c:\downloads\20200504-130842-DataLog_User.txt, RECFM=V,LRECL=32767,File Size (bytes)=7918, Last Modified=09Mar2022:15:21:02, Create Time=09Mar2022:15:21:02 date=2020-05-04 time=13:08:34.980 var1=124.58929375 var2=2990.14305 var3=125.58929375 var4=1588561715 var5=0 var6=963 var7=140.00274422 var8=45.001243665 var9=43.625 var10=44.961509705 var11=45.003929138 var12=47 var13=4 var14=52146.125832 var15=0 var16=0.3617498916 var17=0.3589238228 var18=0.3586009095 var19=0.3491758251 var20=0.3717189497 var21=0.3688441229 var22=0.3685270161 var23=0.3587589883 var24=398.48587115 var25=409.23071232 var26=1.9158060364 var27=1.9536861719 var28=1.5265752787 var29=7.2809554684 var30=0 var31=0.7449547299 var32=53.858484392 var33=1.8026994928 var34=89.466766157 var35=415.53408769 var36=7.4182678721 NOTE: 8 records were read from the infile "c:\downloads/20200504-130842-DataLog_User.txt". The minimum record length was 988. The maximum record length was 988. NOTE: The data set WORK.TEST has 8 observations and 38 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.