Dear SAS community,
I need to simultaneously read in multiple text files, each of which has a different number of columns, and create a single data set in SAS. Please see the attached example data. The data is space delimited.
All of the files in the folder share the same filename prefix and the same .TXT extension. The following code allows me to import all of the text files when the total number of columns is known; in my example code it is 16 columns. Unfortunately, my data no longer has a known number of columns.
I also need to create variables that contain information taken from the file name. I have been successful with this part. Please have a look at the code here.
I am running SAS version 9.4.
data MyData;
   length filename fname $254;
   infile 'C:\RES999_BNR*.TXT' dlm=' ' filename=fname firstobs=2;
   /* derive ID and Night from the current file's name */
   filename = scan(fname, -2, './\');
   ID    = scan(filename, 1, '_', 'm');
   Night = scan(filename, 2, '_');
   length var1-var16 $16;
   input var1 $ var2-var16;
   spectra = dequote(var1);
   drop var1;
run;
The posted code will not work properly if the files have headers: FIRSTOBS=2 skips the header line of the first file only, not of the later files.
It also will not work properly if they don't have headers, since skipping the first line then drops one line of data.
You need to explain how the number of columns varies so we can see whether there is a solution.
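For reference, here is one way to skip a header line in every file of a wildcard INFILE. This is only a sketch built on the wildcard and 16 columns from your original code, not something tested against your files: the EOV= option names a variable that SAS sets to 1 when the first record of each file after the first is read, and that variable has to be reset by your own code.
data MyData_headers;
   length fname filename $254 var1-var16 $16;
   infile 'C:\RES999_BNR*.TXT' dlm=' ' truncover filename=fname eov=newfile;
   input @;                         /* look at the record before reading it */
   if _n_ = 1 or newfile then do;   /* first record of each file = header   */
      newfile = 0;                  /* EOV= is not reset automatically      */
      delete;
   end;
   filename = scan(fname, -2, './\');
   input var1 $ var2-var16;
run;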
Hi again,
The data are from EEG recordings of varying lengths. The length of the recording determines how many columns are present in a given data set. There are always 321 rows of data. I don't need the header row with the X prefixes, but I need all of the remaining rows and columns.
Best,
Ian
You should have included that detail in the original post. So instead of a mixed set of totally different variables, you have a varying number of repeats of the same variable. Personally, I would read the data into a tall format.
Does the data contain any missing values in the middle of a line of values? If not, you can just loop until you read a missing value.
So let's assume the files all have a header row (that you want to ignore) and that they also have a first column with some information that identifies the row. First read that row-id value, and then loop, reading one value at a time from the rest of the row.
So something like this:
data want;
   length fname filename $200;
   infile 'myfiles*.txt' dlm=' ' truncover filename=fname;
   input rowid $ @;
   /* the first record of each file is its header row: drop it */
   if fname ne lag(fname) then delete;
   filename = fname;
   /* read one value at a time until a missing value marks the end of the line */
   do col = 1 by 1 until (value = .);
      input value @;
      if col = 1 or value ne . then output;
   end;
run;
If there can be missing values in the middle of the row then there are other ways to stop the DO loop. You could count the number of values in the row by looking at the automatic variable _INFILE_.
do col=1 to countw(_infile_,' ')-1;
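Dropped into a step like the one above (still a sketch, here assuming your wildcard path, space-delimited values, and a $32 row-id column; the -1 accounts for the row id), that stopping rule would look something like this:
data want_countw;
   length fname filename $200;
   infile 'C:\RES999_BNR*.TXT' dlm=' ' truncover filename=fname;
   input rowid :$32. @;
   if fname ne lag(fname) then delete;   /* first record of each file = header */
   filename = fname;
   /* COUNTW fixes the number of iterations up front, so a missing value in */
   /* the middle of the row does not end the loop early                     */
   do col = 1 to countw(_infile_, ' ') - 1;
      input value @;
      output;
   end;
run;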
If the line length for any row is ever greater than 32K bytes then there are ways using the LENGTH= and COL= options on the INFILE statement to detect when you have reached the end of a line.
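A rough sketch of that LENGTH=/COL= idea, again with your wildcard path assumed and LRECL raised above the default so long lines are not truncated: LINELEN holds the length of the current line and CURCOL the current pointer position, so the loop can stop at the end of the line without looking at _INFILE_ at all.
data want_bycol;
   length fname filename $200;
   infile 'C:\RES999_BNR*.TXT' dlm=' ' truncover filename=fname
          lrecl=1000000 length=linelen col=curcol;
   input rowid :$32. @;
   if fname ne lag(fname) then delete;   /* first record of each file = header */
   filename = fname;
   /* stop when the pointer passes the end of the line; a trailing delimiter */
   /* on a line could still produce one extra missing value                  */
   do col = 1 by 1 while (curcol <= linelen);
      input value @;
      output;
   end;
run;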
Here's a code snippet that should help with reading a variable number of columns into variables. Note that the first data step is just there to create some text data to test on.
Tom
data _null_;
   file "C:\some file.txt";
   Rec = "abc 123 456 789 012 345 678 901 123 456 789 012 345 678 901 345"; put Rec;
   Rec = "def 123 456 789 012 345 678 901 123 456 789 012 345 678 901 345"; put Rec;
   Rec = "ghi 123 456 789 012 345 678 901 123 456 789 012 345 678 901 345"; put Rec;
   Rec = "jkl 123 456 789 012 345 678 901 123 456 789 012 345 678 901 345"; put Rec;
run;
data MyData(drop=_:);
   infile "C:\some file.txt";
   length _InRec $32767;
   length var01-var50 $16;
   array _RecVars{50} var01-var50;
   input;                                   /* read the whole line           */
   _InRec = _infile_;
   _VarCount = countw(_InRec, " ");         /* number of values on this line */
   do _i = 1 to _VarCount;
      _RecVars{_i} = scan(_InRec, _i, " "); /* copy each value to its column */
   end;
run;
Hi Tom,
Thank you once again for your help.
Your code works for the one file that I read in as a test. But I noticed a problem that I was previously unaware of. Some of my data uses scientific notation, for example 1.55042962215361e-05, whereas the rest of the data in the same file, the majority of it, is written in plain decimal notation.
Once imported, some cells are left empty and some of the data is scrambled.
Sincerely,
Ian
SAS will read a value like 1.55042962215361e-05 as a number.
Show the code and an example line of data that it is having trouble with.
You can likely get what you need by choosing suitable informats and formats.
data test;
   attrib myvar informat=best32. format=best32.;
   input myvar;
   datalines;
1.55042962215361e-05
1.5
2
;

proc print data=test;
run;
I've modified it to read the numbers with the INPUT function and the BEST32. informat. See if this advances your progress.
Tom
data _null_;
   file "C:\Some file.txt";
   Rec = "abc 123 456 789 012 345 678 901 1.55042962215361e-05 456 789 012 345 678 901 345"; put Rec;
   Rec = "def 123 456 789 012 345 678 901 123 456 789 012 345 1.55042962215361e5 901 345"; put Rec;
   Rec = "ghi 123 456 789 -1.55042962215361e-05 345 678 901 123 456 789 012 345 678 901 345"; put Rec;
   Rec = "jkl 123 456 789 012 345 678 901 123 456 -1.55042962215361e05 012 345 678 901 345"; put Rec;
run;
data MyData(drop=_:);
   infile "C:\Some file.txt";
   length _InRec $32767;
   length var01-var50 8;                    /* numeric variables this time */
   array _RecVars{50} var01-var50;
   input;
   _InRec = _infile_;
   _VarCount = countw(_InRec, " ");
   do _i = 1 to _VarCount;
      /* the BEST32. informat handles both plain and scientific notation */
      _RecVars{_i} = input(scan(_InRec, _i, " "), best32.);
   end;
run;
Not sure why you are making the array, but if you are concerned that it will not be large enough, just make it larger than you expect to need.
You haven't shown enough of the log to see what error you are getting.
Also, you need to use one of the Insert Code buttons in the editor when pasting in the log so that the formatting is not removed.
In general, notes about missing values will not cause a data step to stop.
I am a little concerned that your min and max record lengths are the same. That is not normal in a delimited text file, but perhaps it is just a side effect of stopping the data step early. Perhaps it only read in one line.
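If you suspect the lines are being truncated at the default logical record length, a quick check (only a sketch, run against the same hypothetical test file path) is to raise LRECL= and write each line's actual length to the log:
data _null_;
   infile "C:\Some file.txt" truncover lrecl=1000000 length=linelen;
   input;
   put _n_= linelen=;   /* report each line's length in the log */
run;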
If the problem is that there are more than 50 occurrences, a prior step to count the max number of occurrences should fix you up.
Tom
data _null_;
   file "C:\Some file.txt";
   Rec = "987 123 456 789 012 345 678 901 1.55042962215361e-05 456 789 012 345 678 901 345"; put Rec;
   Rec = "654 123 456 789 012 345 678 901 123 456 789 012 345 1.55042962215361e5 901 345"; put Rec;
   Rec = "321 123 456 789 -1.55042962215361e-05 345 678 901 123 456 789 012 345 678 901 345"; put Rec;
   Rec = "098 123 456 789 012 345 678 901 123 456 -1.55042962215361e05 012 345 678 901 345"; put Rec;
run;
data _null_;
   retain _MaxVarCount 0;
   infile "C:\Some file.txt" end=_LastRec;
   input;
   _VarCount = countw(_infile_, " ");
   if _VarCount > _MaxVarCount then _MaxVarCount = _VarCount;
   /* after the last record, store the maximum column count in a macro variable */
   if _LastRec then call symput("VarCount", strip(put(_MaxVarCount, best15.)));
run;
data MyData(drop=_:);
   infile "C:\Some file.txt";
   length _InRec $32767;
   length var01-var&VarCount. 8;
   /* size the array from the &VarCount macro variable computed above */
   array _RecVars{&VarCount.} var01-var&VarCount.;
   input;
   _InRec = _infile_;
   _VarCount = countw(_InRec, " ");
   do _i = 1 to _VarCount;
      _RecVars{_i} = input(scan(_InRec, _i, " "), best32.);
   end;
run;