Hi,
I have many tsv files to be imported in the SAS. The files are located in different subfolders named after numbers and underscore. For example, under the subfolder 6500_317, there are five tsv files: 8491_2020, 8611_2020, 5491_2020, 7453_2020, 2312_2020
I am wondering how I can import all files in all subfolders into the SAS and name after as data_tsvfilename.
I will appreciate it very much if someone can help here.
Thanks!
/*You can use OS command to get all these TSV file path.
After that,using PROC IMPORT it one by one.
Like this :
*/
%let path_in= c:\temp ;
filename x pipe "dir &path_in.\*.tsv /s /b";
data x;
infile x truncover;
input path $2000.;
run;
data _null_;
set x;
call execute(catt('proc import datafile="',path,'" out=data_',scan(path,-2,'.\'),' dbms=csv replace;run;'));
run;
Details matter a lot and you have left out a lot of them.
First you need to define a TSV file. What is that? Let's assume you meant a tab delimited text file.
Second do all of the files have the same type of content? The same fields in the same order?
Third do the fields in the file contain the information that appears to be encoded into the directory and file names? Or do you need to create new variables in the generated dataset (or datasets) to store the information contained in the filename.
IF the files all contain the same information then it is probably more efficient to just make one dataset. But if you need multiple datasets then you need to explain what names you want to use. Names in SAS cannot start with digits or contain hyphens.
The first step you need to do is get the list of files. There are literally hundreds of posts on this site with descriptions of how to get a list of existing files into a dataset.
I take TSV to be "tab separated values". I have an example for how to accomplish this here:
https://blogs.sas.com/content/sasdummy/2018/10/09/read-multiple-text-files/
You can use wildcard notation in the INFILE statement to match the files:
data accel;
infile "/home/userid/tsv/*.tsv"
dlm='09'x;
length counter 8
timestamp 8
x 8 y 8 z 8
filename $ 25;
input counter timestamp x y z filename;
run;
Edit: just noticed you want the subfolders too. For this, you need to gather all of the filenames ahead by querying the contents of the directories. The DOPEN and DREAD functions help with this. Sample code here in the doc.
/*You can use OS command to get all these TSV file path.
After that,using PROC IMPORT it one by one.
Like this :
*/
%let path_in= c:\temp ;
filename x pipe "dir &path_in.\*.tsv /s /b";
data x;
infile x truncover;
input path $2000.;
run;
data _null_;
set x;
call execute(catt('proc import datafile="',path,'" out=data_',scan(path,-2,'.\'),' dbms=csv replace;run;'));
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.