Desktop productivity for business analysts and programmers

How to import and merge all TSV files in a folder

Reply
Occasional Contributor
Posts: 6

How to import and merge all TSV files in a folder

Hello everyone, 

 

I am trying to import all the files in a folder and merge them all into one dataset in SAS. 

 

I have been trying to emulate what I have been finding on guides online but because the files are TSV files it is not doing exactly as Id hope. 

 

Would anyone have an idea what to do to get this done? 

 

Thanks, 

Super User
Super User
Posts: 7,686

Re: How to import and merge all TSV files in a folder

Assuming that TSV means the files in question are text, tab seperated variable files then something like:

data want;
  informat ...;
  format...;
  infile "thefile.tsv" dlm='05'x;
  input ...;
run;

Obviously I have no idea about your files so you need to fill that in, i.e. what variables there are, what formats/informats etc.  Once you have done, then start looking at doing many - i.e. start simple!  Also, are all the file exactly the same structurally - have the same variables etc?  If not you may have a hard time putting them into one format.  If they are the same and they have no headers just append them using a operating system command then read in once:

x 'copy "c:\yourfolder\*.ysv" "c:\outputlocation\all.tsv"'; 
Trusted Advisor
Posts: 1,116

Re: How to import and merge all TSV files in a folder

In ASCII the tab character is '09'x.

Super User
Posts: 11,105

Re: How to import and merge all TSV files in a folder

Are all of your TSV files the same structure?

Do they contain header rows?

Occasional Contributor
Posts: 6

Re: How to import and merge all TSV files in a folder

Hi, 

 

Yes the files are of the same structure and and contain header rows. 

 

At the moment - using what RWS said I am merging to one fule using the X copy command and the importing via Import gata steps through EG and then deleting any rows which contain a header name in a field. 

(informat, format, and input fields of a data step are new to me)

 

There is probably a more eloquent way of doing this? 

 

Thanks, 

Super User
Super User
Posts: 7,686

Re: How to import and merge all TSV files in a folder

If they all have headers then the OS concatenation probably isn't the best way.  What I would do is a two stage approach, much like before, one datastep reads the files as just strings, drops the unecessary data and writes out one complete file with just one header.  You could of course do it without writing out the file if you want to, but I would always want the file there in case I need to check it or QC (note not checked!):

data _null_;
  infile "c:\manyfiles\*.tsv";
  file "c:\onefile\final.tsv";
  input;
  if substr(_infile_,1,5) ne "ABCDE" then put _infile_;
run;

data want;
  infile "c:\onefile\final.tsv" dlm='09'x;
  informat...;
  format...; 
  length...;
  input...;
run;
  
Super User
Posts: 11,105

Re: How to import and merge all TSV files in a folder

Once you have the base data step to read one file all it takes is adding an option to the infile statement and some code to skip the header rows in the following files. The Option EOV sets a variable to 1 when a new file matching the wildcard list . When the value is one the the reassignment for skip is executed but not the input. Note the EXPLICIT output to only output data when the desired Input is executed.

 

data Want  ;
   infile "C:\Path\*.tsv" delimiter='09'x  DSD lrecl=32767 firstobs=2 eov=skip;
   informat ACCESSION_NUMBER $16. ;
   informat Client           $STD_Client. ;
   informat SITE             STD_Site. ;
   informat CLINICIAN        $STD_Clnician. ;
   /* other variable informats*/
   input @;
   if skip then skip=0;
   else do;
      input
         ACCESSION_NUMBER  $
         Client            $
         SITE
         CLINICIAN         $
         /* other variables*/
      ;
output; end; run;
Ask a Question
Discussion stats
  • 6 replies
  • 1194 views
  • 2 likes
  • 4 in conversation