05-25-2016 03:51 AM
I am trying to import all the files in a folder and merge them all into one dataset in SAS.
I have been trying to emulate what I have been finding on guides online but because the files are TSV files it is not doing exactly as Id hope.
Would anyone have an idea what to do to get this done?
05-25-2016 04:41 AM
Assuming that TSV means the files in question are text, tab seperated variable files then something like:
data want; informat ...; format...; infile "thefile.tsv" dlm='05'x; input ...; run;
Obviously I have no idea about your files so you need to fill that in, i.e. what variables there are, what formats/informats etc. Once you have done, then start looking at doing many - i.e. start simple! Also, are all the file exactly the same structurally - have the same variables etc? If not you may have a hard time putting them into one format. If they are the same and they have no headers just append them using a operating system command then read in once:
x 'copy "c:\yourfolder\*.ysv" "c:\outputlocation\all.tsv"';
05-25-2016 09:14 PM
Yes the files are of the same structure and and contain header rows.
At the moment - using what RWS said I am merging to one fule using the X copy command and the importing via Import gata steps through EG and then deleting any rows which contain a header name in a field.
(informat, format, and input fields of a data step are new to me)
There is probably a more eloquent way of doing this?
05-26-2016 04:16 AM
If they all have headers then the OS concatenation probably isn't the best way. What I would do is a two stage approach, much like before, one datastep reads the files as just strings, drops the unecessary data and writes out one complete file with just one header. You could of course do it without writing out the file if you want to, but I would always want the file there in case I need to check it or QC (note not checked!):
data _null_; infile "c:\manyfiles\*.tsv"; file "c:\onefile\final.tsv"; input; if substr(_infile_,1,5) ne "ABCDE" then put _infile_; run; data want; infile "c:\onefile\final.tsv" dlm='09'x; informat...; format...; length...; input...; run;
05-26-2016 11:35 AM
Once you have the base data step to read one file all it takes is adding an option to the infile statement and some code to skip the header rows in the following files. The Option EOV sets a variable to 1 when a new file matching the wildcard list . When the value is one the the reassignment for skip is executed but not the input. Note the EXPLICIT output to only output data when the desired Input is executed.
data Want ; infile "C:\Path\*.tsv" delimiter='09'x DSD lrecl=32767 firstobs=2 eov=skip; informat ACCESSION_NUMBER $16. ; informat Client $STD_Client. ; informat SITE STD_Site. ; informat CLINICIAN $STD_Clnician. ; /* other variable informats*/ input @; if skip then skip=0; else do; input ACCESSION_NUMBER $ Client $ SITE CLINICIAN $ /* other variables*/ ;
output; end; run;