09-24-2013 11:58 PM
Ladies and Gentlemen:
Just wondering if there a way to combine a lot of individual .TSV files (tab delimited text file) into a SAS data with one additional variable called 'SOURCE_FILE_NAME?' For example, HAVE1.TSV has the following variables:
I hope to combine thousands of files like HAVE1,TSV (they all have the same variable name) into WANT and add additional variable SOURCE_FILE_NAME. In this case, it would be HAVE1.
The complication is that each file may contain more than one observation. Is there a way to do so? Thank you!
09-25-2013 12:25 AM
There are a few ways to do this but the following works nicely.
FILENAME FILES PIPE "DIR E:\DRUGS*.TSV /B /O:N";
INFILE FILES TRUNCOVER END=LAST;
INPUT FNAME $200.;
IF LAST THEN CALL SYMPUT("TOTAL",TRIM(LEFT(PUT(I,8.))));
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%DO I=1 %TO &TOTAL.;
DELIMITER = "09"X
DRUG_NAME $ 6
COMPANY_NAME $ 5 ;
COMPANY_NAME $CHAR5. ;
COMPANY_NAME $CHAR5. ;
DRUG_NAME : $CHAR6.
DRUG_ID : ?? BEST1.
CONPANY_ID : ?? BEST1.
COMPANY_NAME : $CHAR5. ;
PROC APPEND BASE = DRUGS DATA = &SYSLAST.;
You could also use Call Execute in a similar way.
09-25-2013 09:41 AM
Keyword: filename, pipe, filevar=;
The following code has been tested on my own PC, please modify the environment accordingly.
filename fname pipe "dir &dir._*.txt /b";
infile fname truncover;
input _fname $20.;
infile dummy filevar=file2read end=last;
do while (not last);
input v1 v2;
proc print data=want;
Obs _fname v1 v2
1 _1.txt 1 2
2 _1.txt 3 4
3 _2.txt 5 6
4 _2.txt 7 8
10-04-2013 11:15 PM
I ran the code. It is able to get the file name and the number of observation correct, but all other variables in the individual TSV file is missing. Could you please tell me how would I best modify the code so that it would accommodate .TSV file. The thing is the tab separated file contains "" for each value in the file. Excel can real the file no problem, but maybe not SAS... Thanks -
09-25-2013 11:08 AM
For some reason this seems to keep the name one past, but maybe someone else here knows what I'm doing wrong.
length filename txt_file_name $256;
infile "Path\*.TSV" eov=eov filename=filename truncover;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -2, ".\");
*Place input code here;