Ladies and Gentlemen:
Just wondering if there a way to combine a lot of individual .TSV files (tab delimited text file) into a SAS data with one additional variable called 'SOURCE_FILE_NAME?' For example, HAVE1.TSV has the following variables:
drug_name | drug_id | company_id | company_name |
---|---|---|---|
drug A | |||
drug B |
I hope to combine thousands of files like HAVE1,TSV (they all have the same variable name) into WANT and add additional variable SOURCE_FILE_NAME. In this case, it would be HAVE1.
SOURCE_FILE_NAME | drug_name | drug_id | company_id | company_name |
---|---|---|---|---|
HAVE1 | drug A | |||
HAVE1 | drug B |
The complication is that each file may contain more than one observation. Is there a way to do so? Thank you!
There are a few ways to do this but the following works nicely.
FILENAME FILES PIPE "DIR E:\DRUGS*.TSV /B /O:N";
DATA _NULL_;
INFILE FILES TRUNCOVER END=LAST;
INPUT FNAME $200.;
I+1;
CALL SYMPUT("FNAME"||TRIM(LEFT(PUT(I,8.))),TRIM(FNAME));
IF LAST THEN CALL SYMPUT("TOTAL",TRIM(LEFT(PUT(I,8.))));
CALL SYMPUT("FNAMEDATE"||TRIM(LEFT(PUT(I,8.))),SUBSTR(TRIM(FNAME),15,8));
RUN;
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%MACRO IMPORTS;
%DO I=1 %TO &TOTAL.;
DATA DRUGS1;
INFILE "E:\&&FNAME&I"
DELIMITER = "09"X
MISSOVER
DSD
LRECL=32767
FIRSTOBS=2
;
LENGTH
DRUG_NAME $ 6
DRUG_ID 8
CONPANY_ID 8
COMPANY_NAME $ 5 ;
FORMAT
DRUG_NAME $CHAR6.
DRUG_ID BEST1.
CONPANY_ID BEST1.
COMPANY_NAME $CHAR5. ;
INFORMAT
DRUG_NAME $CHAR6.
DRUG_ID BEST1.
CONPANY_ID BEST1.
COMPANY_NAME $CHAR5. ;
INPUT
DRUG_NAME : $CHAR6.
DRUG_ID : ?? BEST1.
CONPANY_ID : ?? BEST1.
COMPANY_NAME : $CHAR5. ;
;
SOURCE_FILE_NAME="&&FNAME&I";
RUN;
PROC APPEND BASE = DRUGS DATA = &SYSLAST.;
RUN;
%END;
%MEND;
%IMPORTS;
You could also use Call Execute in a similar way.
Keyword: filename, pipe, filevar=;
The following code has been tested on my own PC, please modify the environment accordingly.
%let dir=h:\temp\;
filename fname pipe "dir &dir._*.txt /b";
data want;
infile fname truncover;
input _fname $20.;
file2read=cats("&dir",_fname);
put file2read=;
infile dummy filevar=file2read end=last;
do while (not last);
input v1 v2;
output;
end;
run;
proc print data=want;
run;
Obs _fname v1 v2
1 _1.txt 1 2
2 _1.txt 3 4
3 _2.txt 5 6
4 _2.txt 7 8
I ran the code. It is able to get the file name and the number of observation correct, but all other variables in the individual TSV file is missing. Could you please tell me how would I best modify the code so that it would accommodate .TSV file. The thing is the tab separated file contains "" for each value in the file. Excel can real the file no problem, but maybe not SAS... Thanks -
For some reason this seems to keep the name one past, but maybe someone else here knows what I'm doing wrong.
data try01;
length filename txt_file_name $256;
retain txt_file_name;
infile "Path\*.TSV" eov=eov filename=filename truncover;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -2, ".\");
eov=0;
end;
input
*Place input code here;
run;
I think you need to do the input first or add INPUT@; before IF _N_ ....
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.