12-23-2016 05:53 AM
I have a folder containing a number of text file (.txt)
What I want to do is to create a SAS table, containing two columns:
one column contains the filename
The scond column contains the content of the text file. So I want all the contents of each file imported into one cell in column 2 with the correspoing file name in column 1
I know it might sound "messy" to put all the contents of a file in 1 cell but that is the exactly what I need
Any help is appreciated
Merry Christmas to you all
12-23-2016 06:27 AM
Well, your request doesn't make much sense, but it is possible:
/* Note this is for Windows*/ filename tmp pipe 'dir "<path to your files>\*.txt" /b';
/* Empty overall dataset */
create table FINAL (fname char(2000),data_from_file char(2000);
/* Create code to read in each file, set filename and append to empty from above */ data _null_; length f $200; infile tmp dlm='¬'; call execute('data tmp; infile "<path to your files>\'||strip(_infile_)||'"; length fname data_from_file $2000; fname="'||strip(_infile_)||'"; input data_from_file $; run;
proc append base=final data=tmp;
Note, I have not tested - but it should be pretty close.
12-23-2016 06:30 AM
I know it dosent make sense... here is afurther explanation
I have a collection of files for different items, I need to scan each file for keywords, for some keywords, if the keyword is there I will need to give a certain variable a value of 1, for other values, if the keyword is there I need to extract the numbers after that keyword
That is why i am importing the text files to process them further in SAS, but maybe there is a more efficient way of doing this?
12-23-2016 06:49 AM - edited 12-23-2016 06:50 AM
Since the max length for a character variable is 32767, often a file will not fit into the one varibale.
It is easy to read in all files from a directory using wirdcards, you will then get one observation for each record. You can still check for the conditions and set some flags. the FILENAME= option will give you the name of the file.
Here is an example
data fileContents; length extFilename _extFilename $ 1024; length extLine $ 32767; infile "c:\temp\*.txt" filename=_extFilename truncover ; extFilename = _extFilename; input @1 extLine ; lineLength = length(extLine); keep extFilename extLine lineLength; run;
12-23-2016 07:00 AM
The files are less than 32000 characters
When I use your code, the text is still broken into several rows.... any solution for that, thank you
12-23-2016 06:40 AM
The other problem witht the code above is that I will break up the text in the file into multiple rows.... which will limit the possibility of finding a text (much easier in a single cell)
12-23-2016 06:48 AM
So where are these files coming from/used for? There are many text analytics software out there, SAS has one for instance. It would be better than trying to re-invent it.
You will need to show some examples of what you mean, i.e. a text file with some data which illustrates your question and what you want out.
12-23-2016 07:07 AM
Here is an example of the content of a text file:
Item not functional, production date 1/7/1993. Meta-data downloaded from the unit showing elevated pressure. Unit sent to location 40 for repair. Unit returned 1/6/2016
So there many files containing information about different units, the unit number is the file name
If the text contains the words "unit returned" then I want to put the value "yes" to column "Returned"
Also, extracting production date and return date
12-23-2016 07:23 AM
If you don't need the rest of the text, read the file in word by word. In a minor change to @Bruno_SAS excellent code:
data fileContents; length word extFilename _extFilename $ 1024; infile "c:\temp\*.txt" filename=_extFilename truncover dlm=" .,:;"; extFilename = _extFilename; input word $; output; keep extFilename extLine lineLength; run;
Will create a dataset called filecontents which holds two variables, filename and individual words as observations. Then you can easily check if there is an occurence of specified text (upcase), and pull out dates etc. Then just drop the data you don't want.
12-23-2016 08:18 AM
There was a problem in my previous code, since it would not read all the data, find below a new sample that will read all the file into one SAS data set, one record = one obs. A second DATA Step will then put all values into one variable, but as already mentioned, there is a limit. Find also a DS2 program that does the same, except, you can have longer variables (as long as your in DS2).
data fileContents; length extFilename _extFilename $ 1024; length extLine $ 32767; infile "c:\temp\*.txt" filename=_extFilename truncover end=last; row + 1; input @1 extLine $32767. ; extFilename = _extFilename; run; proc sort data=fileContents; by extFilename row; run; data want; set fileContents; length fileContents $ 32767; retain fileContents; by extFilename; if first.extFilename = 1 then do; fileContents = ' '; end; fileContents = cats(fileContents, extLine); if last.extFilename = 1 then do; extLineLength = length(fileContents); output; end; run; proc ds2; data want(overwrite=yes); dcl varchar(10485760) fileContents; retain fileContents; method run(); dcl int extLineLength; dcl varchar(1024) eFilename; set fileContents; by extFilename; if first.extFilename = 1 then do; fileContents = ' '; end; fileContents = cats(fileContents, extLine); if last.extFilename = 1 then do; /* do something here with the file contents */ extLineLength = length(fileContents); end; end; enddata; run; quit;
12-24-2016 01:17 AM
I would more like filename + filevar= , which could give you more flexibility . data want; input fname $80.; infile dummy filevar=fname length=len end=last; do while(not last); filename=fname; input content $varying2000. len; output; end; cards; /folders/myfolders/x.txt /folders/myfolders/y.txt ; run;