DATA Step, Macro, Functions and more

Import text file contest to corresponding cells

Reply
Regular Contributor
Posts: 156

Import text file contest to corresponding cells

Hi all

I have a folder containing a number of text file (.txt)

What I want to do is to create a SAS table, containing two columns:

one column contains the filename

The scond column contains the content of the text file. So I want all the contents of each file imported into one cell in column 2 with the correspoing file name in column 1

I know it might sound "messy" to put all the contents of a file in 1 cell but that is the exactly what I need

Any help is appreciated

Merry Christmas to you all

 

Super User
Super User
Posts: 7,430

Re: Import text file contest to corresponding cells

Well, your request doesn't make much sense, but it is possible:

/* Note this is for Windows*/

filename tmp pipe 'dir "<path to your files>\*.txt" /b';

/* Empty overall dataset */
proc sql;
create table FINAL (fname char(2000),data_from_file char(2000);
quit;

/* Create code to read in each file, set filename and append to empty from above */ data _null_; length f $200; infile tmp dlm='¬'; call execute('data tmp; infile "<path to your files>\'||strip(_infile_)||'"; length fname data_from_file $2000; fname="'||strip(_infile_)||'"; input data_from_file $; run;
proc append base=final data=tmp;
run;'); run;

Note, I have not tested - but it should be pretty close. 

Regular Contributor
Posts: 156

Re: Import text file contest to corresponding cells

I know it dosent make sense... here is afurther explanation

I have a collection of files for different items, I need to scan each file for keywords, for some keywords, if the keyword is there I will need to give a certain variable a value of 1, for other values, if the keyword is there I need to extract the numbers after that keyword

That is why i am importing the text files to process them further in SAS, but maybe there is a more efficient way of doing this?

Kind regrds

 

SAS Super FREQ
Posts: 685

Re: Import text file contest to corresponding cells

[ Edited ]

Since the max length for a character variable is 32767, often a file will not fit into the one varibale.

 

It is easy to read in all files from a directory using wirdcards, you will then get one observation for each record. You can still check for the conditions and set some flags. the FILENAME= option will give you the name of the file.

 

Here is an example

data fileContents;
  length extFilename _extFilename $ 1024;
  length extLine $ 32767;
  infile "c:\temp\*.txt" filename=_extFilename truncover ;
  extFilename = _extFilename;
  input
    @1 extLine
  ;
  lineLength = length(extLine);
  keep extFilename extLine lineLength;
run;

Bruno

Regular Contributor
Posts: 156

Re: Import text file contest to corresponding cells

THank you

The files are less than 32000 characters

When I use your code, the text is still broken into several rows.... any solution for that, thank you

Regular Contributor
Posts: 156

Re: Import text file contest to corresponding cells

The other problem witht the code above is that I will break up the text in the file into multiple rows.... which will limit the possibility of finding a text (much easier in a single cell)

Super User
Super User
Posts: 7,430

Re: Import text file contest to corresponding cells

So where are these files coming from/used for?  There are many text analytics software out there, SAS has one for instance.  It would be better than trying to re-invent it.

 

You will need to show some examples of what you mean, i.e. a text file with some data which illustrates your question and what you want out.

Regular Contributor
Posts: 156

Re: Import text file contest to corresponding cells

Here is an example of the content of a text file:

 

 

Item not functional, production date 1/7/1993. Meta-data downloaded from the unit showing elevated pressure. Unit sent to location 40 for repair. Unit returned 1/6/2016

So there many files containing information about different units, the unit number is the file name

If the text contains the words "unit returned" then I want to put the value "yes" to column "Returned"

Also, extracting production date and return date

 

Kind regards

 

 

Super User
Super User
Posts: 7,430

Re: Import text file contest to corresponding cells

If you don't need the rest of the text, read the file in word by word.  In a minor change to @Bruno_SAS excellent code:

data fileContents;
  length word extFilename _extFilename $ 1024;
  infile "c:\temp\*.txt" filename=_extFilename truncover dlm=" .,:;";
  extFilename = _extFilename;
  input word $;
  output;
  keep extFilename extLine lineLength;
run;

Will create a dataset called filecontents which holds two variables, filename and individual words as observations.  Then you can easily check if there is an occurence of specified text (upcase), and pull out dates etc.  Then just drop the data you don't want.

 

Super User
Posts: 17,960

Re: Import text file contest to corresponding cells

Your on Windows? Are you using SAS Base with x command? Using a command line to search the text file is goimg to be much faster than SAS

Super User
Posts: 17,960

Re: Import text file contest to corresponding cells

SAS Super FREQ
Posts: 685

Re: Import text file contest to corresponding cells

There was a problem in my previous code, since it would not read all the data, find below a new sample that will read all the file into one SAS data set, one record = one obs. A second DATA Step will then put all values into one variable, but as already mentioned, there is a limit. Find also a DS2 program that does the same, except, you can have longer variables (as long as your in DS2).

 

data fileContents;
  length extFilename _extFilename $ 1024;
  length extLine $ 32767;
  infile "c:\temp\*.txt" filename=_extFilename truncover end=last;
  row + 1;
  input
    @1 extLine $32767.
  ;
  extFilename = _extFilename;
run;

proc sort data=fileContents;
  by extFilename row;
run;

data want;
  set fileContents;
  length fileContents $ 32767;
  retain fileContents;
  by extFilename;

  if first.extFilename = 1 then do;
    fileContents = ' ';
  end;

  fileContents = cats(fileContents, extLine);

  if last.extFilename = 1 then do;
    extLineLength = length(fileContents);
    output;
  end;
run;

proc ds2;

  data want(overwrite=yes);
    dcl varchar(10485760) fileContents;
    retain fileContents;

    method run();
      dcl int extLineLength;
      dcl varchar(1024) eFilename;
      set fileContents;
      by extFilename;

      if first.extFilename = 1 then do;
        fileContents = ' ';
      end;

      fileContents = cats(fileContents, extLine);

      if last.extFilename = 1 then do;
        /* do something here with the file contents */
        extLineLength = length(fileContents);
      end;
    end;

  enddata;
run;

quit;

Bruno

Super User
Posts: 9,691

Re: Import text file contest to corresponding cells

I would more like filename + filevar= , which could give you more flexibility .


data want;
input fname $80.;
infile dummy filevar=fname length=len end=last;
do while(not last);
 filename=fname;
 input content $varying2000. len;
 output;
end;
cards;
/folders/myfolders/x.txt
/folders/myfolders/y.txt
;
run;

Ask a Question
Discussion stats
  • 12 replies
  • 378 views
  • 1 like
  • 5 in conversation