BookmarkSubscribeRSS Feed
ammarhm
Lapis Lazuli | Level 10

Hi all

I have a folder containing a number of text file (.txt)

What I want to do is to create a SAS table, containing two columns:

one column contains the filename

The scond column contains the content of the text file. So I want all the contents of each file imported into one cell in column 2 with the correspoing file name in column 1

I know it might sound "messy" to put all the contents of a file in 1 cell but that is the exactly what I need

Any help is appreciated

Merry Christmas to you all

 

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, your request doesn't make much sense, but it is possible:

/* Note this is for Windows*/

filename tmp pipe 'dir "<path to your files>\*.txt" /b';

/* Empty overall dataset */
proc sql;
create table FINAL (fname char(2000),data_from_file char(2000);
quit;

/* Create code to read in each file, set filename and append to empty from above */ data _null_; length f $200; infile tmp dlm='¬'; call execute('data tmp; infile "<path to your files>\'||strip(_infile_)||'"; length fname data_from_file $2000; fname="'||strip(_infile_)||'"; input data_from_file $; run;
proc append base=final data=tmp;
run;'); run;

Note, I have not tested - but it should be pretty close. 

ammarhm
Lapis Lazuli | Level 10

I know it dosent make sense... here is afurther explanation

I have a collection of files for different items, I need to scan each file for keywords, for some keywords, if the keyword is there I will need to give a certain variable a value of 1, for other values, if the keyword is there I need to extract the numbers after that keyword

That is why i am importing the text files to process them further in SAS, but maybe there is a more efficient way of doing this?

Kind regrds

 

BrunoMueller
SAS Super FREQ

Since the max length for a character variable is 32767, often a file will not fit into the one varibale.

 

It is easy to read in all files from a directory using wirdcards, you will then get one observation for each record. You can still check for the conditions and set some flags. the FILENAME= option will give you the name of the file.

 

Here is an example

data fileContents;
  length extFilename _extFilename $ 1024;
  length extLine $ 32767;
  infile "c:\temp\*.txt" filename=_extFilename truncover ;
  extFilename = _extFilename;
  input
    @1 extLine
  ;
  lineLength = length(extLine);
  keep extFilename extLine lineLength;
run;

Bruno

ammarhm
Lapis Lazuli | Level 10

THank you

The files are less than 32000 characters

When I use your code, the text is still broken into several rows.... any solution for that, thank you

ammarhm
Lapis Lazuli | Level 10

The other problem witht the code above is that I will break up the text in the file into multiple rows.... which will limit the possibility of finding a text (much easier in a single cell)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

So where are these files coming from/used for?  There are many text analytics software out there, SAS has one for instance.  It would be better than trying to re-invent it.

 

You will need to show some examples of what you mean, i.e. a text file with some data which illustrates your question and what you want out.

ammarhm
Lapis Lazuli | Level 10

Here is an example of the content of a text file:

 

 

Item not functional, production date 1/7/1993. Meta-data downloaded from the unit showing elevated pressure. Unit sent to location 40 for repair. Unit returned 1/6/2016

So there many files containing information about different units, the unit number is the file name

If the text contains the words "unit returned" then I want to put the value "yes" to column "Returned"

Also, extracting production date and return date

 

Kind regards

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

If you don't need the rest of the text, read the file in word by word.  In a minor change to @BrunoMueller excellent code:

data fileContents;
  length word extFilename _extFilename $ 1024;
  infile "c:\temp\*.txt" filename=_extFilename truncover dlm=" .,:;";
  extFilename = _extFilename;
  input word $;
  output;
  keep extFilename extLine lineLength;
run;

Will create a dataset called filecontents which holds two variables, filename and individual words as observations.  Then you can easily check if there is an occurence of specified text (upcase), and pull out dates etc.  Then just drop the data you don't want.

 

Reeza
Super User

Your on Windows? Are you using SAS Base with x command? Using a command line to search the text file is goimg to be much faster than SAS

BrunoMueller
SAS Super FREQ

There was a problem in my previous code, since it would not read all the data, find below a new sample that will read all the file into one SAS data set, one record = one obs. A second DATA Step will then put all values into one variable, but as already mentioned, there is a limit. Find also a DS2 program that does the same, except, you can have longer variables (as long as your in DS2).

 

data fileContents;
  length extFilename _extFilename $ 1024;
  length extLine $ 32767;
  infile "c:\temp\*.txt" filename=_extFilename truncover end=last;
  row + 1;
  input
    @1 extLine $32767.
  ;
  extFilename = _extFilename;
run;

proc sort data=fileContents;
  by extFilename row;
run;

data want;
  set fileContents;
  length fileContents $ 32767;
  retain fileContents;
  by extFilename;

  if first.extFilename = 1 then do;
    fileContents = ' ';
  end;

  fileContents = cats(fileContents, extLine);

  if last.extFilename = 1 then do;
    extLineLength = length(fileContents);
    output;
  end;
run;

proc ds2;

  data want(overwrite=yes);
    dcl varchar(10485760) fileContents;
    retain fileContents;

    method run();
      dcl int extLineLength;
      dcl varchar(1024) eFilename;
      set fileContents;
      by extFilename;

      if first.extFilename = 1 then do;
        fileContents = ' ';
      end;

      fileContents = cats(fileContents, extLine);

      if last.extFilename = 1 then do;
        /* do something here with the file contents */
        extLineLength = length(fileContents);
      end;
    end;

  enddata;
run;

quit;

Bruno

Ksharp
Super User
I would more like filename + filevar= , which could give you more flexibility .


data want;
input fname $80.;
infile dummy filevar=fname length=len end=last;
do while(not last);
 filename=fname;
 input content $varying2000. len;
 output;
end;
cards;
/folders/myfolders/x.txt
/folders/myfolders/y.txt
;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1236 views
  • 1 like
  • 5 in conversation