Programming the statistical procedures from SAS

reading multiple textfiles

Accepted Solution Solved
Reply
Regular Contributor
Posts: 161
Accepted Solution

reading multiple textfiles

hello

I have about 150 text files to read into SAS. I just want the last column of these files, after reading them I need to concatenate these files into one. Any help is greatly appreciated.

RG


Accepted Solutions
Solution
‎08-21-2011 05:15 PM
Esteemed Advisor
Posts: 7,058

Re: reading multiple textfiles

I think that Tom answered most of your questions, but I tested the code on your sample file and discovered that it still wouldn't work correctly without a slight change.  Thus, I recommend you try:

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\respondents_*.dat";

data out.class;

  infile allcsv lrecl=10000 filename=filename;

  input;

  lastfield=input(scan(_infile_,1,,'bs'),best12.);

run;

Of course that presumes that you have used a libname statement to assign the out library.

To reiterate what Tom had mentioned:

The filename statement assigns all files in the C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT directory that begin with the characters respondents_ and have a .dat extension.

The data out.class statement creates a SAS file called class in the out library.

The infile statement reads all of the files identified in the filename statement as long as the records are less than or equal to 10000 characters in length.  It also creates a temporary variable called filename that captures the name of the file from which each record came.  If you want to keep that information you would have to create a new variable (e.g., fname).  You could do that before the run statement (i.e., fname=filenameSmiley Wink

The input; statement simply reads an entire record and captures it in a variable called _infile_

The statement I revised which now looks like: lastfield=input(scan(_infile_,1,,'bs'),best12.);

creates a variable called lastfield by reading the last string in the record and inputing it using the best12. informat.  You can understand the line if you follow it from left to right:

lastfield= is simply telling SAS to create a field called lastfield

input(.., best12.) is telling SAS to use the input function to convert the string captured by the following scan function and convert it to be a number by using the best12. informat

the scan(_infile_,1,,'bs) part is telling SAS to scan to _infile_ variable, for the 1st field in finds, not specifying any particular characters to use as delimiters but, instead, the 'bs' (yes I know .. odd letters to use) tell SAS to scan from right to left (i.e., b) and use any kind of white space (i.e., s) as a delimiter.

The code worked on your example file.

View solution in original post


All Replies
Esteemed Advisor
Posts: 7,058

reading multiple textfiles

Some questions:

What version of SAS are you using?

What operating system are you on?

What do the files look like (e.g., delimited, fixed format, etc.)?

What kind of field is in the last column and is it the same for all files?

Regular Contributor
Posts: 161

Re: reading multiple textfiles

just forgot to mention that these variables don't have any variable names

What version of SAS are you using? 9.2

What operating system are you on? Window 7, 64 bit

What do the files look like (e.g., delimited, fixed format, etc.)? Tab delimited

What kind of field is in the last column and is it the same for all files? yes they all look the same

I am attaching an example file, but I believe your codes should help work for me.

Attachment
Esteemed Advisor
Posts: 7,058

reading multiple textfiles

You didn't provide answers to any of my questions, thus I'll just post an example and hope it will help.

Say I have two text files, each in the c:\art directory and called sometextfile1.csv and sometextfile2.csv, respectively, no header records and a numeric value in the last column.  So the files might look like:

sometextfile1.csv

1,1,1,1,1,1,9

1,1,1,1,1,1,8

1,1,1,1,1,1,7

and

sometextfile2.csv

1,1,1,1,6

1,1,1,1,5

1,1,1,1,4

and I want to end up with a SAS file, called want, that looks like:

lastfield

9

8

7

6

5

4

One could do that with the following code:

filename allcsv "C:\art\sometextfile*.csv";

data want;

  infile allcsv filename=filename;

  input;

  lastfield=input(scan(_infile_,-1,","),best12.);

run;

HTH,

Art

Esteemed Advisor
Posts: 7,058

Re: reading multiple textfiles

The sample file you sent appears to be space rather than tab delimited.  If the files are space delimited you can probably just use code like:

filename allcsv "C:\art\sometextfile*.dat";

data want;

  infile allcsv filename=filename;

  input;

  lastfield=input(scan(_infile_,-1),best12.);

run;

If they really are tab delimited, just change the "," to "09"x

Regular Contributor
Posts: 161

Re: reading multiple textfiles

Thank you very much, I'll try it right now

Regular Contributor
Posts: 161

Re: reading multiple textfiles

I think the format for my file is different. it's a .dat file and I am getting this error:

ERROR: Physical file does not exist,

       C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\commandfile..dat?

This is my code:

%macro RRRUN;

%let num_reps=1;

%do seed=1 %to &num_reps;

%LET commandfile=respondents_&SEED;

LIBNAME Out

"C:\CDM1\5K\OUT";

RUN;

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\commandfile..dat”";

Data out.class;

infile allcsv filename=filename;

input;

lastfield=input(scan(_infile_,,-1,","),best12.);

run;

%END;

%MEND;

%RRRUN;

data OUT.class10000;

set OUT.temp_1-OUT.temp_7;

run;

Esteemed Advisor
Posts: 7,058

Re: reading multiple textfiles

I'm not sure why you feel the need to drive this with a macro, but I did notice three things:

First the following line is probably wrong:

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\commandfile..dat”";

as I understand your code, comaandfile is a macro variable, thus it should be stated as &commandfile.

I looks like you left off the ampersand.  Second, the end of that line appears to have multiple quotatioins.  Why?  It should just end with a single double quote semi-colon.

Also, when you call you macro, you end the line with a semi-colon.  There shouldn't be any semi-colon there.

Finally, once you get to the point of where you can actually find the files you are trying to read, you have an extra comma in your input statement that my example code didn't have.  But, before going there, I would suggest fixing the other problems.

Art

Regular Contributor
Posts: 161

Re: reading multiple textfiles

thanks I fixed those problems, although it does read something, it is not the last column.

%macro RRRUN;

%let num_reps=1;

%do seed=1 %to &num_reps;

%LET commandfile=respondents_&SEED;

LIBNAME Out

"C:\CDM1\5K\OUT";

RUN;

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\&commandfile..dat";

Data out.class;

infile allcsv filename=filename;

input;

lastfield=input(scan(_infile_,-1,","),best12.);

run;

%END;

%MEND;

%RRRUN;

This is my LOG file:

425  %macro RRRUN;

426  OPTIONS MACROGEN

427  %let num_reps=1;

428  %do seed=1 %to &num_reps;

429  %LET commandfile=respondents_&SEED;

430  LIBNAME Out "C:\CDM1\5K\OUT";

431  RUN;

432

433  filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\&commandfile..dat";

434  Data out.class;

435  infile allcsv filename=filename;

436  input;

437  lastfield=input(scan(_infile_,-1,","),best12.);

438  run;

439

440  %END;

441  %MEND;

442

443  %RRRUN;

NOTE: Line generated by the invoked macro "RRRUN".

3     LIBNAME Out "C:\CDM1\5K\OUT";

      -------

      13

ERROR 13-12: Unrecognized SAS option name, LIBNAME.

3  !  LIBNAME Out "C:\CDM1\5K\OUT";

              ---

              13

ERROR 13-12: Unrecognized SAS option name, OUT.

3  !  LIBNAME Out "C:\CDM1\5K\OUT";

                  ----------------

                  13

MACROGEN(RRRUN):   Out "C:\CDM1\5K\OUT";

MACROGEN(RRRUN):   RUN;

MACROGEN(RRRUN):   filename allcsv

"C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\respondents_1.dat";

MACROGEN(RRRUN):   Data out.class;

MACROGEN(RRRUN):   infile allcsv filename=filename;

MACROGEN(RRRUN):   input;

MACROGEN(RRRUN):   lastfield=input(scan(_infile_,-1,","),best12.);

MACROGEN(RRRUN):   run;

ERROR 13-12: Unrecognized SAS option name, "C:\CDM1\5K\OUT".

Esteemed Advisor
Posts: 7,058

Re: reading multiple textfiles

I still think you would be better off without incorporating everything in a macro but, since you have, one thing at a time.  Why it there a run; statement after your filename statement?

It is totally unnecessary and may be the cause of your initial problem, but I wouldl think that it is more entailed.  I would start by shutting down SAS and starting a new session, and then running your code after fixing the problems I mentioned in my last post.

Esteemed Advisor
Posts: 7,058

Re: reading multiple textfiles

In addition to my previous comments, there is another problem with your macro .. it simply won't do what you want to do.  You create a file called out.class and then overwrite that file with each iteration of your macro.

Then you try to concatinate 7 files called set OUT.temp_1-OUT.temp_7 which weren't even created by the macro.

I would think that it would make a lot more sense to simply run something like:

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\respondents_*.dat;

data want;

  infile allcsv filename=filename;

  input;

  lastfield=input(scan(_infile_,-1),best12.);

run;

Regular Contributor
Posts: 161

Re: reading multiple textfiles

It's sill all o's istead of numbers these numbers : 17.00000, 14.00000,..

Thanks

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\respondents_*.dat";

Data out.class;

infile allcsv filename=filename;

input;

lastfield=input(scan(_infile_,-1),best12.);

run;

Regular Contributor
Posts: 161

Re: reading multiple textfiles

I was wondering what do these lines mean. and why should be an * after the file name in filename allcsv

infile allcsv filename=filename;

input;

lastfield=input(scan(_infile_,-1),best12.);

Super User
Super User
Posts: 6,137

Re: reading multiple textfiles

Not finding the correct values is probably a result of the lines in your files being longer than the default line length for an INFILE statement. You should see notes in the log to the effect that some lines were truncated.  Add a LRECL option to your INFILE statement such as LRECL=10000.

Art's program is taking advantage of the fact that SAS can support a single wildcard (*) in a filename.  You can test if it finds the files you want by issuing a DIR command with the same filename at the command line prompt on your PC.

The INPUT statement without any variables will force SAS to read the line into the input buffer. You can reference this buffer using the special variable name _INFILE_.   The SCAN function with a negative argument will scan backwards from the end of the string.  You might want to specify what delimiters to use when scanning. For example change this to SCAN(_INFILE_,-1,' ') to force it to use space as the delimiter when scanning.  Your example file was delimited with spaces and not tabs as you said above.

The INPUT function will convert the characters pulled off the end of the line into a number using the BEST12. format.

If you want to save the name of the individual file that a particular record came from in the output dataset then you will need to assign it to another variable as the variable named in the FILENAME= option of the INFILE statement will be automatically dropped from the output dataset.

Solution
‎08-21-2011 05:15 PM
Esteemed Advisor
Posts: 7,058

Re: reading multiple textfiles

I think that Tom answered most of your questions, but I tested the code on your sample file and discovered that it still wouldn't work correctly without a slight change.  Thus, I recommend you try:

filename allcsv "C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT\respondents_*.dat";

data out.class;

  infile allcsv lrecl=10000 filename=filename;

  input;

  lastfield=input(scan(_infile_,1,,'bs'),best12.);

run;

Of course that presumes that you have used a libname statement to assign the out library.

To reiterate what Tom had mentioned:

The filename statement assigns all files in the C:\CDM1\5K\5K_10000obs\5K_10000obs_RIT\OUT directory that begin with the characters respondents_ and have a .dat extension.

The data out.class statement creates a SAS file called class in the out library.

The infile statement reads all of the files identified in the filename statement as long as the records are less than or equal to 10000 characters in length.  It also creates a temporary variable called filename that captures the name of the file from which each record came.  If you want to keep that information you would have to create a new variable (e.g., fname).  You could do that before the run statement (i.e., fname=filenameSmiley Wink

The input; statement simply reads an entire record and captures it in a variable called _infile_

The statement I revised which now looks like: lastfield=input(scan(_infile_,1,,'bs'),best12.);

creates a variable called lastfield by reading the last string in the record and inputing it using the best12. informat.  You can understand the line if you follow it from left to right:

lastfield= is simply telling SAS to create a field called lastfield

input(.., best12.) is telling SAS to use the input function to convert the string captured by the following scan function and convert it to be a number by using the best12. informat

the scan(_infile_,1,,'bs) part is telling SAS to scan to _infile_ variable, for the 1st field in finds, not specifying any particular characters to use as delimiters but, instead, the 'bs' (yes I know .. odd letters to use) tell SAS to scan from right to left (i.e., b) and use any kind of white space (i.e., s) as a delimiter.

The code worked on your example file.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 447 views
  • 0 likes
  • 3 in conversation