DATA Step, Macro, Functions and more

Batch reading text files with special characters the variable names

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

Batch reading text files with special characters the variable names

Hello,

I am trying to adapt code from the paper "A macro for reading multiple text files" by Debbie Miller to my own data.  The macro reads all the text files in a given folder, reads the variables and designates them as headers, then reads in all the data from all the text files into a single file. The result is a single SAS file containing all the headers and data contained in the target folder's text files. The issue I have is that some of the variable names in my text file's headers contain special characters (?, ', etc.). These cause the macro to fail. I would like to know if there is any way to make SAS read these special characters just as normal text (or, worst case, to delete them). So far I have tried the options validvarname=any; and compressing out the special characters, to no avail (although I assume I am just using them wrong). Here are the paper and code segments I believe to be relevant:

After the name of the next file to be read has been retrieved, the names of the variables are read from the first line of the text file. In this macro, it is necessary to know the maximum number of variables possible in any text file. Here the maximum number is 17. The variable names are then stored as variables X1-X17 in the data set VAR_NAMES. Since at this point I only want to read the names of the variables, which are located in the first line of the input text file, I specify the option OBS=1 in the INFILE statement. This option tells SAS the number of the last record in the input file that you want to read. The MISSOVER option prevents SAS from automatically going to the next line if it doesn't find values for all of the variables. If fewer than 17 variables are read, the remaining variables are set to missing. The resulting data set contains only 1 record, with 17 text variables that contain the names of the variables in the input file that will be read next.

%macro fileread;

%do j=1 %to &num_files;

data _null_;

set file_list;

if _n_=&j;

call symput ('filein',fname);

run;

data var_names;

length x1-x17 $12;

infile "d:\requests\miller\testfiles\&filein" obs=1 missover;

input (x1-x17) ($) ;

run;

A new macro called varnames that is nested inside the main macro, fileread, is then used to store the variable names to macro variables. The names are stored in macro variables V1 through V17. This is done by using a DATA _NULL_ step inside a do-loop that executes 17 times. A CALL SYMPUT statement is used to store the names to the macro variables. The GLOBAL statement is necessary so that the 17 macro variables will be available for use outside of the varnames macro.

%macro varnames;

%do i=1 %to 17;

%global v&i;

data _null_;

set var_names;

call symput("v&i",trim(x&i));

run;

%end;

%mend varnames;

%varnames;


Accepted Solutions
Solution
‎06-25-2013 03:31 PM
Contributor
Posts: 21

Re: Batch reading text files with special characters the variable names

I finally got it to work, but another way which I think is simpler (it only required the addition of 2 lines of code, provided by tech support): input@; and translate and compress functions to get rid of/replace special characters. Here is the part of the code that changed (in the first section of the macro fileread)

data var_names;

length x1-x145 $30;

infile "C:\Users\pignon2\Dropbox\2013_chamber_experiment\05.29.13\&filein" dsd dlm='09'X firstobs=17 obs=17 missover lrecl=32767;

input @;

_infile_=translate(_infile_,'_','''');

_infile_=compress(_infile_," @%/.'?""");

input (x1-x145) ($) ;

run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,078

Re: Batch reading text files with special characters the variable names

Posted in reply to charles_pignon1

Why are you using macro variables at all?  Why would you put the variable names into macro variables if you already have them in a dataset?

Also look at the FILEVAR option on the INFILE statement so that you can also eliminate use macro variables to pass the file names.

Contributor
Posts: 21

Re: Batch reading text files with special characters the variable names

Thanks for the quick response! So, the variables are stored as macro variables and then combined with the data from the text files. This then puts the variable names as headers, under which the corresponding values are then stored.

This isn't even the issue at this point, however: I can't put the variable names into a dataset (var_names) because they contain special characters that SAS won't read. I tested the program by creating some test .txt files with only alphanumeric characters; it worked fine, but adding a single ' or ? character to one of the variables in the .txt files causes the macro to fail. It does not create the dataset var_names or perform any of the subsequent operations.

Super User
Super User
Posts: 7,078

Re: Batch reading text files with special characters the variable names

Posted in reply to charles_pignon1

Sound more like you have a character encoding issue that is preventing you from reading the source file.

If you want to use the strings as variable NAMES then they will have to match SAS naming conventions for variable names. Names should consist only of letters, digits and underscore and cannot start with a digit.  Perhaps you should use the values as LABELS for your variables instead of NAMES?

You can circumvent some of this by setting the option VALIDVARNAME to ANY. Then you can create name literals from your character variables.  Here is an example:

options validvarname=any;

data x;

  "This is 'nuts', no?"n = 10;

run;

So here is a data step to create name literals and store them in global macro variables V1 to Vxxx from dataset variables X1 to Xxxx.  Note that you do not need macro logic to do this.

data _null_;

  set var_names;

   array names X: ;

   do i=1 to dim(names);

      call symputx( cats('V',i) , quote(trim(names(i)))||'n' , 'G' ) ;

  end;

run;



Contributor
Posts: 21

Re: Batch reading text files with special characters the variable names

I had already tried the validvarname=any, inserting it like this:

options validvarname=any;

data var_names;

length x1-x17 $12;

infile "d:\requests\miller\testfiles\&filein" obs=1 missover;

input (x1-x17) ($) ;

run;

That didn't work. I think your proposed data_null_ step would work if I was manually entering the variable names, however what I am actually trying to do is retrieve them from the .txt file. So (and I could be wrong) I think that the problem is occurring either in the infile or in the input steps of the code I copied here. SAS never gets the chance to turn the data into variables,because it is not even able to read the data from the text file. I do think you're right therefore in that it's a character encoding issue, but I don't know how/where in the code to fix it.

Super User
Super User
Posts: 7,078

Re: Batch reading text files with special characters the variable names

Posted in reply to charles_pignon1

Read in the file and look at what characters are actually there.  Perhaps you are trying to read a file that is not text at all?

data _null_;

  infile ...... ;

  input;

  list;

  stop;

run;

Contributor
Posts: 21

Re: Batch reading text files with special characters the variable names

I created a text file specifically to test this, the first line has written, in tab delimited, a'?     b     c     d (etc.) and the second line is 1     2     3     4. Note the '? in the first "variable name": when I remove them from the text file, the macro works just fine, and outputs one file (var_name) that looks like this:

v1     v2     v3     v4

a     b     c     d

The next step of the macro reads and adds in the data, creating another file (data_all) that looks like this:

a     b     c     d

1     2     3     4

Where a b c d are the variable names and 1 2 3 4 is the data. However adding special characters to the first line in the text file makes it illegible to SAS

Super User
Super User
Posts: 7,078

Re: Batch reading text files with special characters the variable names

Posted in reply to charles_pignon1

Why not let SAS convert the names for you?

Read the data into a tall format with NAME/VALUE pairs and pass it through PROC TRANSPOSE.  SAS will convert the values to valid names. In your example the first variable becomes A__ where underscores have replaced the quote and question mark.

This one eliminates the 17 column assumption and instead assumes that the columns stop when there are no more names. So you cannot have a gap in the column names in the source file.

Again no need for macro variables or macro logic.

filename datafile "....";

data names ;

  infile datafile dsd dlm='09'x truncover obs=1;

  length name $32 ;

  do until(name=' ');

     input name @;

     if name^=' ' then output;

  end;

run;

data tall ;

  infile datafile dsd dlm='09'x truncover firstobs=2;

  row+1;

  do col=1 to ncols;

    set names point=col nobs=ncols;

    input value @;

    output;

  end;

run;

proc transpose data=tall out=wide;

  by row;

  var value;

  id name;

run;


Contributor
Posts: 21

Re: Batch reading text files with special characters the variable names

It almost works! This code did just what I wanted (even when special characters were in the text file), but only for one text file in the folder. Is there any way to now process all the text files in the folder and compile them into one dataset? I believe this was the purpose of the macro I was originally using.

Super User
Super User
Posts: 7,078

Re: Batch reading text files with special characters the variable names

Posted in reply to charles_pignon1

One way is to just embed it inside of whatever logic you already have for looping over the input files.

Typically I would convert the above to a macro that accepts the input filename and target SAS dataset name as parameters.

%macro onefile(infile=,ds=);

filename datafile "&infile";

...

proc transpose data=tall out=&ds ;

...

%mend onefile;


Then I would create a program to calls it once for each file.

For example it could be as simple as a data step that reads the output of a dir or ls command and writes the macro calls to a new file that can be %INCLUDED.

filename code temp;

data files ;

  infile 'ls *.txt' pipe truncover ;

  input filename $200.;

  dsname=scan(filename,-2,'/.\');

  file code;

  put '%onefile(infile=' filename ',ds=' dsname ')' ;

run;

%inc code / source2 ;

Solution
‎06-25-2013 03:31 PM
Contributor
Posts: 21

Re: Batch reading text files with special characters the variable names

I finally got it to work, but another way which I think is simpler (it only required the addition of 2 lines of code, provided by tech support): input@; and translate and compress functions to get rid of/replace special characters. Here is the part of the code that changed (in the first section of the macro fileread)

data var_names;

length x1-x145 $30;

infile "C:\Users\pignon2\Dropbox\2013_chamber_experiment\05.29.13\&filein" dsd dlm='09'X firstobs=17 obs=17 missover lrecl=32767;

input @;

_infile_=translate(_infile_,'_','''');

_infile_=compress(_infile_," @%/.'?""");

input (x1-x145) ($) ;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 3209 views
  • 4 likes
  • 2 in conversation