BookmarkSubscribeRSS Feed
Anotherdream
Quartz | Level 8

Does anyone know a way to read in a csv file that has an aprostophe within its title? I need this to work with a macro variable because these files are within a loop that reads in hundres of other files.

Please see the attached code for an example of how I would like to read in a csv file named "O'leary.csv".

Data person;

input name $;

datalines;

O'leary.csv

run;

data _null_;

set person;

if _n_=1 then call symputx('macro_name',name);

run;

proc import datafile="c:\MYDRIVE\&Macro_name." out=newdata replace;

run;

Thanks again
Brandon

13 REPLIES 13
ballardw
Super User

First you need to make sure that the length of NAME in the data step is long enough to read the name. By default SAS defaults to 8 characters max when reading character variables.

Either preface the input with: LENGTH NAME $ 11 ; /* or larger*/

or

input name $ 1-11; /* read explicitly from columns 1 to 11/*

Did:

proc import datafile="C:\mydrive\O'leary.csv" out = newdata replace; run;

not work?


Anotherdream
Quartz | Level 8

Hello ballard. SOrry this was an example, and in my actual example the length of my name is well over 200 to account for all file names (up to 200 obviously, but that will never happen).

And for your second question, no.

DOES NOT WORK

proc import datafile="C:\mydrive\O'leary.csv" out = newdata replace; run;

I recieve the ERROR: Option "leary" is not known for infile.

WARNING: End-of-file encounted in string.

This is the error I am struggling with.

Thanks!

ballardw
Super User

With SAS 9.2.3 and Windows7 I don't receive that error with

proc import datafile="C:\mydrive\O'leary.csv" out = newdata replace; run;

and it successfully imports the trial data I created.

What does your output look like if you do

%put ="c:\MYDRIVE\&Macro_name.";

If there are lots of blanks after the second slash I would try

if _n_=1 then call symputx('macro_name',strip(name));

Anotherdream
Quartz | Level 8

This might be an issue with the version of windows / sas that I am running, because the code you give above does throw an error for me.

However the place I work is still on windows Xp, so that could be a leading contributor.

proc import datafile="C:\mydrive\O'leary.csv"

Thanks!

ballardw
Super User

Have you tried a filename assignment and using a fileref?

Filename InData "C:\mydrive\O'leary.csv";

and proc import datafile=InData ....

The filename might give better diagnostics if there is an actual problem with the filename.

The end of file message looks almost like it is trying to read a pipe or something instead of a literal string.

Tom
Super User Tom
Super User

The problem appears to be in the SCL code that PROC IMPORT is calling to generate the DATA step to read you delimited file.

1) Don't use PROC IMPORT.  Instead just write the data step yourself.

2) As others have suggested create a FILEREF and use that in the datafile option of PROC IMPORT.  This avoids the SCL code bug.  You do not need to create the macro variable as you can just create the FILENAME using the FILENAME() function.

data _null_;

  set person;

   if filename('CSVFILE',trim(name)) then put "ERROR: Unable to assign FILENAME";

run;

proc import datafile=CSVFILE out=newdata dbms=dlm replace;

  delimiter=',';

run;

Peter_C
Rhodochrosite | Level 12

Tom has offered the approach that I would normally suggest.

For one-off data investigation proc import might be suitable but in cases like this, where many files must be loaded, surely you will have a way of defining the common layout, or even a small set of layouts. For a single layout common to all CSV files, you don't need hundreds of proc import, JUST ONE DATA STEP.

However, I have grown accustomed to the irrational decision that MUST be implemented so here is another way to accommodate the SCL bug which Tom has identified.

Just rename all files which contain a single quote mark - something like the dos command

RENAME *'* *#*

you might get it to work but I couldn't  - so as normal, I revert to a SAS DATA STEP

OPTION NOXWAIT XSYNC ;

%let path = path of folder containing the CSV files ;

%let path = %sysget(userprofile)\documents\my sas files; ***** just for my testing ;

data _null_ ;

   infile "dir  ""&path\*'*"" /b " device= pipe ; * scanover ;

   input  ;

   length newname $1024 ;

   newname =translate( _infile_, "##", "'""" ) ;

* converting both single and double quotes to # ;

   call system( "rename " !! _infile_ !!"="!! newname ) ;

   put "original=" _INFILE_ +2 NEWNAME= ;

run;

and that did work, because all files containing a ' mark now have their names changed replacing that ' with a #

Those who really need to overcome quotes in a name for proc import have this way to eliminate the problem.

Hopefully there are no other defects of this sort in the SCL code invoked by proc import of CSV files. Just in case double quotes in a name might also cause trouble, that short datastep would replace those too.

hope this helps someone

peterC

Tom
Super User Tom
Super User

Or just use the list of files to generate the code.  Here is example to import each CSV file into a separate dataset.

For unix change the command used in the INFILE statement.

%let path = path of folder containing the CSV files ;

filename code temp;

data csvfiles ;

   file code;

   infile "dir ""&path\*.csv"" /b" pipe truncover lrecl=256 ;

   n+1;

   length datafile out $32 ;

   input filename $256.;

   datafile=cats('FILE',n);

   out=cats('DATA',n);

   put 'filename ' datafile filename :$quote. ';'

     / 'proc import ' datafile= out= 'dbms=dlm replace ;'

     / '  delimiter='','';'

     / 'run;'

   ;

run;

%inc code / source2 ;

Anotherdream
Quartz | Level 8

Hiya Peter. And each csv within the process do not have a common layout, The only thing that can be assumed is that the column names are within the first row. However the order, number of columns, etc.. are not consistent. This is why I am using the proc import, to read in the csv header names and determine what order they are in, what is missing, what is extra, etc...

I guess I could theoritically read in the file using a single line input, and then seperate the names using commas (similar to what Tom has helped me with in another piece of code).  I will attempt that later. if anyone has any other method to get only the headers of a csv, without knowing what they are and given there could be anywhere between 2 and 210, please let me know!

In addition when I get back to work, I will try both of the codes given to see if renaming the ' will help.

Thanks!

Tom
Super User Tom
Super User

If you just want to read the header lines to find which files have which names you can do that in one data step.

Here is an example.

data csvfile_columns;

  infile 'ls c*.csv' pipe truncover;

  input filename $200.;

  filevar=filename;

  infile csvfile dsd filevar=filevar truncover obs=1;

  length column 8 name $200 ;

  do column = 1 by 1 until (name=' ');

    input name @;

    if name ne ' ' then output;

  end;

run;

Peter_C
Rhodochrosite | Level 12

Here is a routine to convert your csv files into one SAS dataset, with all columns loaded as $30. strings and each row holds the name of the file from which it is read

%let your_path=c:\temp ;

data all_of _them( compress=yes) ;

length filen filename $100 col1-col210 $30 ;

infile "&your_path\*.csv" truncover dsd lrecl=10000 filename= filen

     eov=eov /* indicate end-of each file*/ ;

do row=1 by 1 until( eov) ;

   input col1-col210 ;

   filename= filen ;

   output ;

end ;

call missing(eov) ;

run ;

Anotherdream
Quartz | Level 8

Hello Tom. I am not sure I understand your answer.

What I'd like to do is take a directory with say 500 Csv files, and read in each file 1 by 1, however for each file, only read in the very first row of data to determine what the column headers are.  WHen I run the code above that is not what seems to occur (however I do not fully understand what you are doing in the code above, sorry).

Again Tom, thanks very much for all your help. I'm sorry I am not fully able to follow your answer in this example.

P.S. note, apparently while proc import does not work by default, the Filename specifiation does seem to work. As that was my original question, I will be marking that as the correct answer, however Tom if you have another more efficient way or reading in a files first row values only, I would greatly love to see it (and hopefully I can understand it).  If you'd like, I can start a new thread that way I can credit you with the correct answer for this related problem. 😃

Brandon

Tom
Super User Tom
Super User

This program will read the first line of each file.  It outputs the result into a vertical structure with the name of the CSV file, the column number and the name of the column. NOTE: If you are running on Windows then change the ls command to dir /b command. Play with the command from the command line until it generates the list in format you want. If you want to use a macro variable reference in the command then use double quote characters (") instead of single quote characters ('). 


Let me add some comments.


* Make a dataset ;

data csvfile_columns;

   * Read output of DIR or ls command to get list of files to process into variable FILENAME;

  infile 'ls *.csv' pipe truncover;

  input filename $200.;

   * Create copy of FILENAME to use with the FILEVAR option of the INFILE statement ;

   * This is needed because SAS automatically drops the variable named with the FILEVAR option ;

  filevar=filename;

   * Point to the file named in FILEVAR using the dummy fileref of CSVFILE.  ;

   * Use the DSD option to indicate the values are separated by commas. ;

   * use the TRUNCOVER option to stop it from going to the next line. ;

   * Use the OBS=1 option so that only the first line is read ;

  infile csvfile dsd filevar=filevar truncover obs=1;

   * Define attributes for COLUMN and NAME variables ;

  length column 8 name $200 ;

   * Loop until you see an empty column name ;

  do column = 1 by 1 until (name=' ');

     * Read the column name and keep the line open ;

    input name @;

     * Output put an observation when the name is not empty ;

    if name ne ' ' then output;

  end;

run;


If for some reason you have columns without names you could iterate the loop based on the number of commas in the line.


  input @;

  do column = 1 to countw(_infile_,',','Q');

    * Read the column name and keep the line open ;

    input name @;

    * Output put an observation  ;

    output;

  end;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3870 views
  • 0 likes
  • 4 in conversation