BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
proctice
Quartz | Level 8

I am using the following code to read multiple csvs.

http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm

But, my csvs have blank rows in them.  Any suggestions on how to modify this code to prevent reading the blank rows. The blank rows are making it take a long time to run. 

 I am currently doing it using a second data step with this line of code

if compress(cats(of _all_),'.')=' ' then delete; /*This line was also taken from SAS communites.  Thanks!*/

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

When using the INPUT statement the automatic variable _INFILE_ gets created. This variable points to the input buffer.

For an blank line in your source .csv the input buffer will be "blank" and the following code should work.

data dirlist;
  length fname $256;
  infile dirlist length=reclen;
  input fname $varying256. reclen;
  if missing(_INFILE_) then delete;
run;

 

View solution in original post

6 REPLIES 6
LaurieF
Barite | Level 11

It could make a difference depending on the file system and structure of the CSV files themselves, but you could try something like this:

if reclen < 2 then
   delete;

If you've used the length=reclen keyword of infile, and your files have variable length records, the blank lines will have a length of 0 or 1 (probably the former).

proctice
Quartz | Level 8

LaurieF,

 

Where would I put this piece of code in the code in the link?

LaurieF
Barite | Level 11

It's hard to tell, because I can't see your exact code!

 

If it's exactly as the link has it, I'd modify it to this:

 

 

do while(not done);
    myfilename = filepath;
    input @;
    if reclen >= 2 then do;
       input name $  x1 x2 x3 @;
       output;
       end;
    input;
  end;

 

The input @; will read the record but not process it and not release it; nevertheless it populates reclen. If the length is 2 or more, read the fields you want and write out an observation. Again the trailing  @ will not release the record. At the end of the do block, the record is again processed as a null process, but releases it so that the next input will carry on correctly.

 

I should think that this will give you quite an improvement, especially if you've got a lot of variables to read.

Patrick
Opal | Level 21

When using the INPUT statement the automatic variable _INFILE_ gets created. This variable points to the input buffer.

For an blank line in your source .csv the input buffer will be "blank" and the following code should work.

data dirlist;
  length fname $256;
  infile dirlist length=reclen;
  input fname $varying256. reclen;
  if missing(_INFILE_) then delete;
run;

 

mkeintz
PROC Star

I think you mean _INFILE_, not _input_

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

I do and fixed in my post. Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 4090 views
  • 3 likes
  • 4 in conversation