- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am using the following code to read multiple csvs.
http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm
But, my csvs have blank rows in them. Any suggestions on how to modify this code to prevent reading the blank rows. The blank rows are making it take a long time to run.
I am currently doing it using a second data step with this line of code
if compress(cats(of _all_),'.')=' ' then delete; /*This line was also taken from SAS communites. Thanks!*/
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When using the INPUT statement the automatic variable _INFILE_ gets created. This variable points to the input buffer.
For an blank line in your source .csv the input buffer will be "blank" and the following code should work.
data dirlist;
length fname $256;
infile dirlist length=reclen;
input fname $varying256. reclen;
if missing(_INFILE_) then delete;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It could make a difference depending on the file system and structure of the CSV files themselves, but you could try something like this:
if reclen < 2 then
delete;
If you've used the length=reclen keyword of infile, and your files have variable length records, the blank lines will have a length of 0 or 1 (probably the former).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
LaurieF,
Where would I put this piece of code in the code in the link?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's hard to tell, because I can't see your exact code!
If it's exactly as the link has it, I'd modify it to this:
do while(not done);
myfilename = filepath;
input @;
if reclen >= 2 then do;
input name $ x1 x2 x3 @;
output;
end;
input;
end;
The input @; will read the record but not process it and not release it; nevertheless it populates reclen. If the length is 2 or more, read the fields you want and write out an observation. Again the trailing @ will not release the record. At the end of the do block, the record is again processed as a null process, but releases it so that the next input will carry on correctly.
I should think that this will give you quite an improvement, especially if you've got a lot of variables to read.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When using the INPUT statement the automatic variable _INFILE_ gets created. This variable points to the input buffer.
For an blank line in your source .csv the input buffer will be "blank" and the following code should work.
data dirlist;
length fname $256;
infile dirlist length=reclen;
input fname $varying256. reclen;
if missing(_INFILE_) then delete;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you mean _INFILE_, not _input_
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I do and fixed in my post. Thanks!