Hi everyone,
I have hundreds of files that need to be unzipped. There are two major problems:
1. Input: right now a bare 'input;' does not work. It only works when I specify the variables, lengths, and formats, which causes a lot of trouble because we have to redefine every variable and format. When dealing with different tables, it's a disaster.
I am hoping for a simple command that handles input universally.
2. Is there any way to unzip a batch of files? I have hundreds, arriving batch by batch, and I don't want to do them one by one.
Thanks.
Accepted Solutions
So you don't have ZIP files at all? Just GZIP files? If so, that makes it much easier to read all of the files in one data step. First get a list of the files into a dataset, for example by reading the output of the dir command (or ls or find on Unix).
data files;
  /* run dir through a pipe and read one file name per line */
  infile 'dir C:\Logs\SEGuide_log.*.txt.gz /b' pipe truncover;
  input filename $256. ;
run;
Now use that list to drive the data step that reads the files.
data logdata;
  set files;
  /* FILEVAR= names the file to open; the LOGFILE fileref is only a placeholder */
  fname=filename ;
  infile logfile zip gzip filevar=fname end=eof;
  /* read every line of the current file before moving to the next one */
  do while (not eof);
    input date : yymmdd10. timestamp : anydttme. ;
    output;
  end;
  format date date9. timestamp timeampm.;
run;
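The listing step above depends on the PIPE device, which is only available when the XCMD system option is enabled. A minimal sketch of an alternative that builds the same FILES dataset with the DOPEN, DNUM, and DREAD functions follows. The macro variable path, its value C:\Logs, and the fileref name dirref are assumptions for illustration, and whether directory functions are allowed in a locked-down session is not something this thread confirms.

```sas
%let path = C:\Logs;   /* assumption: folder that holds the .gz files */

data files;
  length filename $256;
  rc  = filename('dirref', "&path");   /* point a fileref at the folder */
  did = dopen('dirref');               /* open the folder; 0 means failure */
  if did > 0 then do;
    do i = 1 to dnum(did);             /* one iteration per directory member */
      filename = dread(did, i);        /* bare member name, no path */
      /* keep only members whose last extension is gz */
      if lowcase(scan(filename, -1, '.')) = 'gz' then output;
    end;
    rc = dclose(did);
  end;
  keep filename;
run;
```

Like dir /b, DREAD returns bare names, so the folder still has to be prepended before the names are passed to FILEVAR=.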
#2 Have you considered reading the files straight from the ZIP file?
That process is illustrated here https://blogs.sas.com/content/sasdummy/2015/05/11/using-filename-zip-to-unzip-and-read-data-files-in...
Then you can turn that into a macro to process all your ZIP files.
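A rough sketch of what such a macro could look like is below. The macro name read_gz, its parameter file, and the scratch dataset _one are hypothetical. The INPUT statement is the one used elsewhere in this thread, the FILES dataset is the list of names built in the accepted solution, and the C:\Logs folder is taken from the example paths. As far as I know, the GZIP option of the ZIP access method needs SAS 9.4M5 or later.

```sas
%macro read_gz(file);
  /* hypothetical helper: read one .gz file and append it to LOGDATA */
  filename fromzip zip "&file" gzip;

  data _one;
    infile fromzip truncover;
    input date : yymmdd10. time : anydttme.;
    format date date9. time timeampm.;
  run;

  proc append base=logdata data=_one;
  run;

  filename fromzip clear;
%mend read_gz;

/* generate one macro call per file name in FILES */
data _null_;
  set files;
  /* %nrstr delays the macro call until after this data step finishes */
  call execute(cats('%nrstr(%read_gz)(', "C:\Logs\", filename, ')'));
run;
```

This runs one data step and one PROC APPEND per file, so for hundreds of files a single data step driven by FILEVAR= is likely to be quicker.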
filename fromzip ZIP "C:\Logs\SEGuide_log.10168.txt.gz" GZIP;
data logdata;
  infile fromzip; /* read directly from compressed file */
  input date : yymmdd10. time : anydttme. ;
  format date date9. time timeampm.;
run;
Hi Reeza, thanks for your advice.
1. The code above is similar to what I used. I will try proc import.
2. I thought about it. If there is no better choice, I guess a macro is the only way.
Thanks for your advice, Tom. Yes, these are not zip files but .gz files.
The first step does generate the list of files I want.
The problem is the second step.
1. All the data are stored in the 'DATA' folder, to which I have read access only, not write.
2. Whether I create a folder and point a libname at it or just use the SAS WORK library, the error message is always:
ERROR: Open failure for C:\WINDOWS\SYSTEM32\filename during attempt to create a local file handle.
So I am not sure whether this is a path problem (the path does not point to the original 'DATA' folder) or an unzip problem. Thanks.
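One way to separate a path problem from an unzip problem is to check whether SAS can see each listed file at all, before any decompression is attempted. The sketch below assumes the FILES dataset from the first step and adds a hypothetical variable named found.

```sas
data _null_;
  set files;
  /* FILEEXIST returns 1 if the physical file is visible to SAS, 0 if not */
  found = fileexist(strip(filename));
  put filename= found=;
run;
```

If found=0 on every row, the names are probably being resolved relative to the current working directory of the SAS session, which would fit the C:\WINDOWS\SYSTEM32 prefix in the error message, and the GZIP step is likely not the culprit.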
Please show the actual code you ran, and the SAS log if possible.
The only reason that data step would even attempt to write somewhere would be if you messed up the DATA statement. Just use the exact code I used:
data want;
and it will create a dataset named WORK.WANT.
Sorry, this is in a Safe Environment, so I cannot copy anything out.
Whether I use data want; or data libname.want;
the error message is the same as the one in my last reply.
So do it step by step.
1) Make sure you can create a dataset.
data test1;
  x=1;
run;
2) Make sure you can read one of the gzip files. Like your original example.
data test2;
  infile '.....txt.gz' zip gzip ;
  input ....;
run;
3) Now try to read it using the FILEVAR= option.
data test3;
  fname='.....txt.gz';
  infile dummy zip gzip filevar=fname;
  input ....;
run;
4) Make a dataset with one filename and try reading it using that.
data test4;
  filename='.....txt.gz';
run;
data test5;
  set test4;
  fname=filename;
  infile dummy zip gzip filevar=fname end=eof;
  do while (not eof);
    input ....;
    output;
  end;
run;
Now replace the TEST4 dataset with a larger list of filenames and then re-run the step that uses it to create TEST5.
Thank you very much, Tom.
I test-ran your step-by-step code as well. I found that the step-by-step code gives the file name as a full name including the path.
In the solution code, however, the first step lists all the file names but strips the path from them.
So in the second step I added the path:
fname = "&path\" || filename;
Everything works perfectly.
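Putting the path fix together with the solution code, the whole job could look roughly like the sketch below. The macro variable path and its value C:\Logs are assumptions (the real folder is not shown in this thread, and the DIR command as written assumes it contains no spaces), and the extra variable source is a hypothetical addition. Macro variables resolve only inside double quotes, so the path literal is double-quoted.

```sas
%let path = C:\Logs;   /* assumption: folder that holds the .gz files */

/* Step 1: list the bare file names (dir /b prints no path) */
data files;
  infile "dir &path\SEGuide_log.*.txt.gz /b" pipe truncover;
  input filename $256.;
run;

/* Step 2: read every listed file in a single data step */
data logdata;
  set files;
  length fname source $300;
  fname  = cats("&path\", filename);   /* prepend the folder to the bare name */
  source = fname;                      /* FILEVAR= variables are not written out, so keep a copy */
  infile logfile zip gzip filevar=fname end=eof truncover;
  do while (not eof);
    input date : yymmdd10. timestamp : anydttme.;
    output;
  end;
  format date date9. timestamp timeampm.;
run;
```

Keeping source on every row makes it possible to trace a record back to the file it came from.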
That is one of the "features" of the Windows/DOS dir command: with /b it lists bare file names without the path.