I'm wondering if there's a way to uncompress a ZIP file so that SAS can dynamically import a batch of CSV files. I know the filename statement can unzip files, but I need to point to a folder of CSV files, because I need to use the file names for further processing.
So far, I've gotten the following code to work on a folder in which all of the CSV files are already unzipped. Would love to back up this code one more step and start with a ZIP file. Using Base SAS 9.4.
/* 1. Get file names from folder*/
filename csvfiles pipe 'dir "filepath\*.csv" /b';
data csvlist;
length csvname $100;
infile csvfiles length=reclen;
input csvname $varying100. reclen;
run;
/* 2. Create filepaths for proc import*/
data _null_;
set csvlist end=end;
count+1;
call symputx('path'||put(count,4.-l),cats('filepath\',csvname));
if end then call symputx('max',count);
run;
/* 3. Import CSV files one at a time using list of file paths from 1&2*/
%macro importcsv;
%do i=1 %to &max;
proc import datafile="&&path&i"
out=temp&i
dbms=csv
replace;
getnames=yes;
datarow=2;
run;
%end;
%mend importcsv;
I've had success with extracting csv files from a .zip file using the macro contained within this paper. This also handles the zipping if you need to, but I only used it to unzip. It basically creates a .vbs script to handle all the unzipping. You could probably unzip all the csvs to a single path, then continue with your statements below. The instructions in the paper explain it better than I can here.
I've had success with extracting csv files from a .zip file using the macro contained within this paper. This also handles the zipping if you need to, but I only used it to unzip. It basically creates a .vbs script to handle all the unzipping. You could probably unzip all the csvs to a single path, then continue with your statements below. The instructions in the paper explain it better than I can here.
Thanks! This macro ended up being the simplest solution for my problem, as now I have the flexibility to import the files from a local folder.
Hi Bob, I really like the functionality contained in this macro. As a relatively new user, and still on SAS Studio, I'm having a hard time getting this to work. I amended the first line to suit my needs:
%macro SASZip_Lite(zip=/folders/myfolders/source_folder/demo_20170101.zip, tfdr=/folders/myfolders/DEMO_Q1_2017_CSV);
Do you have any suggestions as to why this isn't working? I've tried it with/without quotes, with/without the other terms included(sfdr fstyl, etc)
Thank you.
You can read directly from the zip file, without unzipping it 🙂
This is how you read it, note the filename statements:
And here's how you get the list of items in your zip file
https://blogs.sas.com/content/sasdummy/2016/10/16/filename-zip-list-file-contents/
@viola wrote:
I'm wondering if there's a way to uncompress a ZIP file so that SAS can dynamically import a batch of CSV files. I know the filename statement can unzip files, but I need to point to a folder of CSV files, because I need to use the file names for further processing.
So far, I've gotten the following code to work on a folder in which all of the CSV files are already unzipped. Would love to back up this code one more step and start with a ZIP file. Using Base SAS 9.4.
/* 1. Get file names from folder*/ filename csvfiles pipe 'dir "filepath\*.csv" /b'; data csvlist; length csvname $100; infile csvfiles length=reclen; input csvname $varying100. reclen; run; /* 2. Create filepaths for proc import*/ data _null_; set csvlist end=end; count+1; call symputx('path'||put(count,4.-l),cats('filepath\',csvname)); if end then call symputx('max',count); run; /* 3. Import CSV files one at a time using list of file paths from 1&2*/ %macro importcsv; %do i=1 %to &max; proc import datafile="&&path&i" out=temp&i dbms=csv replace; getnames=yes; datarow=2; run; %end; %mend importcsv;
Do you really need to use PROC IMPORT? Don't you know what format the files have?
Try this little example.
%let wdir=%sysfunc(pathname(work));
data _null_;
file "&wdir/example.zip" zip member='file1.csv' dsd ;
set sashelp.class (obs=5);
put (_all_) (+0);
run;
data _null_;
file "&wdir/example.zip" zip member='file2.csv' dsd ;
set sashelp.class (firstobs=6 obs=10);
put (_all_) (+0);
run;
data want ;
if 0 then set sashelp.class ;
infile "&wdir/example.zip" zip member='*' dsd truncover ;
input (_all_) (+0);
run;
proc print;
run;
I was wondering if I had any extra steps in here. The file extensions will always be the same, but the file names will change, and I don't want to have to look at what the names are each time I run this procedure. I'm not super familiar with what your proposed code is trying to achieve - will it let me read in names dynamically?
No, you do have to know the file names. The link I showed indicates how to determine the name in the files using code. This allows you to use those in your code, you're not expected to use it manually. You may end up with a small macro but it should be pretty straightforward.
@viola wrote:
I was wondering if I had any extra steps in here. The file extensions will always be the same, but the file names will change, and I don't want to have to look at what the names are each time I run this procedure. I'm not super familiar with what your proposed code is trying to achieve - will it let me read in names dynamically?
The program is demonstrating that if your ZIP consists of multiple files that all share the same formal you do NOT need to know the names of the individual files to read them all with one data step without any macro logic or calls to procedures.
However if the ZIP files are not well defined.
1) They have mixed combination of files, some of which you want to read and others that you want to ignore.
2) The formats of the files (what columns they contain, what order the columns are in, what type of data the columns contain, ...) vary with in zip file.
3) perhaps other confounding factors
Then you would be better served to get the list of files and in the ZIP file and use that to drive your process. You might be able to use DOPEN() and DREAD() functions to do that.
In particular if you have to use PROC IMPORT to guess about what is in the files then you will have to first copy the files out of the ZIP file. At least as of the last time I tried it PROC IMPORT could NOT import from a member of a ZIP file. Even if you use a SAS filename statement to point to a specific member of the ZIP file.
Thanks @Reeza and @Tom for your explanations. I'm going to have to think this through a little more. I have about 60 files to read in, and each one is unique. However, once imported I do not have to combine them - they will stay as separate data sets, but it will be important to make sure that the formats are correct in the end. I hear what you're saying about PROC Import being imprecise and am reconsidering its use. Ultimately, I will need a solution that does not require my hard coding each of the variables - I do have a spec sheet that may be able to solve this problem.
The following link should give you all you need.
https://blogs.sas.com/content/sasdummy/2014/01/29/using-filename-zip/
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.