DATA Step, Macro, Functions and more

Open ZIP file to import batch of CSV files without external program

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Open ZIP file to import batch of CSV files without external program

I'm wondering if there's a way to uncompress a ZIP file so that SAS can dynamically import a batch of CSV files. I know the filename statement can unzip files, but I need to point to a folder of CSV files, because I need to use the file names for further processing. 

 

So far, I've gotten the following code to work on a folder in which all of the CSV files are already unzipped. Would love to back up this code one more step and start with a ZIP file. Using Base SAS 9.4. 

 

 /* 1. Get file names from folder*/

filename csvfiles pipe 'dir "filepath\*.csv" /b';

data csvlist;
	length csvname $100; 
        infile csvfiles length=reclen;
	input csvname $varying100. reclen;
run;

/* 2. Create filepaths for proc import*/

data _null_;
	set csvlist end=end;
	count+1;
	call symputx('path'||put(count,4.-l),cats('filepath\',csvname));
	if end then call symputx('max',count);
run;


/* 3. Import CSV files one at a time using list of file paths from 1&2*/

%macro importcsv;
	%do i=1 %to &max;
		proc import datafile="&&path&i"
			out=temp&i  
			dbms=csv
			replace;
			getnames=yes;
			datarow=2;
		run;
	%end;
%mend importcsv;

Accepted Solutions
Solution
3 weeks ago
Occasional Contributor
Posts: 16

Re: Open ZIP file to import batch of CSV files without external program

I've had success with extracting csv files from a .zip file using the macro contained within this paper. This also handles the zipping if you need to, but I only used it to unzip. It basically creates a .vbs script to handle all the unzipping. You could probably unzip all the csvs to a single path, then continue with your statements below. The instructions in the paper explain it better than I can here. 

View solution in original post


All Replies
Solution
3 weeks ago
Occasional Contributor
Posts: 16

Re: Open ZIP file to import batch of CSV files without external program

I've had success with extracting csv files from a .zip file using the macro contained within this paper. This also handles the zipping if you need to, but I only used it to unzip. It basically creates a .vbs script to handle all the unzipping. You could probably unzip all the csvs to a single path, then continue with your statements below. The instructions in the paper explain it better than I can here. 

Occasional Contributor
Posts: 8

Re: Open ZIP file to import batch of CSV files without external program

Posted in reply to bobpep212

Thanks! This macro ended up being the simplest solution for my problem, as now I have the flexibility to import the files from a local folder. 

Super User
Posts: 24,012

Re: Open ZIP file to import batch of CSV files without external program

You can read directly from the zip file, without unzipping it Smiley Happy

 

This is how you read it, note the filename statements:

https://kb.iu.edu/d/azva

 

And here's how you get the list of items in your zip file

https://blogs.sas.com/content/sasdummy/2016/10/16/filename-zip-list-file-contents/

 


@viola wrote:

I'm wondering if there's a way to uncompress a ZIP file so that SAS can dynamically import a batch of CSV files. I know the filename statement can unzip files, but I need to point to a folder of CSV files, because I need to use the file names for further processing. 

 

So far, I've gotten the following code to work on a folder in which all of the CSV files are already unzipped. Would love to back up this code one more step and start with a ZIP file. Using Base SAS 9.4. 

 

 /* 1. Get file names from folder*/

filename csvfiles pipe 'dir "filepath\*.csv" /b';

data csvlist;
	length csvname $100; 
        infile csvfiles length=reclen;
	input csvname $varying100. reclen;
run;

/* 2. Create filepaths for proc import*/

data _null_;
	set csvlist end=end;
	count+1;
	call symputx('path'||put(count,4.-l),cats('filepath\',csvname));
	if end then call symputx('max',count);
run;


/* 3. Import CSV files one at a time using list of file paths from 1&2*/

%macro importcsv;
	%do i=1 %to &max;
		proc import datafile="&&path&i"
			out=temp&i  
			dbms=csv
			replace;
			getnames=yes;
			datarow=2;
		run;
	%end;
%mend importcsv;

 

Super User
Super User
Posts: 8,279

Re: Open ZIP file to import batch of CSV files without external program

Do you really need to use PROC IMPORT?  Don't you know what format the files have?

Try this little example.

%let wdir=%sysfunc(pathname(work));

data _null_;
   file "&wdir/example.zip" zip member='file1.csv' dsd ;
   set sashelp.class (obs=5);
   put (_all_) (+0);
run;

data _null_;
   file "&wdir/example.zip" zip member='file2.csv' dsd ;
   set sashelp.class (firstobs=6 obs=10);
   put (_all_) (+0);
run;

data want ;
  if 0 then set sashelp.class ;
  infile "&wdir/example.zip" zip member='*' dsd truncover ;
  input (_all_) (+0);
run;

proc print;
run;
Occasional Contributor
Posts: 8

Re: Open ZIP file to import batch of CSV files without external program

I was wondering if I had any extra steps in here. The file extensions will always be the same, but the file names will change, and I don't want to have to look at what the names are each time I run this procedure. I'm not super familiar with what your proposed code is trying to achieve - will it let me read in names dynamically?

Super User
Posts: 24,012

Re: Open ZIP file to import batch of CSV files without external program

No, you do have to know the file names. The link I showed indicates how to determine the name in the files using code. This allows you to use those in your code, you're not expected to use it manually. You may end up with a small macro but it should be pretty straightforward. 

 


@viola wrote:

I was wondering if I had any extra steps in here. The file extensions will always be the same, but the file names will change, and I don't want to have to look at what the names are each time I run this procedure. I'm not super familiar with what your proposed code is trying to achieve - will it let me read in names dynamically?


 

 

Super User
Super User
Posts: 8,279

Re: Open ZIP file to import batch of CSV files without external program

[ Edited ]

The program is demonstrating that if your ZIP consists of multiple files that all share the same formal you do NOT need to know the names of the individual files to read them all with one data step without any macro logic or calls to procedures.

 

However if the ZIP files are not well defined.

1) They have mixed combination of files, some of which you want to read and others that you want to ignore.

2) The formats of the files (what columns they contain, what order the columns are in, what type of data the columns contain, ...) vary with in zip file.

3) perhaps other confounding factors

 

Then you would be better served to get the list of files and in the ZIP file and use that to drive your process.  You might be able to use DOPEN() and DREAD() functions to do that.

 

In particular if you have to use PROC IMPORT to guess about what is in the files then you will have to first copy the files out of the ZIP file.  At least as of the last time I tried it PROC IMPORT could NOT import from a member of a ZIP file.  Even if you use a SAS filename statement to point to a specific member of the ZIP file. 

Super User
Posts: 24,012

Re: Open ZIP file to import batch of CSV files without external program

Tom is trying to indicate that PROC IMPORT will cause you problems in this process. Instead of PROC IMPORT you should use a data step to read the CSV files. If the files are all supposed to be the same, I echo his concerns. PROC IMPORT guesses at variable types and lengths and not always correctly. You can try and get around this with GUESSINGROWS=MAX option but it slows the process down a lot, and still not a guarantee. If the files are different enough and you try and use them together you'll end up with issues because one variable will be character in one data set and numeric in another. And then your next question is how to clean your data. Seriously, having done this many times, make as much effort as possible to get the data read in correctly and cleanly. You save yourself time downstream.
Occasional Contributor
Posts: 8

Re: Open ZIP file to import batch of CSV files without external program

Thanks @Reeza and @Tom for your explanations. I'm going to have to think this through a little more. I have about 60 files to read in, and each one is unique. However, once imported I do not have to combine them - they will stay as separate data sets, but it will be important to make sure that the formats are correct in the end. I hear what you're saying about PROC Import being imprecise and am reconsidering its use. Ultimately, I will need a solution that does not require my hard coding each of the variables - I do have a spec sheet that may be able to solve this problem. 

Respected Advisor
Posts: 4,797

Re: Open ZIP file to import batch of CSV files without external program

@viola

The following link should give you all you need.

https://blogs.sas.com/content/sasdummy/2014/01/29/using-filename-zip/  

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 130 views
  • 6 likes
  • 5 in conversation