BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
viola
Obsidian | Level 7

I'm wondering if there's a way to uncompress a ZIP file so that SAS can dynamically import a batch of CSV files. I know the filename statement can unzip files, but I need to point to a folder of CSV files, because I need to use the file names for further processing. 

 

So far, I've gotten the following code to work on a folder in which all of the CSV files are already unzipped. Would love to back up this code one more step and start with a ZIP file. Using Base SAS 9.4. 

 

 /* 1. Get file names from folder*/

filename csvfiles pipe 'dir "filepath\*.csv" /b';

data csvlist;
	length csvname $100; 
        infile csvfiles length=reclen;
	input csvname $varying100. reclen;
run;

/* 2. Create filepaths for proc import*/

data _null_;
	set csvlist end=end;
	count+1;
	call symputx('path'||put(count,4.-l),cats('filepath\',csvname));
	if end then call symputx('max',count);
run;


/* 3. Import CSV files one at a time using list of file paths from 1&2*/

%macro importcsv;
	%do i=1 %to &max;
		proc import datafile="&&path&i"
			out=temp&i  
			dbms=csv
			replace;
			getnames=yes;
			datarow=2;
		run;
	%end;
%mend importcsv;
1 ACCEPTED SOLUTION

Accepted Solutions
bobpep212
Quartz | Level 8

I've had success with extracting csv files from a .zip file using the macro contained within this paper. This also handles the zipping if you need to, but I only used it to unzip. It basically creates a .vbs script to handle all the unzipping. You could probably unzip all the csvs to a single path, then continue with your statements below. The instructions in the paper explain it better than I can here. 

View solution in original post

12 REPLIES 12
bobpep212
Quartz | Level 8

I've had success with extracting csv files from a .zip file using the macro contained within this paper. This also handles the zipping if you need to, but I only used it to unzip. It basically creates a .vbs script to handle all the unzipping. You could probably unzip all the csvs to a single path, then continue with your statements below. The instructions in the paper explain it better than I can here. 

viola
Obsidian | Level 7

Thanks! This macro ended up being the simplest solution for my problem, as now I have the flexibility to import the files from a local folder. 

JMCass
Calcite | Level 5

Hi Bob, I really like the functionality contained in this macro.  As a relatively new user, and still on SAS Studio, I'm having a hard time getting this to work.  I amended the first line to suit my needs:

 

%macro SASZip_Lite(zip=/folders/myfolders/source_folder/demo_20170101.zip, tfdr=/folders/myfolders/DEMO_Q1_2017_CSV);

 

Do you have any suggestions as to why this isn't working?  I've tried it with/without quotes, with/without the other terms included(sfdr fstyl, etc)

 

Thank you.

Reeza
Super User
To run a macro you first run the macro definition code and then you call it separately.

ie

pretend macro:

%macro print(dsn=);
proc print data=&dsn.;
run;

%mend;

THen to call it you use:

%print(sashelp.class);

I think you're mixing up the macro and the macro call.
Reeza
Super User

You can read directly from the zip file, without unzipping it 🙂

 

This is how you read it, note the filename statements:

https://kb.iu.edu/d/azva

 

And here's how you get the list of items in your zip file

https://blogs.sas.com/content/sasdummy/2016/10/16/filename-zip-list-file-contents/

 


@viola wrote:

I'm wondering if there's a way to uncompress a ZIP file so that SAS can dynamically import a batch of CSV files. I know the filename statement can unzip files, but I need to point to a folder of CSV files, because I need to use the file names for further processing. 

 

So far, I've gotten the following code to work on a folder in which all of the CSV files are already unzipped. Would love to back up this code one more step and start with a ZIP file. Using Base SAS 9.4. 

 

 /* 1. Get file names from folder*/

filename csvfiles pipe 'dir "filepath\*.csv" /b';

data csvlist;
	length csvname $100; 
        infile csvfiles length=reclen;
	input csvname $varying100. reclen;
run;

/* 2. Create filepaths for proc import*/

data _null_;
	set csvlist end=end;
	count+1;
	call symputx('path'||put(count,4.-l),cats('filepath\',csvname));
	if end then call symputx('max',count);
run;


/* 3. Import CSV files one at a time using list of file paths from 1&2*/

%macro importcsv;
	%do i=1 %to &max;
		proc import datafile="&&path&i"
			out=temp&i  
			dbms=csv
			replace;
			getnames=yes;
			datarow=2;
		run;
	%end;
%mend importcsv;

 

Tom
Super User Tom
Super User

Do you really need to use PROC IMPORT?  Don't you know what format the files have?

Try this little example.

%let wdir=%sysfunc(pathname(work));

data _null_;
   file "&wdir/example.zip" zip member='file1.csv' dsd ;
   set sashelp.class (obs=5);
   put (_all_) (+0);
run;

data _null_;
   file "&wdir/example.zip" zip member='file2.csv' dsd ;
   set sashelp.class (firstobs=6 obs=10);
   put (_all_) (+0);
run;

data want ;
  if 0 then set sashelp.class ;
  infile "&wdir/example.zip" zip member='*' dsd truncover ;
  input (_all_) (+0);
run;

proc print;
run;
viola
Obsidian | Level 7

I was wondering if I had any extra steps in here. The file extensions will always be the same, but the file names will change, and I don't want to have to look at what the names are each time I run this procedure. I'm not super familiar with what your proposed code is trying to achieve - will it let me read in names dynamically?

Reeza
Super User

No, you do have to know the file names. The link I showed indicates how to determine the name in the files using code. This allows you to use those in your code, you're not expected to use it manually. You may end up with a small macro but it should be pretty straightforward. 

 


@viola wrote:

I was wondering if I had any extra steps in here. The file extensions will always be the same, but the file names will change, and I don't want to have to look at what the names are each time I run this procedure. I'm not super familiar with what your proposed code is trying to achieve - will it let me read in names dynamically?


 

 

Tom
Super User Tom
Super User

The program is demonstrating that if your ZIP consists of multiple files that all share the same formal you do NOT need to know the names of the individual files to read them all with one data step without any macro logic or calls to procedures.

 

However if the ZIP files are not well defined.

1) They have mixed combination of files, some of which you want to read and others that you want to ignore.

2) The formats of the files (what columns they contain, what order the columns are in, what type of data the columns contain, ...) vary with in zip file.

3) perhaps other confounding factors

 

Then you would be better served to get the list of files and in the ZIP file and use that to drive your process.  You might be able to use DOPEN() and DREAD() functions to do that.

 

In particular if you have to use PROC IMPORT to guess about what is in the files then you will have to first copy the files out of the ZIP file.  At least as of the last time I tried it PROC IMPORT could NOT import from a member of a ZIP file.  Even if you use a SAS filename statement to point to a specific member of the ZIP file. 

Reeza
Super User
Tom is trying to indicate that PROC IMPORT will cause you problems in this process. Instead of PROC IMPORT you should use a data step to read the CSV files. If the files are all supposed to be the same, I echo his concerns. PROC IMPORT guesses at variable types and lengths and not always correctly. You can try and get around this with GUESSINGROWS=MAX option but it slows the process down a lot, and still not a guarantee. If the files are different enough and you try and use them together you'll end up with issues because one variable will be character in one data set and numeric in another. And then your next question is how to clean your data. Seriously, having done this many times, make as much effort as possible to get the data read in correctly and cleanly. You save yourself time downstream.
viola
Obsidian | Level 7

Thanks @Reeza and @Tom for your explanations. I'm going to have to think this through a little more. I have about 60 files to read in, and each one is unique. However, once imported I do not have to combine them - they will stay as separate data sets, but it will be important to make sure that the formats are correct in the end. I hear what you're saying about PROC Import being imprecise and am reconsidering its use. Ultimately, I will need a solution that does not require my hard coding each of the variables - I do have a spec sheet that may be able to solve this problem. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4990 views
  • 7 likes
  • 6 in conversation