How do I read in multiple zipped text files at one time?

I am new to SAS (I normally use Stata and R), but I have a very large dataset that came with SAS input statements, so I want to use SAS at least to prepare and clean the data. I'm using SAS 9.4.

 

I have a group of text files in an identical format, each gzipped separately. I'm looking for an efficient way to read them all in and stack them on top of one another so I can apply the set of input statements provided for them.

 

Ex. of file naming structure

dme2010.file01.txt.gz

dme2010.file02.txt.gz

dme2010.file03.txt.gz

dme2011.file01.txt.gz

dme2011.file02.txt.gz

dme2011.file03.txt.gz

 

Each zip file contains just a single text file of the same name (i.e., dme2010.file01.txt.gz contains just dme2010.file01.txt). I am not able to extract the text files because they are read-only and this can't be changed.

 

I have tried using wildcards like below, but this just runs the input statements and doesn't grab any of the observations:

filename dmein pipe 'gzip -dc H:\Data\Seerm\Drive1\dme*.txt.gz';
filename dmeinput 'input lines';

data dme;
  infile dmein lrecl=635 missover pad;
  %include dmeinput;
run;

 

I have about 250 text files to read in, spanning different years and file numbers, and the data provider has supplied SAS input statements that apply to all of them.

 

What is the best way to read in all of these text files so I can apply the input statements code to them all at once?

 

Thanks in advance,

Jordan


All Replies
Super User
Posts: 23,343

Re: How do I read in multiple zipped text files at one time?

You're going to need a macro, so welcome to the deep end, quickly :)

Once you get it working for one file, you can wrap that in a macro, create a list of the files, and call the macro multiple times using CALL EXECUTE. I'm going to point you to the different tools you may need and an example. If you need further help, post back with what your code looks like so far and any issues you're having.

 

You can use the FILENAME ZIP methods to read the file:

https://support.sas.com/documentation/cdl/en/lestmtsref/69738/HTML/default/viewer.htm#n1dn0f61yfyzto...

 

Regarding how to get list of files:

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

You can combine some of these together to get your full solution.

 

An example of how that works when it's all put together is here:

https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type
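
As a rough sketch of the "get it working for one file" step: assuming gzip is on the system path and the XCMD option is enabled (FILENAME PIPE needs it), a single file could be read like this. The file name is taken from the naming example in the question, dmeinput is the question's fileref for the provider's INPUT statements, and dme_one is a hypothetical data set name.

```sas
/* Assumes the DMEINPUT fileref from the question is already assigned */
/* Decompress one file to stdout and read that stream through a pipe  */
filename dmein pipe "gzip -dc H:\Data\Seerm\Drive1\dme2010.file01.txt.gz";

data dme_one;                          /* hypothetical data set name */
    infile dmein lrecl=635 missover pad;
    %include dmeinput;                 /* provider's INPUT statements */
run;

/* Look at a few rows before wrapping anything in a macro */
proc print data=dme_one(obs=5);
run;

filename dmein clear;
```

If this step returns zero observations, check the log for messages from gzip before moving on to the macro.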

Community Manager
Posts: 3,384

Re: How do I read in multiple zipped text files at one time?

FILENAME ZIP doesn't work with GZ files...yet.  That's coming in SAS 9.4 Maint 5, which is due to be released Very Soon.  In the meantime, for gzip files you'll still need to rely on external tools via FILENAME PIPE.
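
For anyone reading this on SAS 9.4 Maintenance 5 or later, a minimal sketch of the FILENAME ZIP route, assuming the GZIP option is available in your release. The fileref dmegz and data set dme_one are hypothetical names, and dmeinput is the fileref from the original question.

```sas
/* Assumes SAS 9.4M5 or later, where FILENAME ZIP accepts the GZIP option */
filename dmegz zip "H:\Data\Seerm\Drive1\dme2010.file01.txt.gz" gzip;

data dme_one;                          /* hypothetical data set name */
    infile dmegz lrecl=635 missover pad;
    %include dmeinput;                 /* provider's INPUT statements */
run;

filename dmegz clear;
```

On earlier releases this will not work for .gz files, so the FILENAME PIPE approach is still the one to use there.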

New Contributor
Posts: 2

Re: How do I read in multiple zipped text files at one time?

Thanks for the suggestions. I have been a little sidetracked with other work but it was helpful to get me started. 

 

I was able to implement the macro for the list of files very easily. However, I am stuck trying to implement a macro to append all of the files into one.

 

Here is what I have so far:

%macro append_files(path);
	filename in pipe 'gzip -cd &path'; /* don't know how to reference this exactly */
	options nocenter validvarname=upcase;

	data temp;
		infile in lrecl=635 missover pad;
		%include input;
	run;

	data full;
		set full temp;
	run;

%mend;
	
%list_files(H:\Data\Seerm\Drive1, gz);

%append_files([go through all of the paths in the file list]);

My questions are: 

1) is it even possible to have a macro like this?

2) how do I reference the path in the filename statement?

3) how do I have the macro loop through all of the paths in the list created by %list_files?

 

Thanks!

Solution
‎08-30-2017 06:15 PM
Super User
Posts: 23,343

Re: How do I read in multiple zipped text files at one time?


1) is it even possible to have a macro like this?

Yes

 


2) how do I reference the path in the filename statement?

 


You're currently using single quotes, but macro variables only resolve inside double quotes. Switch to double quotes.
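
A small illustration of the difference, using a path built from the directory and naming example earlier in the thread (so treat the exact file as an assumption):

```sas
%let path = H:\Data\Seerm\Drive1\dme2010.file01.txt.gz;

/* Single quotes: gzip is handed the literal text &path, which is not a file */
filename in pipe 'gzip -cd &path';

/* Double quotes: &path is replaced by the file name before the command runs */
filename in pipe "gzip -cd &path";

/* Optional check: write the resolved command to the log */
%put NOTE: command is gzip -cd &path;
```

If a path contains spaces, it would also need its own quotes inside the command, for example "gzip -cd ""&path""".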

 


3) how do I have the macro loop through all of the paths in the list created by %list_files?

 


You don't. Look at CALL EXECUTE: used in a data step, it can call the macro once for every row of a data set that holds the file paths. The documentation has an example, and I think my sample code does too.
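
A minimal sketch of that pattern, assuming the file paths sit in a data set. Here file_list and its variable path are hypothetical names (what %list_files actually produces may differ), and %append_files is the macro from the post above.

```sas
/* Hypothetical list of files: one row per .gz file, full path in PATH */
data file_list;
    length path $260;
    infile datalines truncover;
    input path $260.;
    datalines;
H:\Data\Seerm\Drive1\dme2010.file01.txt.gz
H:\Data\Seerm\Drive1\dme2010.file02.txt.gz
;
run;

/* Queue one %append_files call per row; the calls run after this step ends */
data _null_;
    set file_list;
    /* %nrstr delays macro execution until the queued call is run */
    call execute(cats('%nrstr(%append_files)(', path, ')'));
run;
```

Wrapping the macro name in %nrstr is a common precaution with CALL EXECUTE so that statements inside the macro, such as %include, run as they would in open code. Paths containing commas or parentheses would need extra quoting.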

 

Things you may want to consider:

 

  • Instead of using a data step to append, consider PROC APPEND. If your data set is not well structured and you aren't defining types and lengths explicitly, you'll run into issues with mismatched lengths and types.
  • Drop the TEMP data set at the end of the macro so that, if a file fails to read, you don't append the previous file's data again.
  • To build a macro, first get working SAS code for a single file (perhaps you already have it), then convert that code to a macro.
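
Putting the first two suggestions together, the macro from the question might end up looking something like this. It is a sketch, not tested against the real files, and it keeps the filerefs (in, input) and data set names (temp, full) from the question.

```sas
%macro append_files(path);
    /* Double quotes so &path resolves */
    filename in pipe "gzip -cd &path";

    data temp;
        infile in lrecl=635 missover pad;
        %include input;            /* provider's INPUT statements */
    run;

    /* PROC APPEND creates FULL on the first call if it does not exist yet */
    proc append base=full data=temp;
    run;

    /* Delete TEMP so a failed read cannot re-append the previous file's rows */
    proc datasets library=work nolist;
        delete temp;
    quit;

    filename in clear;
%mend append_files;
```

Unlike "set full temp", PROC APPEND does not need FULL to exist before the first file is read, and it writes a message to the log when variable types or lengths do not match.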

 

 
