06-27-2013 03:25 PM
I am new to SAS, and I am working with very large data sets on a computer with very little free disk space. I have 21 zipped data sets in one folder, named Hospital Inpatient_1989 through Hospital Inpatient_2010. My goal is to create a data set of summary statistics. This already works for an individual data set, but I need the combined summary statistics across all of them. Because space is limited, I need to unzip one data set, perform some operations on it (which cut the variables down from about 200 to 7), then (I think) append the result to a temporary data set that starts out empty, and finally delete the unzipped data set from the computer. This needs to repeat for each zipped data set, so that at the end there is one temporary data set containing all of the reduced data and the folder holds only the zipped files, exactly as it did before.
So I believe it should look something like:
1) Define macro or loop that will go through all data sets
2) Unzip file using something like
x " ""C:\Program Files\WinZip\WINZIP32.EXE"" ""C:\...\Hospital Inpatient_1989"" ";
3) Perform some operations
4) append using something like
proc append base = inpatient_all data = test; run;
5) Delete unzipped data set
I am fine with 3) and 4), and I think I can figure out 2), so I mainly need help with just defining the macro to go through all the data sets and how to delete them off of my computer afterwards. Also, if this doesn't seem like a logical way to go about this that would be great to know.
Thanks so much for any help you can give! I am happy to give as much additional specific information as I can if it would be helpful.
06-27-2013 05:04 PM
Looks like you have WinZip so get their command line tool (you should be able to download it off the internet). Play with it in a DOS window until you have the syntax working properly (you will probably want to do this with a smaller zip file that will not take so long).
Then see if you can code a macro to do your 5 steps for one of the files.
Personally, I like to use DATA _NULL_ steps to run operating system commands instead of the X command, because a DATA step can capture any error messages the command emits. The command goes in an INFILE statement with the PIPE option:
infile ".....dos command...." pipe ;
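A complete step built around that INFILE statement might look like the sketch below. The `dir` command and the `C:\temp` folder are stand-ins chosen only for illustration; substitute whatever command worked in the DOS window. The sketch also assumes the SAS session is allowed to run operating system commands (the XCMD option).

```sas
/* Sketch only: 'dir "C:\temp"' is a placeholder command, not the unzip call. */
/* 2>&1 sends error output to the same pipe so it is captured as well.        */
data _null_;
  infile 'dir "C:\temp" 2>&1' pipe;
  input;            /* read the next line of command output */
  put _infile_;     /* echo that line to the SAS log        */
run;
```

Everything the command prints, including error messages, then shows up in the SAS log, where it is easy to check whether the unzip worked.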
Now you can run it for each file. With 21 files it might be easiest to replicate the macro call 21 times and change the parameters to reflect the 21 different files. Later you can try creating a macro or a data step that could generate the multiple calls, but it does not sound like it is worth it for now.
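As a rough skeleton of that approach, assuming the per-file steps are wrapped in a macro with the year as its only parameter: the macro name `process_year` is made up for illustration, and steps 2, 3 and 5 are left as comments to be filled in with the code that already works for one file.

```sas
/* Sketch only: process_year is a hypothetical name. */
%macro process_year(year);
  %put NOTE: Processing Hospital Inpatient_&year;

  /* Step 2: unzip. Put the command that worked in the DOS window here. */

  /* Step 3: your existing code that builds WORK.TEST goes here. */

  /* Step 4: append the reduced data to the combined data set. */
  proc append base=inpatient_all data=test;
  run;

  /* Step 5: delete the unzipped file here. */
%mend process_year;

/* Simplest form: one call per file. */
%process_year(1989)
%process_year(1990)
/* ...one call for each remaining file... */
%process_year(2010)

/* Later alternative: generate the calls with a loop.             */
/* This assumes a zip file exists for every year in the range.    */
%macro run_all(first, last);
  %local year;
  %do year = &first %to &last;
    %process_year(&year)
  %end;
%mend run_all;
/* %run_all(1989, 2010) */
```

Writing the calls out by hand makes it easy to rerun a single year if one file fails; the loop version is shorter but will try every year in the range.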
06-29-2013 01:51 AM
I finally got the unzipping to work from SAS using your suggestion, so thanks a lot! I am still trying to figure out step 5). Is it possible to delete the unzipped data set from my computer from within SAS once all of the operations on it are finished, and if so, how?
Thanks a lot for your help!
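One possible way to handle step 5) from within SAS is the FDELETE function, which deletes the external file that a fileref points to. The sketch below assumes the unzipped file is a single file on disk; the fileref `unz` and the path are hypothetical and need to be replaced with the real location.

```sas
/* Sketch only: the path below is a made-up example. */
filename unz "C:\temp\example_unzipped_file.dat";

data _null_;
  rc = fdelete("unz");          /* returns 0 if the file was deleted */
  if rc ne 0 then do;
    msg = sysmsg();             /* text of the most recent error     */
    put "WARNING: delete failed: " msg;
  end;
run;

filename unz clear;             /* release the fileref */
```

If the unzipped file is a SAS data set in an assigned library, the DELETE statement of PROC DATASETS is another option. Either way, the deletion should run only after the append step has finished, since the file cannot be recovered afterwards without unzipping again.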