I'll share part of the program in this post and attach the whole thing. To find the name of the file inside the ZIP archive, I borrowed some code from the SAS Dummy blog.
I ran that code, then copied the really long file name into the INFILE statement in the DATA step.
* borrowing code from https://blogs.sas.com/content/sasdummy/2014/01/29/using-filename-zip/ ;
%let ziploc = /folders/myfolders/NCHS Vital Statistics/Nat2018us.zip;
/* Assign a fileref with the ZIP method */
filename inzip zip "&ziploc";
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
    length memname $200;
    fid=dopen("inzip");
    if fid=0 then
        stop;
    memcount=dnum(fid);
    do i=1 to memcount;
        memname=dread(fid,i);
        output;
    end;
    rc=dclose(fid);
run;
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
title;
*options obs=100 ;
options obs=max;
*options nocenter ;
**------------------------------------------------ ;
** by Jean Roth Thu Oct 12 11:09:27 EDT 2017
** This program reads the 2017 NCHS Natality Detail Data File ;
** Report errors to jroth@nber.org ;
** This program is distributed under the GNU GPL. ;
** See end of this file and
** http://www.gnu.org/licenses/ for details. ;
** ----------------------------------------------- ;
* The following line should contain the directory
where the SAS file is to be stored ;
*libname library "/folders/myfolders/NCHS Vital Statistics/";
* The following line should contain
the complete path and name of the raw data file.
On a PC, use backslashes in paths as in C:\ ;
*FILENAME datafile pipe "7z e /homes/data/natality/2017/natl2017.zip -so ";
* The following line should contain the name of the SAS dataset ;
%let dataset = natl2018;
DATA &dataset ;
INFILE inzip(Nat2018PublicUS.c20190509.r20190717.txt) zip truncover LRECL = 20000 ;
attrib dob_yy length=4 label="Birth Year";
attrib dob_mm length=3 label="Birth Month 01 January";
attrib dob_tt length=4 label="Time of Birth 0000-2359 Time of Birth";
While I was testing the code, I left the options obs=100 statement uncommented. Once I was sure the code was correct, I commented that line out and added the options obs=max statement.
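The switch between a test run and a full run could also be kept in one place with a macro variable. This is only a sketch: the macro variable name testobs is my own invention and is not part of the NBER program.

```sas
/* TESTOBS is a hypothetical macro variable: 100 while testing, MAX for the full run */
%let testobs = 100;
options obs=&testobs;

/* ... the DATA step that reads the raw file goes here ... */

/* Reset afterwards so later steps are not silently limited to 100 rows */
options obs=max;
```

Changing the %let line to max is then the only edit needed for the full run.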
Since the data set is 2 GB, I decided that I didn't want to have a permanent version, so I commented out the LIBNAME statement. I commented out the FILENAME statement provided by NBER, because I don't think it will work on my home computer. I'm using the FILENAME statement from the SAS blog instead.
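To illustrate the temporary versus permanent choice in general terms, here is a small sketch. The data set names demo_temp and demo_perm and the libref mylib are made up for the example; the path is the folder used earlier in this post.

```sas
/* One-level name: written to the WORK library and deleted when the session ends */
data demo_temp;
    x = 1;
run;

/* Two-level name: written to the folder behind the libref and kept after the session */
libname mylib "/folders/myfolders/NCHS Vital Statistics/";
data mylib.demo_perm;
    x = 1;
run;
```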
I didn't have to change anything else in the NBER program after the INFILE statement in the data step.
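Copying the long member name by hand could probably be avoided as well. The sketch below assumes the CONTENTS data set from the member listing step already exists and that the ZIP file holds exactly one .txt member; the macro variable name member is hypothetical.

```sas
/* Assumes the CONTENTS data set created by the member listing step */
data _null_;
    set contents;
    /* Keep the first member whose extension is .txt (an assumption about this ZIP) */
    if lowcase(scan(memname, -1, '.')) = 'txt' then do;
        call symputx('member', memname);   /* MEMBER is a hypothetical macro variable */
        stop;
    end;
run;

/* Write the resolved name to the log so it can be checked before the big read */
%put NOTE: raw data member is &member;
```

With that in place, the INFILE statement could refer to inzip(&member) instead of the typed-out file name.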