I've downloaded the 2018 Birth data files (US data files only), which are supposedly 223 MB. When the download completed on my PC, it was over 5 GB. Notepad can't read it, so I can't view the dataset/variables. I attempted to PROC IMPORT it into SAS, but that is not working.
The page where you get the birth data files points users to a page on the National Bureau of Economic Research (NBER) website. If you go to that page on the NBER website and scroll down, you'll see a table for the United States birth data and documentation. Jean Roth at NBER has posted SAS, Stata, and SPSS code to read the ASCII file. She has also posted the file as a Stata file, a SAS data set, and CSV.
The bad news is that she has not done that for the 2018 file. However, I think the record layout for the 2017 file is the same as the record layout for the 2018 file. So, http://data.nber.org/natality/2017/natl2017.sas should help you get started reading the ASCII file into SAS.
I was able to modify the 2017 program to read the zipped version of the 2018 file. One odd note: while the PDF for the 2018 file shows the record length as 1330, the record length is really 1345. The NBER program for 2017 shows 15 variables in columns 1330 to 1345, but those columns are all missing in the 2018 file.
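One way to check the record length yourself, rather than relying on the PDF, is to have SAS report the length of the first few records. This is only a sketch: the file path is a placeholder, the variable name reclen is arbitrary, and it assumes your SAS release supports the ZIP access method on the INFILE statement.

```sas
/* Sketch: report the length of the first few records in the zipped file. */
/* The path is a placeholder; point it at your own download.              */
data _null_;
  infile 'where I put the file.zip' zip member='*'
         lrecl=2000 obs=5 length=reclen;
  input;               /* load the record without creating any variables */
  put _n_= reclen=;    /* write the record number and length to the log  */
run;
```

If the lengths in the log disagree with the documented layout, trust the file and adjust the INPUT statement accordingly.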
@Flexluthorella wrote:
I've downloaded the 2018 Birth data files (US data files only), which are supposedly 223 MB. When the download completed on my PC, it was over 5 GB. Notepad can't read it, so I can't view the dataset/variables. I attempted to PROC IMPORT it into SAS, but that is not working.
When you say "not working", you give us virtually no information on which to base advice. Notepad finds the downloaded file too big. How about WordPad (use it to view, but not save)? Or you could download one of many other editors, like Notepad++. Both likely have higher size limits.
And if you downloaded something sized 223 MB and got a 5 GB file, it was more than a simple download. Try using a more capable editor to view the download. BTW, what is the URL of the downloaded file? Maybe someone on this forum can take a quick look.
By more capable editor, I meant more capable than notepad, thinking either wordpad or notepad++ would do the job. But I see you have tried that.
BUT I also see you have unzipped the downloaded file, so you can do this to make a sample file to visually inspect:
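/* Point a fileref at the unzipped raw data file */
filename filein "C:\Users\…..\Downloads\Nat2018us\Nat2018PublicUS.c20190509.r20190717.txt";
data _null_;
  infile filein;                   /* read from the large raw file        */
  file 'c:\temp\sampledata.txt';   /* write to a small sample file        */
  input;                           /* load one record into the buffer     */
  put _infile_;                    /* copy that record out unchanged      */
  if _n_>=10 then stop;            /* quit after the first 10 records     */
run;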
Then take a look at sampledata.txt.
If you go back to the URL you provided, you will see a column to the left of your downloaded data. The column is titled "User's Guide (.pdf files)". Clicking on the "2018 (1.7MB)" link in this self-descriptive column will provide a guide to the layout of the data as a PDF file.
This is a common practice with lots of demographic data files - one file with just data, and another file/codebook/user guide with the data layout description. Welcome to the demographic data world.
@Flexluthorella wrote:
Right. Going back to my original issue, how do I get to read in ALL the data from the OG (large) file?
First, leave the file zipped. There is no need to unzip it, as SAS can unzip it on the fly.
Second, look at the description of the file and use that to write the code to read it.
So you will have something like this using column oriented reads.
data want;
infile 'where I put the file.zip' zip truncover member='*' ;
input var 1-10 var2 $11-12 .... ;
run;
Or perhaps you will want to use formatted mode instead.
data want;
infile 'where I put the file.zip' zip truncover member='*' ;
input var 10. var2 $2. .... ;
run;
Or some mixture of the two.
Remember to look at the data description to understand which data is in which columns and whether each field holds numbers or strings. Some variables that are coded only as digits you might want to read as strings, since they are really categorical values and not numbers you could use in operations like MEAN().
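To make the numbers-versus-strings point concrete, here is a small self-contained sketch that mixes column and formatted input. The layout, variable names, and data lines are all made up for illustration; they are not taken from the natality user guide.

```sas
/* Hypothetical layout for illustration only:                      */
/*   columns 1-4  = birth year                                     */
/*   columns 5-6  = a two-digit category code                      */
/*   columns 7-10 = weight in grams                                */
data demo;
  infile datalines truncover;
  input birth_year 1-4        /* column input, numeric             */
        cat_code   $ 5-6      /* column input, character           */
        @7 weight_g 4.;       /* formatted input, numeric          */
datalines;
2018013250
2018022980
2018013410
;
run;

/* A true measurement: averaging it is meaningful */
proc means data=demo mean;
  var weight_g;
run;

/* A code stored as digits: count it rather than average it */
proc freq data=demo;
  tables cat_code;
run;
```

Reading cat_code as character also preserves leading zeros, which often matter when matching codes against the codebook.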
@Flexluthorella wrote:
Right. Going back to my original issue, how do I get to read in ALL the data from the OG (large) file?
Use the layout in the pdf file to set up the necessary INPUT statement to read the data into a SAS data set. You don't need to see the entire raw data set in any editor to do that. And you could first do a test of your program using the 10-record (or some other small) subset of the original raw data.
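A test run against the small sample might look like the sketch below. It assumes the sample file was written to c:\temp\sampledata.txt as in the earlier post; the variable names and column ranges are placeholders, to be replaced with the real layout from the PDF.

```sas
/* Sketch: try a draft INPUT statement on the small sample file first. */
/* var and var2 and their columns are placeholders, not the real layout */
data test;
  infile 'c:\temp\sampledata.txt' truncover;
  input var 1-10 var2 $ 11-12;
run;

/* Check variable types and lengths, then eyeball the values */
proc contents data=test;
run;

proc print data=test;
run;
```

Once the values look right, the same INPUT statement can be pointed at the full file.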
The full reference to the INPUT statement, including examples, is at Input Statement. There's another possibly useful SAS link at Reading Raw Data with the SAS Input Statement.
If you haven't used the INPUT statement before, this will be a (worthwhile) experience.
Good luck, and bring back your questions once you start trying to use it.
Have you tried the tools provided on this NCHS site?
The .pdf User Guide provides the data dictionary/data layout. Why isn't that sufficient for you to write the SAS data step to read the data in the .txt file into a SAS data set?
There are text editors available that can open .txt files of multiple GB. Just Google for them.
I've used UltraEdit (which doesn't come for free) to open the text file.
To get you started I've copied the first 30 lines into the attached sample_2018.txt file.