Flexluthorella
Obsidian | Level 7

I've downloaded the 2018 Birth data files (US data files only), which are supposedly 223 MB. When the download completed on my PC, the file was over 5 GB. Notepad can't read it, so I can't view the dataset/variables. I attempted to use PROC IMPORT to read it into SAS, but that is not working.

1 ACCEPTED SOLUTION

SuzanneDorinski
Lapis Lazuli | Level 10

The page where you get the birth data files points users to a page on the National Bureau of Economic Research (NBER) website. If you go to that page on the NBER website and scroll down, you'll see a table for the United States birth data and documentation. Jean Roth at NBER has posted SAS, Stata, and SPSS code to read the ASCII file. She has also posted the file as a Stata file, a SAS data set, and a CSV file.

The bad news is that she has not done that for the 2018 file. However, I think the record layout for the 2017 file is the same as the record layout for the 2018 file. So, http://data.nber.org/natality/2017/natl2017.sas should help you get started reading the ASCII file into SAS.

I was able to modify the 2017 program to read the zipped version of the 2018 file. One odd note: while the PDF for the 2018 file shows the record length as 1330, the record length is really 1345. The NBER program for 2017 shows 15 variables in columns 1330 to 1345, but those columns are all missing in the 2018 file.
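
If you want to confirm the record length yourself, you can let SAS report it. This is only a minimal sketch: the zip path is a hypothetical placeholder, and it assumes the zip file holds a single text member, so adjust both to your setup.

```sas
/* Hypothetical path: point this at wherever the downloaded zip file lives. */
data _null_;
  /* LENGTH= names a variable that receives the length of the current record. */
  infile 'C:\temp\Nat2018us.zip' zip member='*' lrecl=32767 length=reclen obs=5;
  input;              /* load one record into the input buffer        */
  put _n_= reclen=;   /* write the record number and length to the log */
run;
```

When the step finishes, the log should also carry a note with the minimum and maximum record length for the records it read, which is a quick cross-check against the length stated in the PDF.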


15 REPLIES
mkeintz
PROC Star

@Flexluthorella wrote:

I've downloaded the 2018 Birth data files (US data files only), which are supposedly 223 MB. When the download completed on my PC, the file was over 5 GB. Notepad can't read it, so I can't view the dataset/variables. I attempted to use PROC IMPORT to read it into SAS, but that is not working.


When you say only "not working", we have virtually no information on which to base advice. Notepad finds the downloaded file too big. How about WordPad (use it to view, but not save)? Or you could download one of many other editors, like Notepad++. Both likely have larger size limits.

And if you downloaded something sized 223 MB and got a 5 GB file, it was more than a simple download. Try using a more capable editor to view the download. BTW, what is the URL of the downloaded file? Maybe someone on this forum can take a quick look.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Flexluthorella
Obsidian | Level 7
"Not working" means I can't open the file in Notepad, Notepad++, or WordPad; the error messages state that the file is "too big" or that it "failed to open". I am not sure what you mean by a more capable editor. The URL is https://www.cdc.gov/nchs/data_access/Vitalstatsonline.htm#Tools and it's the 2018 Birth data file.
mkeintz
PROC Star

By more capable editor, I meant more capable than notepad, thinking either wordpad or notepad++ would do the job.  But I see you have tried that.

 

BUT I also see you have unzipped the downloaded file, so you can do this to make a sample file to visually inspect:

 

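filename filein "C:\Users\…..\Downloads\Nat2018us\Nat2018PublicUS.c20190509.r20190717.txt";

/* Copy the first 10 records of the raw file to a small sample file. */
data _null_;
  infile filein;                    /* read from the large raw file   */
  file 'c:\temp\sampledata.txt';    /* write to the small sample file */
  input;                            /* load one record, no variables  */
  put _infile_;                     /* write the record out unchanged */
  if _n_>=10 then stop;             /* stop after 10 records          */
run;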

Then take a look at sampledata.txt.
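
If you would rather not open an editor at all, the LIST statement is another option: it writes each record to the SAS log under a column ruler, which makes it easier to match positions against a layout document. A minimal sketch, assuming the FILEIN fileref defined above is still assigned:

```sas
/* Assumes the FILEIN fileref from the earlier FILENAME statement. */
data _null_;
  infile filein obs=3 lrecl=32767;  /* read only the first 3 records */
  input;                            /* load one record into the buffer */
  list;                             /* print the record to the log with a column ruler */
run;
```

Long records wrap across several log lines, so three records is usually plenty for a first look.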

Flexluthorella
Obsidian | Level 7
This gives me the first 10 lines of data. The sampledata.txt did not give variable names. I can't tell what I need to do from here. I can see a small fraction of the data.
mkeintz
PROC Star

If you go back to the URL you provided, you will see a column to the left of your downloaded data. The column is titled "User's Guide (.pdf files)". Clicking on the "2018 (1.7MB)" link in this self-descriptive column will provide a PDF guide to the layout of the data.

 

This is a common practice with lots of demographic data files - one file with just data, and another file/codebook/user guide with the data layout description.  Welcome to the demographic data world.

Flexluthorella
Obsidian | Level 7
Right. Going back to my original issue, how do I get to read in ALL the data from the OG (large) file?
Tom
Super User

@Flexluthorella wrote:
Right. Going back to my original issue, how do I get to read in ALL the data from the OG (large) file?

First, leave the file zipped. There is no need to unzip it, as SAS can unzip it on the fly.

Second, look at the description of the file and use that to write the code to read it.

So you will have something like this, using column-oriented reads.

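data want;
  /* ZIP reads the member straight from the archive; TRUNCOVER tolerates short records */
  infile 'where I put the file.zip' zip truncover member='*' ;
  /* column input: numeric VAR1 in columns 1-10, character VAR2 in columns 11-12 */
  input var1 1-10  var2 $11-12 .... ;
run;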

Or perhaps you will want to use formatted mode instead.

data want;
  infile 'where I put the file.zip' zip truncover member='*' ;
  /* formatted input: @n moves the pointer to column n, then the informat sets the width */
  input @1 var1 10.  @11 var2 $2. .... ;
run;

Or some mixture of the two.

Remember to look at the data description to understand which data is in which columns, and whether each field holds numbers or strings. Some variables that are coded only as digits you might want to read as strings, since they are really categorical values and not numbers you could use in operations like MEAN().
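
Putting those pieces together, a read of a few fields might look like the sketch below. Every variable name and column position here is an illustrative placeholder, not something taken from the 2018 User Guide, and the zip path is hypothetical. The LRECL value uses the 1345 record length reported elsewhere in this thread.

```sas
/* All names and column positions are placeholders for illustration only. */
/* Replace them with the positions documented in the User Guide.          */
data births2018;
  infile 'C:\temp\Nat2018us.zip' zip member='*' truncover lrecl=1345 obs=1000;
  input
    @9   birth_year   4.    /* numeric, 4 columns starting at column 9 */
    @13  birth_month  $2.   /* categorical code read as character      */
    @75  mother_age   2.    /* numeric, usable in MEAN()               */
    @475 infant_sex   $1.   /* single-character code                   */
  ;
run;

proc freq data=births2018;
  tables birth_month infant_sex;  /* sanity-check the categorical codes */
run;
```

OBS=1000 keeps the test run quick. Once the frequencies look sensible against the codebook, drop that option to read the whole file.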

mkeintz
PROC Star

@Flexluthorella wrote:
Right. Going back to my original issue, how do I get to read in ALL the data from the OG (large) file?

Use the layout in the pdf file to set up the necessary INPUT statement to read the data into a SAS data set.  You don't need to see the entire raw data set in any editor to do that.  And you could first do a test of your program using the 10-record (or some other small) subset of the original raw data.

 

The full reference for the INPUT statement, including examples, is at Input Statement. There's another possibly useful SAS link at Reading Raw Data with the SAS Input Statement.

If you haven't used the INPUT statement before, this will be a worthwhile experience.

 

Good luck, and bring back your questions once you start trying to use it.
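
As a rough illustration of that test-first approach, the sketch below reads the small sample file created earlier and then checks the result. The two fields are hypothetical placeholders, not the documented layout, so substitute names, positions, and informats from the PDF.

```sas
/* Test the INPUT statement on the 10-record sample before the full file. */
/* Field names and columns are placeholders, not the documented layout.   */
data test;
  infile 'c:\temp\sampledata.txt' truncover lrecl=32767;
  input
    @1  field_a  $8.   /* hypothetical character field in columns 1-8 */
    @9  field_b  4.    /* hypothetical numeric field in columns 9-12  */
  ;
run;

proc print data=test;      /* eyeball the values against the raw records */
run;

proc contents data=test;   /* confirm each variable's type and length */
run;
```

If the printed values line up with what you see in sampledata.txt, the same INPUT statement can be pointed at the full file.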

Flexluthorella
Obsidian | Level 7
I do not know how to use the tools they provide. I did not think I could just start downloading tools to use with no idea how to use them.
Patrick
Opal | Level 21

The .pdf User Guide provides the data dictionary/data layout. Why isn't that sufficient for you to write the SAS data step to read the data in the .txt file into a SAS data set? 


 

There are text editors available that can open .txt files of multiple GB. Just Google for them.

I've used UltraEdit (which doesn't come for free) to open the text file. 

To get you started I've copied the first 30 lines into the attached sample_2018.txt file. 


Flexluthorella
Obsidian | Level 7
I am still having issues; I get 0 records read in. Can you share your modified program with me?
