Hello:
I've been struggling with importing the LAB.DAT file from NHANES III using the SAS code posted on the NHANES web site. However, the unique ID field, SEQN, comes through as a period (.), i.e. missing, in every row. As suggested by some articles I found online, I tried the BEST12., BEST8., and even BEST32. formats, to no effect. I'd appreciate help getting the import done successfully. Some data elements may be missing for other fields, but never for the SEQN (ID) field. Below are the data sample, code, and output. The data file has almost 300 fields, of which I need only the 13 shown in the code below:
FILENAME LAB "/home/clonyem/NHANES/LAB.DAT" LRECL=1979;
*** LRECL includes 2 positions for CRLF, assuming use of PC SAS;
DATA LAB;
INFILE LAB MISSOVER;
LENGTH
SEQN 8
DMARETHN 3
DMARACER 3
DMAETHNR 3
HSSEX 3
HSAGEIR 3
HSAITMOR 4
DMPPIR 8
MXPAXTMR 4
CRP 8
CEP 8
CEPSI 8
OSPSI 3
;
FORMAT
SEQN BEST12.
DMPPIR Z6.3
CRP 8.2
CEP 6.1
CEPSI 8.1
OSPSI 8.
;
INPUT
SEQN 1-5
DMARETHN 12
DMARACER 13
DMAETHNR 14
HSSEX 15
HSAGEIR 16-17
HSAITMOR 19-22
DMPPIR 34-39
MXPAXTMR 1236-1239
CRP 1667-1671
CEP 1784-1787
CEPSI 1788-1793
OSPSI 1858-1860
;
LABEL
SEQN = "Sample person identification number"
DMARETHN = "Race-ethnicity"
DMARACER = "Race"
DMAETHNR = "Ethnicity"
HSSEX = "Sex"
HSAGEIR = "Age at interview (Screener)"
HSAITMOR = "Age in months at interview (screener)"
DMPPIR = "Poverty Income Ratio (unimputed income)"
MXPAXTMR = "Age in months at MEC exam"
CRP = "Serum C-reactive protein (mg/dL)"
CEP = "Serum creatinine (mg/dL)"
CEPSI = "Serum creatinine: SI (umol/L)"
OSPSI = "Serum osmolality: SI (mmol/Kg)"
;
RUN;
proc export data=WORK.LAB
outfile = "/home/clonyem/NHANES/LAB.CSV"
dbms=csv;
run;
SEQN data is missing in the output.
[Screenshots not included: the warning messages, and row one of the LAB.DAT data file, which is similar to the rest of the data rows.]
Help is much appreciated!
So your INPUT statement says to take the contents of columns 1 through 5 to get SEQN:
input seqn 1-5;
However, the contents of columns 1 through 5 on the first line look like this:
1 000
What number does this represent? Is it correct to try to get the data from columns 1 through 5?
You may need to refine what the program should do, before trying to revise the program.
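A minimal sketch of what the column specification actually does, using two made-up in-line records rather than the real LAB.DAT: SEQN is read from columns 1-5 with column input, and the same five columns are read again as raw text with the $CHAR5. informat so the two can be compared. Note that a FORMAT such as BEST12. only controls how a value is displayed; it has no effect on how INPUT reads the columns, so if the raw text in columns 1-5 is not what you expect, the column positions or the file itself are the problem.

```sas
/* Sketch only: the two records below are invented for illustration,
   they are not NHANES data. */
data colcheck;
  infile datalines truncover;
  input seqn 1-5       /* column input: numeric value from columns 1-5 */
        @1 raw $char5.; /* jump back to column 1 and keep the same 5 columns as text */
  datalines;
00003 rest of record
00004 rest of record
;
run;

proc print data=colcheck;
run;
```

Here SEQN is 3 and 4 while RAW shows the text "00003" and "00004", so the leading zeros in the file are not a problem for column input.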
Thanks for the response, but I'm not quite clear about what you mean by refining what the program should do. I got the code for reading the data into SAS from the NHANES website, and below are all the code fragments that pertain to the SEQN variable:
Out of desperation I then added the BEST. format, because I wasn't getting anywhere trying to extract the SEQN values from the .dat file, but it made no difference.
Here is the URL to the entire code provided by NHANES: https://wwwn.cdc.gov/nchs/data/nhanes3/1a/lab.sas and the URL to the LAB.DAT file:
https://wwwn.cdc.gov/nchs/data/nhanes3/1a/lab.dat
Thanks!
I finally figured out what was preventing the data from coming across: a stray meta character.
I'm not exactly sure where it was interfering since such characters are, of course, invisible, so I retyped the entire program in reliable Notepad++, copied it over into SAS Studio and... voila!
Thanks all!
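Since SEQN should never be missing, it can help to confirm that a fix like this really worked before running PROC EXPORT. A minimal sketch, assuming the imported data set is WORK.LAB as in the code above; the variable name n_missing is just an illustrative choice:

```sas
/* Count nonmissing and missing SEQN values in the imported data set. */
proc means data=work.lab n nmiss;
  var seqn;
run;

/* Write a clear message to the log if any record lacks a SEQN. */
data _null_;
  set work.lab end=last;
  if missing(seqn) then n_missing + 1;  /* sum statement: retained, starts at 0 */
  if last then do;
    if n_missing > 0 then put "ERROR: " n_missing "records have a missing SEQN.";
    else put "NOTE: every record has a SEQN.";
  end;
run;
```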
Something went wrong in your download of the .DAT file.
When I click on the link you posted I can see that the first line does NOT start with a 1 followed by a space.
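To check a download like this, a sketch along these lines reports, for the first five records of LAB.DAT, the record length, the position of the first non-printable character (0 if there is none), and the raw contents of columns 1-5. The path is the one used in this thread; adjust it for your own setup. If the file has Windows (CRLF) line endings and is read on a Linux server such as the one behind SAS Studio, expect a carriage return to be flagged at the end of each record unless TERMSTR=CRLF is added to the INFILE statement; that alone would not affect columns 1-5.

```sas
/* Sketch: inspect the first few records of the downloaded file.
   The path below is the one from this thread (an assumption for your setup). */
filename lab "/home/clonyem/NHANES/LAB.DAT";

data reccheck;
  length first5 $5;
  infile lab obs=5 length=reclen;      /* RECLEN = length of the current record */
  input;                               /* load the whole record into _INFILE_ */
  reclen_out = reclen;                 /* LENGTH= variables are not kept, so copy it */
  first_nonprint = notprint(_infile_); /* 0 if every character is printable */
  first5 = substr(_infile_, 1, 5);     /* what column input 1-5 would see */
run;

proc print data=reccheck;
run;
```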
Hi Tom,
The "source" data snippet I included in the initial post was not from the LAB.DAT source file proper but was grabbed from the log that SAS Studio produced during the failed runs. But you're right: the first SEQN to come through is 3 (00003 in the file).
Thanks for the observation and for taking the time to check out the source.
-Cyrille
That is what those lines in the SAS log are: the actual text read from the INFILE (or from in-line data).
You can use the LIST statement to look at it yourself without having to have had an input error.
So to see the content of the first three lines in LAB use code like this:
FILENAME LAB "/home/clonyem/NHANES/LAB.DAT" ;
data _null_;
infile lab obs=3;
input;
list;
run;
Note: If you are using a version of SAS released in the last 10-20 years, you probably don't need the LRECL= option. You only need to set LRECL= when reading a normal text file whose lines are longer than the default. The default used to be only 256, but it has been 32767 for a long, long time.
Very useful and helpful tip. This is my first time ever dabbling in SAS, since I wasn't able to read the NHANES III .dat files directly into R with all the headers missing. The XPT (SAS transport) file format is much easier to work with than the older .DAT files. Had I typed out the code in a plain-text editor, I believe I wouldn't have run into the wall; or, as you pointed out, the original download could have been corrupted. But the LIST statement you posted above is great for diagnosis. Thanks again, Tom!