clonyem
Fluorite | Level 6

Hello:

I've been struggling to import the Lab.dat file from NHANES III using the SAS code posted on the NHANES website. The unique ID field, SEQN, comes through as a period (.) in every row. I have tried Best12, Best8, and even Best32 formatting, as suggested by some articles I found online, to no effect. I'd appreciate help getting the import done successfully. Some data elements could legitimately be missing for other fields, but absolutely not for the SEQN (ID) field. Below are the code, the output, and a data sample. The data file has almost 300 fields, of which I need only the 13 shown in the code below:

FILENAME LAB "/home/clonyem/NHANES/LAB.DAT" LRECL=1979;
*** LRECL includes 2 positions for CRLF, assuming use of PC SAS;

DATA LAB;
INFILE LAB MISSOVER;

LENGTH
SEQN 8
DMARETHN 3
DMARACER 3
DMAETHNR 3
HSSEX 3
HSAGEIR 3
HSAITMOR 4
DMPPIR 8
MXPAXTMR 4
CRP 8
CEP 8
CEPSI 8
OSPSI 3
;

FORMAT
SEQN BEST12.
DMPPIR Z6.3
CRP 8.2
CEP 6.1
CEPSI 8.1
OSPSI 8.
;

INPUT
SEQN 1-5
DMARETHN 12
DMARACER 13
DMAETHNR 14
HSSEX 15
HSAGEIR 16-17
HSAITMOR 19-22
DMPPIR 34-39
MXPAXTMR 1236-1239
CRP 1667-1671
CEP 1784-1787
CEPSI 1788-1793
OSPSI 1858-1860
;

LABEL
SEQN = "Sample person identification number"
DMARETHN = "Race-ethnicity"
DMARACER = "Race"
DMAETHNR = "Ethnicity"
HSSEX = "Sex"
HSAGEIR = "Age at interview (Screener)"
HSAITMOR = "Age in months at interview (screener)"
DMPPIR = "Poverty Income Ratio (unimputed income)"
MXPAXTMR = "Age in months at MEC exam"
CRP = "Serum C-reactive protein (mg/dL)"
CEP = "Serum creatinine (mg/dL)"
CEPSI = "Serum creatinine: SI (umol/L)"
OSPSI = "Serum osmolality: SI (mmol/Kg)"
;
proc export data=WORK.LAB
outfile = "/home/clonyem/NHANES/LAB.CSV"
dbms=csv;
run;

 

clonyem_0-1635694051221.png

SEQN data is missing in the output.

 

clonyem_1-1635694141540.png

Warning messages

 

clonyem_2-1635694239918.png

Row one of LAB.DAT data file, similar to the rest of the data rows.

Help is much appreciated!

 

 

 

 

 


8 REPLIES 8
Astounding
PROC Star

So your INPUT statement says to take the contents of columns 1 through 5 to get SEQN:

 

input seqn 1-5;

However, the contents of columns 1 through 5 on the first line look like this:

1 000

What number does this represent?  Is it correct to try to get the data from columns 1 through 5?

 

You may need to refine what the program should do, before trying to revise the program.

clonyem
Fluorite | Level 6

Thanks for the response, but I'm not quite clear about what you mean by refining what the program should do. I got the code to read the data into SAS from NHANES website and below are all the code fragments that pertain to SEQN variable:

 

clonyem_0-1635727047641.png

clonyem_1-1635727161788.png

clonyem_2-1635727241106.png

 

I then added the [best.] format out of desperation, because I wasn't getting anywhere in trying to extract the values for SEQN from the .dat file. It did not make any difference.

clonyem_3-1635727396968.png

 

Here is the url to the entire code provided by NHANES: https://wwwn.cdc.gov/nchs/data/nhanes3/1a/lab.sas  and the url to LAB.DAT file:

https://wwwn.cdc.gov/nchs/data/nhanes3/1a/lab.dat

 

Thanks!

 

 

 

 

 

 

 

clonyem
Fluorite | Level 6

I finally figured out what was preventing the data from coming across: a metacharacter.
I'm not exactly sure where it was interfering, since such characters are, of course, invisible, so I typed out the entire code in reliable Notepad++, copied it over into SAS Studio and... voila!

Thanks all!
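
An alternative to retyping the program is to scan the saved program file for characters outside the printable ASCII range. The following is only a minimal sketch: the path to the saved program (`lab.sas`) and the fileref `pgm` are hypothetical and not taken from the thread.

```sas
/* Hypothetical location of the saved program; adjust to your own file */
filename pgm "/home/clonyem/NHANES/lab.sas";

data _null_;
  infile pgm truncover;
  input;  /* load the line into _INFILE_ without creating variables */
  /* COLLATE(32,126) returns the printable ASCII characters;
     VERIFY returns the position of the first character not in that set */
  pos = verify(_infile_, collate(32, 126));
  if pos > 0 then do;
    put "Suspect character on line " _n_ "at column " pos;
    list;  /* echo the raw line to the log */
  end;
run;
```

Note that this check also flags tabs, and it may flag a trailing carriage return when a file saved with Windows line endings is read on a Unix host, so not every hit is necessarily the culprit.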

Tom
Super User

Something went wrong in your download of the .DAT file.

When I click on the link you posted I can see that the first line does NOT start with a 1 followed by a space.

image.png

clonyem
Fluorite | Level 6

Hi Tom,

 

The "source" data snippet I included in the initial post was not from the LAB.DAT source file proper, but a grab from the log that was output by SAS Studio during the failed runs. But you're right, as the first SEQN to come through is [0000]3.

Thanks for the observation and for taking the time to check out the source.

 

-Cyrille

Tom
Super User

@clonyem wrote:

Hi Tom,

 

The "source" data snippet I included in the initial post was not from the LAB.DAT source file proper, but a grab from the log that was output by SAS Studio during the failed runs. But you're right, as the first SEQN to come through is [0000]3.

Thanks for the observation and for taking the time to check out the source.

 

-Cyrille


That is what those lines in the SAS log are: the actual text read from the INFILE (or in-line data).

 

You can use the LIST statement to look at it yourself, without needing an input error to trigger it.

So to see the content of the first three lines in LAB use code like this:

FILENAME LAB "/home/clonyem/NHANES/LAB.DAT" ;
data _null_;
  infile lab obs=3;
  input;
  list;
run;

Note: If you are using a version of SAS released in the last 10-20 years, you probably don't need to include the LRECL= option. You only need to set LRECL when reading a normal text file if the line length is longer than the default. The default used to be only 256, but it has been 32767 for a long time.
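
Building on the LIST approach, a quick way to check whether SEQN itself parses is to read only that column range from the first few records. This is a sketch that reuses the path and the 1-5 column range from the original post; the dataset name `check` is illustrative.

```sas
FILENAME LAB "/home/clonyem/NHANES/LAB.DAT";

data check;
  infile lab truncover obs=5;  /* first five records only */
  input seqn 1-5;              /* same column range as the posted code */
  put _n_= seqn=;              /* write each parsed value to the log */
run;
```

If SEQN is still missing in this stripped-down step, the file or the column range is the likelier suspect; if the values come through (per the thread, the first one should be 3), the problem more likely lies elsewhere in the longer program.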

clonyem
Fluorite | Level 6
Very useful and helpful tip. This is my first ever dabbling in SAS, since I wasn't able to read NHANES III .dat files directly into R with all the missing headers. The eXPorT file format is much easier to work with than the older .DAT ones. Had I typed out the code in a vanilla editor, I believe I wouldn't have run into the wall; or, as you pointed out, the original download could have been corrupted. But the LIST statement you posted above is great for diagnosis. Thanks again, Tom!
