SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
abhi309
Obsidian | Level 7

I have the following data in a txt format and I need to import it as a SAS dataset.  I tried importing as tab  delimited file but data is looking completely different in SAS. 

The file is very big and even when I am trying to open it with any txt editor it is not showing me the correct format. Below is the link to download the file and detail of the variables. 

 

https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2023us.zip

https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/DVS/natality/UserGuide2023.pdf

 

 

 

4 REPLIES 4
ballardw
Super User

First thing you attached two files. Import which? The PDF describes the text file and you would use that to write a data step to read the text file. Import will almost certainly get stuff wrong.

Starting on Page 8 of the PDF and you see the first column with "Position" as the header shows that this is fixed column file. That means specific variables occupy specify column(s). Import will never get that correct as it expects something in the file to tell where values start/end. Typically tab or some other character if the file is text. In this case there is no such delimiter between variables.

 

I would strongly suggest searching the source website as CDC quite often has a section with SAS code to read the files and often instructions for the various versions.  The documentation starting at page 8 runs more than 30 pages of file description. I have used such to write code to read files. It is part of my paid job. I am not going to that much effort on this forum where I volunteer.

 

Here is how to apply the information starting at page 8 to write a data step to read the file after you unzip the file. Unzip the file before running following code and use the path to the unzipped file and what ever the name of the file may be on the Infile statement. The $ means you are reading character values.

data datasetname;
   infile "<path>\filename.txt" truncover lrecl=32000;
   input
      filler $ 1-8 
      dob_yy 9-12
      dob_mm 13-14
      filler $ 15-18 
      dob_tt 19-22
   ;
   drop filler;
run;

The "filler" actually are not needed in the code but I included for continuity to follow the text in the document. You would only have one variable by that name in the output and since it is "blank" dropping it makes sense.

 

 

 

 

Patrick
Opal | Level 21

If you can't find ready-made SAS data step code as suggested by @ballardw then you need to create this code by yourself based on the record description in the pdf.

 

Below some input which is hopefully helpful to you.

 

1. Using WSL (a Linux prompt for Windows) I've run below command to create the attached txt file which contains the first 2000 lines of your big file. This should allow you to inspect the source data in an editor.

head -n 2000 ./Nat2023PublicUS.c20240509.r20240724.txt > ./Nat2023PublicUS.c20240509.r20240724_part01.txt

I also created a part of the data step input statement with AI assistance (using the pdf record description as input). That's likely not fully correct but it should give you a start in case you can't find any ready made code.

Spoiler
/* Define the file path */
%let filepath = 'C:\temp\Nat2023us\Nat2023PublicUS.c20240509.r20240724_part01.txt';

/* Read the data */
data natality2023;
    infile "&filepath" truncover;
    input
        FILLER1 $ 1-8
        DOB_YY $ 9-12
        DOB_MM $ 13-14
        FILLER2 $ 15-18
        DOB_TT $ 19-22
        DOB_WK $ 23
        FILLER3 $ 24-31
        BFACIL $ 32
        F_BFACIL $ 33
        FILLER4 $ 34-49
        BFACIL3 $ 50
        FILLER5 $ 51-72
        MAGE_IMPFLG $ 73
        MAGE_REPFLG $ 74
        MAGER 75-76
        MAGER14 $ 77-78
        MAGER9 $ 79
        FILLER6 $ 80-83
        MBSTATE_REC $ 84
        FILLER7 $ 85-103
        RESTATUS $ 104
        MRACE31 $ 105-106
        MRACE6 $ 107
        MRACE15 $ 108-109
        FILLER8 $ 110
        MRACEIMP $ 111
        MHISPX $ 112
        FILLER9 $ 113-114
        MHISP_R $ 115
        F_MHISP $ 116
        MRACEHISP $ 117
        FILLER10 $ 118
        MAR_P $ 119
        DMAR $ 120
        MAR_IMP $ 121
        FILLER11 $ 122
        F_MAR_P $ 123
        MEDUC $ 124
        FILLER12 $ 125
        F_MEDUC $ 126
        FILLER13 $ 127-141
        **bleep**ERPT_FLG $ 142
        FILLER14 $ 143-146
        **bleep**ECOMB $ 147-148
        **bleep**EREC11 $ 149-150
        FRACE31 $ 151-152
        FRACE6 $ 153
        FRACE15 $ 154-155
        FILLER15 $ 156-158
        FHISPX $ 159
        FHISP_R $ 160
        F_FHISP $ 161
        FRACEHISP $ 162
        FEDUC $ 163
        FILLER16 $ 164
        f_FEDUC $ 165
        FILLER17 $ 166-170
        PRIORLIVE $ 171-172
        PRIORDEAD $ 173-174
        PRIORTERM $ 175-176
        FILLER18 $ 177-178
        LBO_REC $ 179
        FILLER19 $ 180-181
        TBO_REC $ 182
        FILLER20 $ 183-197
        ILLB_R $ 198-200
        ILLB_R11 $ 201-202
        FILLER21 $ 203-205
        ILOP_R $ 206-208
        ILOP_R11 $ 209-210
        FILLER22 $ 211-213
        ILP_R $ 214-216
        ILP_R11 $ 217-218
        FILLER23 $ 219-223
        PRECARE $ 224-225
        F_MPCB $ 226
        PRECARE5 $ 227
        FILLER24 $ 228-237
        PREVIS $ 238-239
        FILLER25 $ 240-241
        PREVIS_REC $ 242-243
        F_TPCV $ 244
        FILLER26 $ 245-250
        WIC $ 251
        F_WIC $ 252
        CIG_0 $ 253-254
        CIG_1 $ 255-256
        CIG_2 $ 257-258
        CIG_3 $ 259-260
        CIG0_R $ 261
        CIG1_R $ 262
        CIG2_R $ 263
        CIG3_R $ 264
        F_CIGS_0 $ 265
        F_CIGS_1 $ 266
        F_CIGS_2 $ 267
        F_CIGS_3 $ 268
        CIG_REC $ 269
        F_TOBACO $ 270
        FILLER27 $ 271-279
        M_Ht_In $ 280-281
        F_M_HT $ 282
        BMI $ 283-286
        BMI_R $ 287
        FILLER28 $ 288-291
        PWgt_R $ 292-294
        F_PWGT $ 295
        FILLER29 $ 296-298
        DWgt_R $ 299-301
        FILLER30 $ 302
        F_DWGT $ 303
        WTGAIN $ 304-305
        WTGAIN_REC $ 306
        F_WTGAIN $ 307
        FILLER31 $ 308-312
        RF_PDIAB $ 313
        RF_GDIAB $ 314
        RF_PHYPE $ 315
        RF_GHYPE $ 316
        RF_EHYPE $ 317
        RF_PPTERM $ 318
        F_RF_PDIAB $ 319
        F_RF_GDIAB $ 320
        F_RF_PHYPER $ 321
        F_RF_GHYPER $ 322
        F_RF_ECLAMP $ 323
        F_RF_PPB $ 324
        RF_INFTR $ 325
        RF_FEDRG $ 326
        RF_ARTEC $ 327
        f_RF_INFT $ 328
        F_RF_INF_DRG $ 329
        F_RF_INF_ART $ 330
        RF_CESAR $ 331
        RF_CESARN $ 332-333
        FILLER32 $ 334
        F_RF_CESAR $ 335
        F_RF_NCESAR $ 336
        NO_RISKS $ 337
        FILLER33 $ 338-342
        IP_GON $ 343
        IP_SYPH $ 344
        IP_CHLAM $ 345
        IP_HEPB $ 346
        IP_HEPC $ 347
        F_IP_GONOR $ 348
        F_IP_SYPH $ 349
        F_IP_CHLAM $ 350
        F_IP_HEPATB $ 351
        F_IP_HEPATC $ 352
        NO_INFEC $ 353
        FILLER34 $ 354-358
        FILLER35 $ 359
        OB_ECVS $ 360
        OB_ECVF $ 361
        FILLER36 $ 362
        F_OB_SUCC $ 363
        F_OB_FAIL $ 364
        FILLER37 $ 365-382
        LD_INDL $ 383
        LD_AUGM $ 384
        LD_STER $ 385
        LD_ANTB $ 386
        LD_CHOR $ 387
        LD_ANES $ 388
        F_LD_INDL $ 389
        F_LD_AUGM $ 390
        F_LD_STER $ 391
        F_LD_ANTB $ 392
        F_LD_CHOR $ 393
        F_LD_ANES $ 394
        NO_LBRDLV $ 395
        FILLER38 $ 396-400
        ME_PRES $ 401
        ME_ROUT $ 402
        ME_TRIAL $ 403
        F_ME_PRES $ 404
        F_ME_ROUT $ 405
        F_ME_TRIAL $ 406
        RDMETH_REC $ 407
        DMETH_REC $ 408
        F_DMETH_REC $ 409
        FILLER39 $ 410-414
        MM_MTR $ 415
        MM_PLAC $ 416
        MM_RUPT $ 417
        MM_UHYST $ 418
        MM_AICU $ 419
        FILLER40 $ 420
        F_MM_MTR $ 421
        F_MM_PLAC $ 422
        F_MM_RUPT $ 423
        F_MM_UHYST $ 424
        F_MM_AICU $ 425
        FILLER41 $ 426
        NO_MMORB $ 427
        FILLER42 $ 428-432
        ATTEND $ 433
        MTRAN $ 434
        PAY $ 435
        PAY_REC $ 436
        F_PAY $ 437
        F_PAY_REC $ 438
        FILLER43 $ 439-443
        APGAR5 $ 444-445
        APGAR5R $ 446
        F_APGAR5 $ 447
        APGAR10 $ 448-449
        ;
run;

And of course because the source data is positional it's easy to only read the columns you actually need into SAS which hopefully will significantly reduce the effort.

Tom
Super User Tom
Super User

First thing is you cannot "import" that type of file.  The data is arranged in fixed positions on each line. You have to actually write a data step to actual READ in the data from the file instead.

 

Please show the data step you wrote to read it and explain in what ways it did not work.

 

Second thing is that is a ZIP file and the ZIP engine in SAS 9.4m7 seems to have trouble with expanding it. 

1    data _null_;
2      infile "C:\Downloads\Nat2023us.zip" zip member='*'
3        termstr=crlf obs=2;
4      input;
5      list;
6    run;

ERROR: Open failure for C:\Downloads\Nat2023us.zip during attempt to create a local file handle.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

So you will probably need to first unzip the file before SAS can read it.

Spoiler
55   data _null_;
56     infile "%sysfunc(pathname(work))/Nat2023PublicUS.c20240509.r20240724.txt"
57       termstr=crlf obs=2;
58     input;
59     list;
60   run;

NOTE: The infile "/Volumes/scratch/SAS_workxxx/Nat2023PublicUS.c20240509.r20240724.txt" is:
      Filename=/Volumes/scratch/SAS_workxxx/Nat2023PublicUS.c20240509.r20240724.txt,
      Owner Name=xxx,Group Name=xxx,
      Access Permission=-rw-rw-r--,
      Last Modified=24Jul2024:12:18:02,
      File Size (bytes)=4856044107

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1                 202301    01171        11                1                        37116    1
     101     103303  0  013 Y2  17 1                    330501101   00115 1     010001  2  3               028
     201  04   20208   02804     0311          12  071      N10000000000001111N1         64124.02    1401   15
     301  6 11621     NNNNNN111111NXX111N00 111     NNNNN111111      NN 11                  NNNNNN1111111
     401  11X111111     NNNNN 11111 1     1N1111     0941885   1    9               F 04  2022     39072    39
     501  0723625 083     NNNNNN 111111 1     NNNNNN111111NNNNNN1111111     NYY1
     601
     701
     801
     901
    1001
    1101
    1201
    1301                                                1345
2                 202301    08594        11                1                        29094    2
     101     105514  0  015 X1  15 1                    310505514   00155 1     010001  2  3               077
     201  08   01302   01302     0311          07  051      Y10000000000001111N1         57132.74    1511   17
     301  6 12531     NNNNNN111111NXX111N00 111     NNNNN111111      NN 11                  NNNNNY1111110
     401  11X111111     NNNNN 11111 1     1N5411     0941885   1    9               M 04  2022     39072    39
     501  0723284 073     NNNNNN 111111 1     NNNNNN111111NNNNNN1111111     NYY1
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
     601
     701
     801
     901
    1001
    1101
    1201
    1301                                                1345
NOTE: 2 records were read from the infile "/Volumes/scratch/SAS_work6D6F00009B54_amraelp00008222.pfizer.com/Nat2023PublicUS.c20240509.r20240724.txt".
      The minimum record length was 1345.
      The maximum record length was 1345.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

abhi309
Obsidian | Level 7

Thank you everyone for your help. I was looking for another way other than data step to import this file since there are around 150 variables but I don't think there is any other way. So I will go forward and use the data step.

 

Thanks again

 

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1914 views
  • 3 likes
  • 4 in conversation