SAS Programming

abhi309 · Posted 10-16-2024 07:26 PM

I have the following data in a txt format and I need to import it as a SAS dataset. I tried importing as tab delimited file but data is looking completely different in SAS.

The file is very big and even when I am trying to open it with any txt editor it is not showing me the correct format. Below is the link to download the file and detail of the variables.

https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2023us.zip

https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/DVS/natality/UserGuide2023.pdf

ballardw · Posted 10-16-2024 10:38 PM

First thing you attached two files. Import which? The PDF describes the text file and you would use that to write a data step to read the text file. Import will almost certainly get stuff wrong.

Starting on Page 8 of the PDF and you see the first column with "Position" as the header shows that this is fixed column file. That means specific variables occupy specify column(s). Import will never get that correct as it expects something in the file to tell where values start/end. Typically tab or some other character if the file is text. In this case there is no such delimiter between variables.

I would strongly suggest searching the source website as CDC quite often has a section with SAS code to read the files and often instructions for the various versions. The documentation starting at page 8 runs more than 30 pages of file description. I have used such to write code to read files. It is part of my paid job. I am not going to that much effort on this forum where I volunteer.

Here is how to apply the information starting at page 8 to write a data step to read the file after you unzip the file. Unzip the file before running following code and use the path to the unzipped file and what ever the name of the file may be on the Infile statement. The $ means you are reading character values.

data datasetname;
   infile "<path>\filename.txt" truncover lrecl=32000;
   input
      filler $ 1-8 
      dob_yy 9-12
      dob_mm 13-14
      filler $ 15-18 
      dob_tt 19-22
   ;
   drop filler;
run;

The "filler" actually are not needed in the code but I included for continuity to follow the text in the document. You would only have one variable by that name in the output and since it is "blank" dropping it makes sense.

Patrick · Posted 10-16-2024 11:10 PM

If you can't find ready-made SAS data step code as suggested by @ballardw then you need to create this code by yourself based on the record description in the pdf.

Below some input which is hopefully helpful to you.

1. Using WSL (a Linux prompt for Windows) I've run below command to create the attached txt file which contains the first 2000 lines of your big file. This should allow you to inspect the source data in an editor.

head -n 2000 ./Nat2023PublicUS.c20240509.r20240724.txt > ./Nat2023PublicUS.c20240509.r20240724_part01.txt

I also created a part of the data step input statement with AI assistance (using the pdf record description as input). That's likely not fully correct but it should give you a start in case you can't find any ready made code.

Spoiler

/* Define the file path */
%let filepath = 'C:\temp\Nat2023us\Nat2023PublicUS.c20240509.r20240724_part01.txt';

/* Read the data */
data natality2023;
    infile "&filepath" truncover;
    input
        FILLER1 $ 1-8
        DOB_YY $ 9-12
        DOB_MM $ 13-14
        FILLER2 $ 15-18
        DOB_TT $ 19-22
        DOB_WK $ 23
        FILLER3 $ 24-31
        BFACIL $ 32
        F_BFACIL $ 33
        FILLER4 $ 34-49
        BFACIL3 $ 50
        FILLER5 $ 51-72
        MAGE_IMPFLG $ 73
        MAGE_REPFLG $ 74
        MAGER 75-76
        MAGER14 $ 77-78
        MAGER9 $ 79
        FILLER6 $ 80-83
        MBSTATE_REC $ 84
        FILLER7 $ 85-103
        RESTATUS $ 104
        MRACE31 $ 105-106
        MRACE6 $ 107
        MRACE15 $ 108-109
        FILLER8 $ 110
        MRACEIMP $ 111
        MHISPX $ 112
        FILLER9 $ 113-114
        MHISP_R $ 115
        F_MHISP $ 116
        MRACEHISP $ 117
        FILLER10 $ 118
        MAR_P $ 119
        DMAR $ 120
        MAR_IMP $ 121
        FILLER11 $ 122
        F_MAR_P $ 123
        MEDUC $ 124
        FILLER12 $ 125
        F_MEDUC $ 126
        FILLER13 $ 127-141
        **bleep**ERPT_FLG $ 142
        FILLER14 $ 143-146
        **bleep**ECOMB $ 147-148
        **bleep**EREC11 $ 149-150
        FRACE31 $ 151-152
        FRACE6 $ 153
        FRACE15 $ 154-155
        FILLER15 $ 156-158
        FHISPX $ 159
        FHISP_R $ 160
        F_FHISP $ 161
        FRACEHISP $ 162
        FEDUC $ 163
        FILLER16 $ 164
        f_FEDUC $ 165
        FILLER17 $ 166-170
        PRIORLIVE $ 171-172
        PRIORDEAD $ 173-174
        PRIORTERM $ 175-176
        FILLER18 $ 177-178
        LBO_REC $ 179
        FILLER19 $ 180-181
        TBO_REC $ 182
        FILLER20 $ 183-197
        ILLB_R $ 198-200
        ILLB_R11 $ 201-202
        FILLER21 $ 203-205
        ILOP_R $ 206-208
        ILOP_R11 $ 209-210
        FILLER22 $ 211-213
        ILP_R $ 214-216
        ILP_R11 $ 217-218
        FILLER23 $ 219-223
        PRECARE $ 224-225
        F_MPCB $ 226
        PRECARE5 $ 227
        FILLER24 $ 228-237
        PREVIS $ 238-239
        FILLER25 $ 240-241
        PREVIS_REC $ 242-243
        F_TPCV $ 244
        FILLER26 $ 245-250
        WIC $ 251
        F_WIC $ 252
        CIG_0 $ 253-254
        CIG_1 $ 255-256
        CIG_2 $ 257-258
        CIG_3 $ 259-260
        CIG0_R $ 261
        CIG1_R $ 262
        CIG2_R $ 263
        CIG3_R $ 264
        F_CIGS_0 $ 265
        F_CIGS_1 $ 266
        F_CIGS_2 $ 267
        F_CIGS_3 $ 268
        CIG_REC $ 269
        F_TOBACO $ 270
        FILLER27 $ 271-279
        M_Ht_In $ 280-281
        F_M_HT $ 282
        BMI $ 283-286
        BMI_R $ 287
        FILLER28 $ 288-291
        PWgt_R $ 292-294
        F_PWGT $ 295
        FILLER29 $ 296-298
        DWgt_R $ 299-301
        FILLER30 $ 302
        F_DWGT $ 303
        WTGAIN $ 304-305
        WTGAIN_REC $ 306
        F_WTGAIN $ 307
        FILLER31 $ 308-312
        RF_PDIAB $ 313
        RF_GDIAB $ 314
        RF_PHYPE $ 315
        RF_GHYPE $ 316
        RF_EHYPE $ 317
        RF_PPTERM $ 318
        F_RF_PDIAB $ 319
        F_RF_GDIAB $ 320
        F_RF_PHYPER $ 321
        F_RF_GHYPER $ 322
        F_RF_ECLAMP $ 323
        F_RF_PPB $ 324
        RF_INFTR $ 325
        RF_FEDRG $ 326
        RF_ARTEC $ 327
        f_RF_INFT $ 328
        F_RF_INF_DRG $ 329
        F_RF_INF_ART $ 330
        RF_CESAR $ 331
        RF_CESARN $ 332-333
        FILLER32 $ 334
        F_RF_CESAR $ 335
        F_RF_NCESAR $ 336
        NO_RISKS $ 337
        FILLER33 $ 338-342
        IP_GON $ 343
        IP_SYPH $ 344
        IP_CHLAM $ 345
        IP_HEPB $ 346
        IP_HEPC $ 347
        F_IP_GONOR $ 348
        F_IP_SYPH $ 349
        F_IP_CHLAM $ 350
        F_IP_HEPATB $ 351
        F_IP_HEPATC $ 352
        NO_INFEC $ 353
        FILLER34 $ 354-358
        FILLER35 $ 359
        OB_ECVS $ 360
        OB_ECVF $ 361
        FILLER36 $ 362
        F_OB_SUCC $ 363
        F_OB_FAIL $ 364
        FILLER37 $ 365-382
        LD_INDL $ 383
        LD_AUGM $ 384
        LD_STER $ 385
        LD_ANTB $ 386
        LD_CHOR $ 387
        LD_ANES $ 388
        F_LD_INDL $ 389
        F_LD_AUGM $ 390
        F_LD_STER $ 391
        F_LD_ANTB $ 392
        F_LD_CHOR $ 393
        F_LD_ANES $ 394
        NO_LBRDLV $ 395
        FILLER38 $ 396-400
        ME_PRES $ 401
        ME_ROUT $ 402
        ME_TRIAL $ 403
        F_ME_PRES $ 404
        F_ME_ROUT $ 405
        F_ME_TRIAL $ 406
        RDMETH_REC $ 407
        DMETH_REC $ 408
        F_DMETH_REC $ 409
        FILLER39 $ 410-414
        MM_MTR $ 415
        MM_PLAC $ 416
        MM_RUPT $ 417
        MM_UHYST $ 418
        MM_AICU $ 419
        FILLER40 $ 420
        F_MM_MTR $ 421
        F_MM_PLAC $ 422
        F_MM_RUPT $ 423
        F_MM_UHYST $ 424
        F_MM_AICU $ 425
        FILLER41 $ 426
        NO_MMORB $ 427
        FILLER42 $ 428-432
        ATTEND $ 433
        MTRAN $ 434
        PAY $ 435
        PAY_REC $ 436
        F_PAY $ 437
        F_PAY_REC $ 438
        FILLER43 $ 439-443
        APGAR5 $ 444-445
        APGAR5R $ 446
        F_APGAR5 $ 447
        APGAR10 $ 448-449
        ;
run;

/* Define the file path */ %let filepath = 'C:\temp\Nat2023us\Nat2023PublicUS.c20240509.r20240724_part01.txt'; /* Read the data */ data natality2023; infile "&filepath" truncover; input FILLER1 $ 1-8 DOB_YY $ 9-12 DOB_MM $ 13-14 FILLER2 $ 15-18 DOB_TT $ 19-22 DOB_WK $ 23 FILLER3 $ 24-31 BFACIL $ 32 F_BFACIL $ 33 FILLER4 $ 34-49 BFACIL3 $ 50 FILLER5 $ 51-72 MAGE_IMPFLG $ 73 MAGE_REPFLG $ 74 MAGER 75-76 MAGER14 $ 77-78 MAGER9 $ 79 FILLER6 $ 80-83 MBSTATE_REC $ 84 FILLER7 $ 85-103 RESTATUS $ 104 MRACE31 $ 105-106 MRACE6 $ 107 MRACE15 $ 108-109 FILLER8 $ 110 MRACEIMP $ 111 MHISPX $ 112 FILLER9 $ 113-114 MHISP_R $ 115 F_MHISP $ 116 MRACEHISP $ 117 FILLER10 $ 118 MAR_P $ 119 DMAR $ 120 MAR_IMP $ 121 FILLER11 $ 122 F_MAR_P $ 123 MEDUC $ 124 FILLER12 $ 125 F_MEDUC $ 126 FILLER13 $ 127-141 **bleep**ERPT_FLG $ 142 FILLER14 $ 143-146 **bleep**ECOMB $ 147-148 **bleep**EREC11 $ 149-150 FRACE31 $ 151-152 FRACE6 $ 153 FRACE15 $ 154-155 FILLER15 $ 156-158 FHISPX $ 159 FHISP_R $ 160 F_FHISP $ 161 FRACEHISP $ 162 FEDUC $ 163 FILLER16 $ 164 f_FEDUC $ 165 FILLER17 $ 166-170 PRIORLIVE $ 171-172 PRIORDEAD $ 173-174 PRIORTERM $ 175-176 FILLER18 $ 177-178 LBO_REC $ 179 FILLER19 $ 180-181 TBO_REC $ 182 FILLER20 $ 183-197 ILLB_R $ 198-200 ILLB_R11 $ 201-202 FILLER21 $ 203-205 ILOP_R $ 206-208 ILOP_R11 $ 209-210 FILLER22 $ 211-213 ILP_R $ 214-216 ILP_R11 $ 217-218 FILLER23 $ 219-223 PRECARE $ 224-225 F_MPCB $ 226 PRECARE5 $ 227 FILLER24 $ 228-237 PREVIS $ 238-239 FILLER25 $ 240-241 PREVIS_REC $ 242-243 F_TPCV $ 244 FILLER26 $ 245-250 WIC $ 251 F_WIC $ 252 CIG_0 $ 253-254 CIG_1 $ 255-256 CIG_2 $ 257-258 CIG_3 $ 259-260 CIG0_R $ 261 CIG1_R $ 262 CIG2_R $ 263 CIG3_R $ 264 F_CIGS_0 $ 265 F_CIGS_1 $ 266 F_CIGS_2 $ 267 F_CIGS_3 $ 268 CIG_REC $ 269 F_TOBACO $ 270 FILLER27 $ 271-279 M_Ht_In $ 280-281 F_M_HT $ 282 BMI $ 283-286 BMI_R $ 287 FILLER28 $ 288-291 PWgt_R $ 292-294 F_PWGT $ 295 FILLER29 $ 296-298 DWgt_R $ 299-301 FILLER30 $ 302 F_DWGT $ 303 WTGAIN $ 304-305 WTGAIN_REC $ 306 F_WTGAIN $ 307 FILLER31 $ 308-312 RF_PDIAB $ 313 RF_GDIAB $ 314 RF_PHYPE $ 315 RF_GHYPE $ 316 RF_EHYPE $ 317 RF_PPTERM $ 318 F_RF_PDIAB $ 319 F_RF_GDIAB $ 320 F_RF_PHYPER $ 321 F_RF_GHYPER $ 322 F_RF_ECLAMP $ 323 F_RF_PPB $ 324 RF_INFTR $ 325 RF_FEDRG $ 326 RF_ARTEC $ 327 f_RF_INFT $ 328 F_RF_INF_DRG $ 329 F_RF_INF_ART $ 330 RF_CESAR $ 331 RF_CESARN $ 332-333 FILLER32 $ 334 F_RF_CESAR $ 335 F_RF_NCESAR $ 336 NO_RISKS $ 337 FILLER33 $ 338-342 IP_GON $ 343 IP_SYPH $ 344 IP_CHLAM $ 345 IP_HEPB $ 346 IP_HEPC $ 347 F_IP_GONOR $ 348 F_IP_SYPH $ 349 F_IP_CHLAM $ 350 F_IP_HEPATB $ 351 F_IP_HEPATC $ 352 NO_INFEC $ 353 FILLER34 $ 354-358 FILLER35 $ 359 OB_ECVS $ 360 OB_ECVF $ 361 FILLER36 $ 362 F_OB_SUCC $ 363 F_OB_FAIL $ 364 FILLER37 $ 365-382 LD_INDL $ 383 LD_AUGM $ 384 LD_STER $ 385 LD_ANTB $ 386 LD_CHOR $ 387 LD_ANES $ 388 F_LD_INDL $ 389 F_LD_AUGM $ 390 F_LD_STER $ 391 F_LD_ANTB $ 392 F_LD_CHOR $ 393 F_LD_ANES $ 394 NO_LBRDLV $ 395 FILLER38 $ 396-400 ME_PRES $ 401 ME_ROUT $ 402 ME_TRIAL $ 403 F_ME_PRES $ 404 F_ME_ROUT $ 405 F_ME_TRIAL $ 406 RDMETH_REC $ 407 DMETH_REC $ 408 F_DMETH_REC $ 409 FILLER39 $ 410-414 MM_MTR $ 415 MM_PLAC $ 416 MM_RUPT $ 417 MM_UHYST $ 418 MM_AICU $ 419 FILLER40 $ 420 F_MM_MTR $ 421 F_MM_PLAC $ 422 F_MM_RUPT $ 423 F_MM_UHYST $ 424 F_MM_AICU $ 425 FILLER41 $ 426 NO_MMORB $ 427 FILLER42 $ 428-432 ATTEND $ 433 MTRAN $ 434 PAY $ 435 PAY_REC $ 436 F_PAY $ 437 F_PAY_REC $ 438 FILLER43 $ 439-443 APGAR5 $ 444-445 APGAR5R $ 446 F_APGAR5 $ 447 APGAR10 $ 448-449 ; run;

And of course because the source data is positional it's easy to only read the columns you actually need into SAS which hopefully will significantly reduce the effort.

Tom · Posted 10-16-2024 11:39 PM

First thing is you cannot "import" that type of file. The data is arranged in fixed positions on each line. You have to actually write a data step to actual READ in the data from the file instead.

Please show the data step you wrote to read it and explain in what ways it did not work.

Second thing is that is a ZIP file and the ZIP engine in SAS 9.4m7 seems to have trouble with expanding it.

1    data _null_;
2      infile "C:\Downloads\Nat2023us.zip" zip member='*'
3        termstr=crlf obs=2;
4      input;
5      list;
6    run;

ERROR: Open failure for C:\Downloads\Nat2023us.zip during attempt to create a local file handle.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

So you will probably need to first unzip the file before SAS can read it.

Spoiler

55   data _null_;
56     infile "%sysfunc(pathname(work))/Nat2023PublicUS.c20240509.r20240724.txt"
57       termstr=crlf obs=2;
58     input;
59     list;
60   run;

NOTE: The infile "/Volumes/scratch/SAS_workxxx/Nat2023PublicUS.c20240509.r20240724.txt" is:
      Filename=/Volumes/scratch/SAS_workxxx/Nat2023PublicUS.c20240509.r20240724.txt,
      Owner Name=xxx,Group Name=xxx,
      Access Permission=-rw-rw-r--,
      Last Modified=24Jul2024:12:18:02,
      File Size (bytes)=4856044107

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1                 202301    01171        11                1                        37116    1
     101     103303  0  013 Y2  17 1                    330501101   00115 1     010001  2  3               028
     201  04   20208   02804     0311          12  071      N10000000000001111N1         64124.02    1401   15
     301  6 11621     NNNNNN111111NXX111N00 111     NNNNN111111      NN 11                  NNNNNN1111111
     401  11X111111     NNNNN 11111 1     1N1111     0941885   1    9               F 04  2022     39072    39
     501  0723625 083     NNNNNN 111111 1     NNNNNN111111NNNNNN1111111     NYY1
     601
     701
     801
     901
    1001
    1101
    1201
    1301                                                1345
2                 202301    08594        11                1                        29094    2
     101     105514  0  015 X1  15 1                    310505514   00155 1     010001  2  3               077
     201  08   01302   01302     0311          07  051      Y10000000000001111N1         57132.74    1511   17
     301  6 12531     NNNNNN111111NXX111N00 111     NNNNN111111      NN 11                  NNNNNY1111110
     401  11X111111     NNNNN 11111 1     1N5411     0941885   1    9               M 04  2022     39072    39
     501  0723284 073     NNNNNN 111111 1     NNNNNN111111NNNNNN1111111     NYY1
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
     601
     701
     801
     901
    1001
    1101
    1201
    1301                                                1345
NOTE: 2 records were read from the infile "/Volumes/scratch/SAS_work6D6F00009B54_amraelp00008222.pfizer.com/Nat2023PublicUS.c20240509.r20240724.txt".
      The minimum record length was 1345.
      The maximum record length was 1345.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

55 data _null_; 56 infile "%sysfunc(pathname(work))/Nat2023PublicUS.c20240509.r20240724.txt" 57 termstr=crlf obs=2; 58 input; 59 list; 60 run; NOTE: The infile "/Volumes/scratch/SAS_workxxx/Nat2023PublicUS.c20240509.r20240724.txt" is: Filename=/Volumes/scratch/SAS_workxxx/Nat2023PublicUS.c20240509.r20240724.txt, Owner Name=xxx,Group Name=xxx, Access Permission=-rw-rw-r--, Last Modified=24Jul2024:12:18:02, File Size (bytes)=4856044107 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 1 202301 01171 11 1 37116 1 101 103303 0 013 Y2 17 1 330501101 00115 1 010001 2 3 028 201 04 20208 02804 0311 12 071 N10000000000001111N1 64124.02 1401 15 301 6 11621 NNNNNN111111NXX111N00 111 NNNNN111111 NN 11 NNNNNN1111111 401 11X111111 NNNNN 11111 1 1N1111 0941885 1 9 F 04 2022 39072 39 501 0723625 083 NNNNNN 111111 1 NNNNNN111111NNNNNN1111111 NYY1 601 701 801 901 1001 1101 1201 1301 1345 2 202301 08594 11 1 29094 2 101 105514 0 015 X1 15 1 310505514 00155 1 010001 2 3 077 201 08 01302 01302 0311 07 051 Y10000000000001111N1 57132.74 1511 17 301 6 12531 NNNNNN111111NXX111N00 111 NNNNN111111 NN 11 NNNNNY1111110 401 11X111111 NNNNN 11111 1 1N5411 0941885 1 9 M 04 2022 39072 39 501 0723284 073 NNNNNN 111111 1 NNNNNN111111NNNNNN1111111 NYY1 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 601 701 801 901 1001 1101 1201 1301 1345 NOTE: 2 records were read from the infile "/Volumes/scratch/SAS_work6D6F00009B54_amraelp00008222.pfizer.com/Nat2023PublicUS.c20240509.r20240724.txt". The minimum record length was 1345. The maximum record length was 1345. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds

abhi309 · Posted 10-17-2024 01:39 PM

Thank you everyone for your help. I was looking for another way other than data step to import this file since there are around 150 variables but I don't think there is any other way. So I will go forward and use the data step.

Thanks again

SAS Programming

Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Follow Us

What is...

SAS Programming

Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Re: Please help me import this txt file in SAS dataset

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...