Hi all,
I am looking for help to import report format raw data (see text file attached). The challenges I have for this file are as follows:
1) It is in report format
2) It contained multiple reports in one file
3) Each report has different field layouts
4) Each report does not have fixed lines, i.e. the fields are not in the same fixed position for the same report.
I attached the sample raw text file, the sample output in Excel as well as the markup in pdf (to show how I get the values from the text file)
Thank you in advance for all the responses.
If you truly do not have any patterns you will have a great deal of problems.
A data step and Input statements will let your read this but I'm not going to any actual code without knowing which pieces of data you want in the output.
Likely coming into play will be a number of very old school programming elements like _infile_ and the trailing @ for reading.
You will have to parse the lines of code to tell which part of the report you are on.
Here is a PARTIAL program that will read SOME data fields and write one form of output. This is not complete by any means as I don't know which values may need to be numeric. Your "dates" if they are indeed supposed to be in year 2054 could be read as date values with proper informat/formats.
Note that you will want to read up on INPUT statement options. I have heavily used the @"value" which searches a string for that value and then starts reading.
You will need to find the things that delimit which type of "report" a block of data represents and likely write a set of code for each. It suspect it might be advantageous to write to different output data sets but not knowing what you need I'm not going much further. BTW, this is type of thing that I would bill about $250 per hour to code a complete solution as an independent contractor, and possibly include a minimum number of hours as well.
data example; infile "<path>\sample.txt" lrecl=500; informat fromdate todate $10. point $10. applid $10. SETTDATE pardate 7. CompanyName $15. CompanyId $10. sec $4. ENTRYDESC $15. EFFdate $10. DDADRamount SAVdramount USERdramount TOTALSdramount Comma15. DDACRamount SAVCRamount USERCRamount TOTALSCRamount comma15. ; retain fromdate todate point applid SETTDATE pardate CompanyName companyId sec entrydesc effdate ; input @; if index(_infile_,'R E T U R N R E S O L U T I O N R E P O R T (FOR RETIRES)')>0 then put "Reading title line:" _infile_; if index(_infile_,'FROM DATE/WINDOW')>0 then input @'FROM DATE/WINDOW' FromDate; if index(_infile_,'TO DATE/WINDOW')>0 then input @'TO DATE/WINDOW' ToDate; if index(_infile_,'COLL APPL INFO:') >0 then do; input @"POINT: " point @"APPL ID:" Applid @"SETT DATE: " settdate @"PAR DATE:" pardate / /*this specific slash is Next line*/ @ "COMPANY NAME:" COMPANYName @ "COMPANY ID:" companyid @"SEC:" SEC @ "ENTRY DESC:" EntryDesc @ "EFF DATE:" EFFDATE; input; input; /* blank line, dashline*/ ; end; /*end the block of the COLL APPL INFO*/ if index (_infile_,'DR COUNT')>0 then do; input @ 'DR COUNT' DDADRcount SAVdrCount USERdrcount TOTALSdrcount / @ 'DR AMOUNT' DDADRamount SAVdramount USERdramount TOTALSdramount / / @ 'CR COUNT' DDACRcount SAVCRCount USERCRcount TOTALSCRcount / @ 'CR AMOUNT' DDACRamount SAVCRamount USERCRamount TOTALSCRamount ; output; end; run;
If you truly do not have any patterns you will have a great deal of problems.
A data step and Input statements will let your read this but I'm not going to any actual code without knowing which pieces of data you want in the output.
Likely coming into play will be a number of very old school programming elements like _infile_ and the trailing @ for reading.
You will have to parse the lines of code to tell which part of the report you are on.
Here is a PARTIAL program that will read SOME data fields and write one form of output. This is not complete by any means as I don't know which values may need to be numeric. Your "dates" if they are indeed supposed to be in year 2054 could be read as date values with proper informat/formats.
Note that you will want to read up on INPUT statement options. I have heavily used the @"value" which searches a string for that value and then starts reading.
You will need to find the things that delimit which type of "report" a block of data represents and likely write a set of code for each. It suspect it might be advantageous to write to different output data sets but not knowing what you need I'm not going much further. BTW, this is type of thing that I would bill about $250 per hour to code a complete solution as an independent contractor, and possibly include a minimum number of hours as well.
data example; infile "<path>\sample.txt" lrecl=500; informat fromdate todate $10. point $10. applid $10. SETTDATE pardate 7. CompanyName $15. CompanyId $10. sec $4. ENTRYDESC $15. EFFdate $10. DDADRamount SAVdramount USERdramount TOTALSdramount Comma15. DDACRamount SAVCRamount USERCRamount TOTALSCRamount comma15. ; retain fromdate todate point applid SETTDATE pardate CompanyName companyId sec entrydesc effdate ; input @; if index(_infile_,'R E T U R N R E S O L U T I O N R E P O R T (FOR RETIRES)')>0 then put "Reading title line:" _infile_; if index(_infile_,'FROM DATE/WINDOW')>0 then input @'FROM DATE/WINDOW' FromDate; if index(_infile_,'TO DATE/WINDOW')>0 then input @'TO DATE/WINDOW' ToDate; if index(_infile_,'COLL APPL INFO:') >0 then do; input @"POINT: " point @"APPL ID:" Applid @"SETT DATE: " settdate @"PAR DATE:" pardate / /*this specific slash is Next line*/ @ "COMPANY NAME:" COMPANYName @ "COMPANY ID:" companyid @"SEC:" SEC @ "ENTRY DESC:" EntryDesc @ "EFF DATE:" EFFDATE; input; input; /* blank line, dashline*/ ; end; /*end the block of the COLL APPL INFO*/ if index (_infile_,'DR COUNT')>0 then do; input @ 'DR COUNT' DDADRcount SAVdrCount USERdrcount TOTALSdrcount / @ 'DR AMOUNT' DDADRamount SAVdramount USERdramount TOTALSdramount / / @ 'CR COUNT' DDACRcount SAVCRCount USERCRcount TOTALSCRcount / @ 'CR AMOUNT' DDACRamount SAVCRamount USERCRamount TOTALSCRamount ; output; end; run;
Are the three files supposed to be three different possible input files and you want to know which one is easier to read and convert into a reasonable SAS dataset?
Or are the second two supposed to be your attempt to convert the first file into data?
In general it is not difficult to read a reasonably well structure report file like the first file.
What you need to do is figure out which values on the page you actually need and what are the clues that tell you where in the report you are. So in general you read the information that appears once in the header lines and RETAIN those values so that when you read a detailed lines the values from the header lines are populated already.
Click on the SPOILER tag to see the text of the first example file as a text box.
R3030-02/R1.0 A454 PEP+ R E T U R N R E S O L U T I O N R E P O R T (FOR RETIRES) 11-30-54 05:34 PAGE 1 FROM DATE/WINDOW 11-30-54 0530 TO DATE/WINDOW 11-30-54 0530 PROCESS DATE 11-30-54 ------------------------------------------------------------------------------------------------------------------------------------ COLL APPL INFO: POINT: A454TTMTG APPL ID: 444444444 TYPE: SETT DATE: 000 PAR DATE: 54334 BATCH HEADER INFO: COMPANY NAME: BAN COMPANY ID: C351524144 SEC: PPD ENTRY DESC: MTGMANAGER EFF DATE: 541124 ------------------------------------------------------------------------------------------------------------------------------------ RETURN ORIG ITEM CUSTOMER ID/ <=========== DISTRIBUTION ===========> PAR PAR CUSTOMER NAME TC AMOUNT T/R BANK ACCOUNT NO POINT APPL TYPE T/R BR ------------------------------------------------------------------------------------------------------------------------------------ ADDENDA ORIGINAL EVENT TYPE RETURN REASON RDFI T/R ORIGINAL TRACE ADDENDA INFORMATION RETURN TRACE DATE ------------------------------------------------------------------------------------------------------------------------------------ 004451204 54333012343351 444441544400 21 0.00 044300012 444123410400 A454YYYET RETIRE 044300012 PAYMENT CLEARING ** RESOLVED BY PEP+ ** 44 R14-AMOUNT FIELD ERR 04430001 044300010000004 044300011141204 DDA SAV USER TOTALS DR COUNT 0 0 0 0 DR AMOUNT 0.00 0.00 0.00 0.00 CR COUNT 1 0 0 1 CR AMOUNT 0.00 0.00 0.00 0.00 R3030-02/R1.0 A454 PEP+ R E T U R N R E S O L U T I O N R E P O R T (FOR RETIRES) 11-30-54 05:34 PAGE 2 FROM DATE/WINDOW 11-30-54 0530 TO DATE/WINDOW 11-30-54 0530 PROCESS DATE 11-30-54 ------------------------------------------------------------------------------------------------------------------------------------ COLL APPL INFO: POINT: A454BMMTG APPL ID: 444444444 TYPE: SETT DATE: 334 PAR DATE: 54334 BATCH HEADER INFO: COMPANY NAME: BAN COMPANY ID: C351454144 SEC: WEB ENTRY DESC: MTGMANAGER EFF DATE: 541125 ------------------------------------------------------------------------------------------------------------------------------------ RETURN ORIG ITEM CUSTOMER ID/ <=========== DISTRIBUTION ===========> PAR PAR CUSTOMER NAME TC AMOUNT T/R BANK ACCOUNT NO POINT APPL TYPE T/R BR ------------------------------------------------------------------------------------------------------------------------------------ ADDENDA ORIGINAL EVENT TYPE RETURN REASON RDFI T/R ORIGINAL TRACE ADDENDA INFORMATION RETURN TRACE DATE ------------------------------------------------------------------------------------------------------------------------------------ 013555445 54332004551111 0000414504 24 3,105.54 044300012 0423344534 A454BKRET RETIRE 044300012 KALAYYYN BILALLO RET REASON NOT DEFINED ON CA 44 R01-INSUFFICIENT FUNDS 25105040 044300014553544 251050405445444 DDA SAV USER TOTALS DR COUNT 1 0 0 1 DR AMOUNT 3,105.54 0.00 0.00 3,105.54 CR COUNT 0 0 0 0 CR AMOUNT 0.00 0.00 0.00 0.00 R3030-02/R1.0 A454 PEP+ R E T U R N R E S O L U T I O N R E P O R T (FOR RETIRES) 11-30-54 05:34 PAGE 3 FROM DATE/WINDOW 11-30-54 0530 TO DATE/WINDOW 11-30-54 0530 PROCESS DATE 11-30-54 ------------------------------------------------------------------------------------------------------------------------------------ T O T A L F O R P O I N T A454BKMTG ------------------------------------------------------------------------------------------------------------------------------------ RETURN ORIG ITEM CUSTOMER ID/ <=========== DISTRIBUTION ===========> PAR PAR CUSTOMER NAME TC AMOUNT T/R BANK ACCOUNT NO POINT APPL TYPE T/R BR ------------------------------------------------------------------------------------------------------------------------------------ ADDENDA ORIGINAL EVENT TYPE RETURN REASON RDFI T/R ORIGINAL TRACE ADDENDA INFORMATION RETURN TRACE DATE ------------------------------------------------------------------------------------------------------------------------------------ DDA SAV USER TOTALS DR COUNT 1 0 0 1 DR AMOUNT 3,105.54 0.00 0.00 3,105.54 CR COUNT 1 0 0 1 CR AMOUNT 0.00 0.00 0.00 0.00 R3030-02/R1.0 A454 PEP+ R E T U R N R E S O L U T I O N R E P O R T (FOR RETIRES) 11-30-54 05:34 PAGE 4 FROM DATE/WINDOW 11-30-54 0530 TO DATE/WINDOW 11-30-54 0530 PROCESS DATE 11-30-54 ------------------------------------------------------------------------------------------------------------------------------------ COLL APPL INFO: POINT: A454CC APPL ID: AAAEQU01 TYPE: SETT DATE: 334 PAR DATE: 54334 BATCH HEADER INFO: COMPANY NAME: CCA Equipment COMPANY ID: 1352542554 SEC: CCD ENTRY DESC: Focus2truc EFF DATE: 541124 ------------------------------------------------------------------------------------------------------------------------------------ RETURN ORIG ITEM CUSTOMER ID/ <=========== DISTRIBUTION ===========> PAR PAR CUSTOMER NAME TC AMOUNT T/R BANK ACCOUNT NO POINT APPL TYPE T/R BR ------------------------------------------------------------------------------------------------------------------------------------ ADDENDA ORIGINAL EVENT TYPE RETURN REASON RDFI T/R ORIGINAL TRACE ADDENDA INFORMATION RETURN TRACE DATE ------------------------------------------------------------------------------------------------------------------------------------ 001543045 54324004454545 Focus 4552 2110 24 1,550.44 044300012 443443342 A454BKRET RETIRE 044300012 Focus 4552 2110 *** RETURNED TOO LATE *** 44 R01-INSUFFICIENT FUNDS 05100001 044300014454545 INSUFFICIENT FUNDS 021000024554404 DDA SAV USER TOTALS DR COUNT 1 0 0 1 DR AMOUNT 1,550.44 0.00 0.00 1,550.44 CR COUNT 0 0 0 0 CR AMOUNT 0.00 0.00 0.00 0.00 R3140-02/R4.0 A454 PEP+ D I S T R I B U T I O N E N T R I E S L I S T (ACH FORMAT) 11-30-54 05:44 PAGE 24,214 POINT: A454DDA APPL: ENTRIES TYPE: T/R: 044300012 FROM DATE/WINDOW 11-30-2054 0530 TO DATE/WINDOW 11-30-2054 0530 ------------------------------------------------------------------------------------------------------------------------------------ REC SERVICE COMPANY STD ENTRY CO. ENTRY CO. DESC EFF ENTRY ORIG BATCH TYPE CLASS CODE COMPANY NAME DISCRETIONARY DATA ID CLASS DESC DATE DATE BANK NUMBER ------------------------------------------------------------------------------------------------------------------------------------ 5 200 VCS Lendneo 14032540 14032540 CCD VCS_SSTL 541124 541130 04401443 0025044 ------------------------------------------------------------------------------------------------------------------------------------ REC TRAN/ABA CUSTOMER CUSTOMER DISC ADDENDA TRACE TYPE TC NUMBER BANK ACCT NO. AMOUNT ID NAME DATA IND PAR NUMBER ------------------------------------------------------------------------------------------------------------------------------------ 4 54 04430001-2 10002130 1,533.45 2144214 Client Services 0 54333010544532 0544532 ------------------------------------------------------------------------------------------------------------------------------------ REC SERVICE ENTRY ENTRY TOTAL DR TOTAL CR COMPANY ORIG BATCH TYPE CLASS CODE COUNT HASH DOLLAR AMT DOLLAR AMT ID BANK NUMBER ------------------------------------------------------------------------------------------------------------------------------------ 4 200 000001 0004430001 .00 1,533.45 14032540 04401443 0025044 R3140-02/R4.0 A454 PEP+ D I S T R I B U T I O N E N T R I E S L I S T (ACH FORMAT) 11-30-54 05:44 PAGE 24,540 POINT: A454DDA APPL: ENTRIES TYPE: T/R: 044300012 FROM DATE/WINDOW 11-30-2054 0530 TO DATE/WINDOW 11-30-2054 0530 ------------------------------------------------------------------------------------------------------------------------------------ REC SERVICE COMPANY STD ENTRY CO. ENTRY CO. DESC EFF ENTRY ORIG BATCH TYPE CLASS CODE COMPANY NAME DISCRETIONARY DATA ID CLASS DESC DATE DATE BANK NUMBER ------------------------------------------------------------------------------------------------------------------------------------ 5 200 VVD FirstLean 14035424 14035424 CCD VCS_SSTL 541124 541130 04401443 0025045 ------------------------------------------------------------------------------------------------------------------------------------ REC TRAN/ABA CUSTOMER CUSTOMER DISC ADDENDA TRACE TYPE TC NUMBER BANK ACCT NO. AMOUNT ID NAME DATA IND PAR NUMBER ------------------------------------------------------------------------------------------------------------------------------------ 4 54 04430001-2 10002130 124.54 2145455 Client Services 0 54333010544534 0544534 ------------------------------------------------------------------------------------------------------------------------------------ REC SERVICE ENTRY ENTRY TOTAL DR TOTAL CR COMPANY ORIG BATCH TYPE CLASS CODE COUNT HASH DOLLAR AMT DOLLAR AMT ID BANK NUMBER ------------------------------------------------------------------------------------------------------------------------------------ 4 200 000001 0004430001 .00 124.54 14035424 04401443 0025045 R3020-02/R4.0 A454 PEP+ C O L L E C T I O N E N T R Y R E P O R T 11-30-54 05:35 PAGE 1,021 DETAIL LIST COLLECTION DATE/WINDOW: 11-30-2054 0530 ------------------------------------------------------------------------------------------------------------------------------------ POINT: A454FED APPL: 444444444 TYPE: PAR DATE: 54334 FILE/BATCH REF NO: 54334 000021/0000442 IMMED DEST: 443340504 FISERV CREATED: 11/24/54 54:52 FORMAT: 1 IMMED ORIG: 441000134 ASF APPLICATION SUPERVI FILE ID MOD: G REF CODE: ------------------------------------------------------------------------------------------------------------------------------------ CO NAME: Worldpay ID: 1244535455 ENT DESC: WIC ODFI T/R: 0115-0012 BATCH: 0000442 EFF DATE: 11/30/54 ORIG EFF DATE: 11/30/54 DESC DATE: 112454 DISCRETIONARY DATA: SUPERMERCADO EL GUER SEC: CCD SRVC CLS: 200 PAR: 001534354 ORIG STAT: NACHA SETTLEMENT DATE: 11/30/54 (334) ------------------------------------------------------------------------------------------------------------------------------------ *------- D I S T R I B U T I O N ------* PAR NO TC T/R ACCOUNT AMOUNT INDIVIDUAL NAME/ID DS TRACE POINT APPL TYPE T/R BRANCH DATE WIND ------------------------------------------------------------------------------------------------------------------------------------ 001534354 54-CDN 044300012 4424135004 5.14 SUPERMERCADO EL GUERO 3441334 A454DDA ENTRIES 044300012 542424404540510 11/30 0530 3023-544 INFO FOR CPO REASON: CPO00 OEM FI SWAP 3023-42 *** TRANSIT/ROUTING SWAPPED FROM - 051401404 *** TYPE: 05 FREE FORM: Worldpay WIC DEP 542424404540510 SUPERMERCADO EL GUERO BC ADD SEQ: 0001 ------------------------------------------------------------------------------------------------------------------------------------ B A T C H T O T A L S BATCH TRAILER: HASH TOTAL: 5140140 DR AMOUNT: .00 CR AMOUNT: 5.14 SERVICE CLASS CD: 200 ENT/ADD COUNT: 2 NET AMOUNT: 5.14 ------------------------------------------------------------------------------------------------------------------------------------ *- - - - - - V A L I D / W A R N I N G S - - - - - -* *- - - - - - R E J E C T S - - - - - -* *- - - - - - T O T A L - - - - -* COUNT AMOUNT COUNT PCT AMOUNT PCT COUNT AMOUNT ------------------------------------------------------------------------------------------------------------------------------------ DDA CR 1 5.14 .00 1 5.14 *COLLECTED CR 1 5.14 .00 1 5.14 *INPUT CR 1 5.14 .00 1 5.14 ADDENDA 1 .00 .00 1 .00 WARNINGS 2 5.14 .00 2 5.14
text file is the actual raw data I want to import.
Excel file is the output I am looking for
pdf file is just a markup on where I get the values from.
And assuming these are fake files and not work files because I'm guessing some information in there shouldn't be public..
It is easier to share images by just pasting them into the forum instead of attachments. Many users cannot download attachments.
It would help if the image in your PDF did not wrap the lines of text. Perhaps if you changed the font size used or switched to landscape orientation you could make a picture without the line wrapping.
One thing of importance you did not highlight is the FormFeed character that appears at the start of each extra page. '0C'x is the ASCII code for the formfeed character.
Another thing to watch out for is what happens when there is more information than fits on one page? you might want to look at some of the larger example files and see if that happens. Does the report repeat all of the header information again? Does it include something like a (CONTINUED) line to indicate that it needs a second page?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.