Hi all,
I am looking for help to import report format raw data (see text file attached). The challenges I have for this file are as follows:
1) It is in report format
2) It contained multiple reports in one file
3) Each report has different field layouts
4) Each report does not have fixed lines, i.e. the fields are not in the same fixed position for the same report.
I attached the sample raw text file, the sample output in Excel as well as the markup in pdf (to show how I get the values from the text file)
Thank you in advance for all the responses.
If you truly do not have any patterns you will have a great deal of problems.
A data step and Input statements will let your read this but I'm not going to any actual code without knowing which pieces of data you want in the output.
Likely coming into play will be a number of very old school programming elements like _infile_ and the trailing @ for reading.
You will have to parse the lines of code to tell which part of the report you are on.
Here is a PARTIAL program that will read SOME data fields and write one form of output. This is not complete by any means as I don't know which values may need to be numeric. Your "dates" if they are indeed supposed to be in year 2054 could be read as date values with proper informat/formats.
Note that you will want to read up on INPUT statement options. I have heavily used the @"value" which searches a string for that value and then starts reading.
You will need to find the things that delimit which type of "report" a block of data represents and likely write a set of code for each. It suspect it might be advantageous to write to different output data sets but not knowing what you need I'm not going much further. BTW, this is type of thing that I would bill about $250 per hour to code a complete solution as an independent contractor, and possibly include a minimum number of hours as well.
data example;
   infile "<path>\sample.txt" lrecl=500;
   informat fromdate todate $10. point $10. applid $10. SETTDATE pardate 7. 
            CompanyName $15. CompanyId $10. sec $4. ENTRYDESC $15.
            EFFdate $10.
            DDADRamount SAVdramount  USERdramount TOTALSdramount Comma15.
            DDACRamount SAVCRamount  USERCRamount TOTALSCRamount comma15.
   ;
   retain fromdate todate point applid SETTDATE pardate CompanyName
          companyId sec entrydesc   effdate
   ;
   input @;
   if index(_infile_,'R E T U R N   R E S O L U T I O N   R E P O R T (FOR RETIRES)')>0 then put "Reading title line:" _infile_;   
   if index(_infile_,'FROM DATE/WINDOW')>0 then input @'FROM DATE/WINDOW' FromDate;
   if index(_infile_,'TO   DATE/WINDOW')>0 then input @'TO   DATE/WINDOW' ToDate;
   if index(_infile_,'COLL APPL INFO:') >0 then do;
      input @"POINT: " point @"APPL ID:" Applid @"SETT DATE: " settdate
            @"PAR DATE:" pardate
      / /*this specific slash is Next line*/
      @ "COMPANY NAME:" COMPANYName @ "COMPANY ID:" companyid @"SEC:" SEC
      @ "ENTRY DESC:" EntryDesc @ "EFF DATE:" EFFDATE;
      input; input; /* blank line, dashline*/
   ;
   end; /*end the block of the COLL APPL INFO*/
   if index (_infile_,'DR COUNT')>0 then do;
      input @ 'DR COUNT' DDADRcount SAVdrCount  USERdrcount TOTALSdrcount
          / @ 'DR AMOUNT'  DDADRamount SAVdramount  USERdramount TOTALSdramount
          /
          / @ 'CR COUNT' DDACRcount SAVCRCount  USERCRcount TOTALSCRcount
          / @ 'CR AMOUNT'  DDACRamount SAVCRamount  USERCRamount TOTALSCRamount
          ;
          output;
       
   end;
run;
					
				
			
			
				
			
			
			
				
			
			
			
			
			
		If you truly do not have any patterns you will have a great deal of problems.
A data step and Input statements will let your read this but I'm not going to any actual code without knowing which pieces of data you want in the output.
Likely coming into play will be a number of very old school programming elements like _infile_ and the trailing @ for reading.
You will have to parse the lines of code to tell which part of the report you are on.
Here is a PARTIAL program that will read SOME data fields and write one form of output. This is not complete by any means as I don't know which values may need to be numeric. Your "dates" if they are indeed supposed to be in year 2054 could be read as date values with proper informat/formats.
Note that you will want to read up on INPUT statement options. I have heavily used the @"value" which searches a string for that value and then starts reading.
You will need to find the things that delimit which type of "report" a block of data represents and likely write a set of code for each. It suspect it might be advantageous to write to different output data sets but not knowing what you need I'm not going much further. BTW, this is type of thing that I would bill about $250 per hour to code a complete solution as an independent contractor, and possibly include a minimum number of hours as well.
data example;
   infile "<path>\sample.txt" lrecl=500;
   informat fromdate todate $10. point $10. applid $10. SETTDATE pardate 7. 
            CompanyName $15. CompanyId $10. sec $4. ENTRYDESC $15.
            EFFdate $10.
            DDADRamount SAVdramount  USERdramount TOTALSdramount Comma15.
            DDACRamount SAVCRamount  USERCRamount TOTALSCRamount comma15.
   ;
   retain fromdate todate point applid SETTDATE pardate CompanyName
          companyId sec entrydesc   effdate
   ;
   input @;
   if index(_infile_,'R E T U R N   R E S O L U T I O N   R E P O R T (FOR RETIRES)')>0 then put "Reading title line:" _infile_;   
   if index(_infile_,'FROM DATE/WINDOW')>0 then input @'FROM DATE/WINDOW' FromDate;
   if index(_infile_,'TO   DATE/WINDOW')>0 then input @'TO   DATE/WINDOW' ToDate;
   if index(_infile_,'COLL APPL INFO:') >0 then do;
      input @"POINT: " point @"APPL ID:" Applid @"SETT DATE: " settdate
            @"PAR DATE:" pardate
      / /*this specific slash is Next line*/
      @ "COMPANY NAME:" COMPANYName @ "COMPANY ID:" companyid @"SEC:" SEC
      @ "ENTRY DESC:" EntryDesc @ "EFF DATE:" EFFDATE;
      input; input; /* blank line, dashline*/
   ;
   end; /*end the block of the COLL APPL INFO*/
   if index (_infile_,'DR COUNT')>0 then do;
      input @ 'DR COUNT' DDADRcount SAVdrCount  USERdrcount TOTALSdrcount
          / @ 'DR AMOUNT'  DDADRamount SAVdramount  USERdramount TOTALSdramount
          /
          / @ 'CR COUNT' DDACRcount SAVCRCount  USERCRcount TOTALSCRcount
          / @ 'CR AMOUNT'  DDACRamount SAVCRamount  USERCRamount TOTALSCRamount
          ;
          output;
       
   end;
run;
					
				
			
			
				
			
			
			
			
			
			
			
		Are the three files supposed to be three different possible input files and you want to know which one is easier to read and convert into a reasonable SAS dataset?
Or are the second two supposed to be your attempt to convert the first file into data?
In general it is not difficult to read a reasonably well structure report file like the first file.
What you need to do is figure out which values on the page you actually need and what are the clues that tell you where in the report you are. So in general you read the information that appears once in the header lines and RETAIN those values so that when you read a detailed lines the values from the header lines are populated already.
Click on the SPOILER tag to see the text of the first example file as a text box.
R3030-02/R1.0 A454   PEP+         R E T U R N   R E S O L U T I O N   R E P O R T (FOR RETIRES)           11-30-54 05:34 PAGE      1   
                                                                                                     FROM DATE/WINDOW  11-30-54 0530  
                                                                                                     TO   DATE/WINDOW  11-30-54 0530   
                                                                                                     PROCESS DATE      11-30-54        
------------------------------------------------------------------------------------------------------------------------------------  
COLL APPL INFO:            POINT: A454TTMTG            APPL ID: 444444444  TYPE:        SETT DATE: 000          PAR DATE: 54334        
BATCH HEADER INFO:  COMPANY NAME: BAN               COMPANY ID: C351524144  SEC: PPD   ENTRY DESC: MTGMANAGER   EFF DATE: 541124       
------------------------------------------------------------------------------------------------------------------------------------  
   RETURN   ORIG ITEM    CUSTOMER ID/                                                         <=========== DISTRIBUTION ===========>   
    PAR        PAR     CUSTOMER NAME          TC        AMOUNT    T/R    BANK ACCOUNT NO      POINT     APPL    TYPE     T/R     BR    
------------------------------------------------------------------------------------------------------------------------------------   
 ADDENDA                        ORIGINAL                                                                                     EVENT     
  TYPE    RETURN REASON         RDFI T/R   ORIGINAL TRACE   ADDENDA INFORMATION                             RETURN TRACE     DATE      
------------------------------------------------------------------------------------------------------------------------------------   
004451204 54333012343351      444441544400    21          0.00 044300012 444123410400      A454YYYET RETIRE           044300012       
                       PAYMENT CLEARING                                                               ** RESOLVED BY PEP+ **           
  44 R14-AMOUNT FIELD ERR       04430001   044300010000004                                                   044300011141204           
                                 DDA                 SAV                USER              TOTALS                                    
         DR COUNT                  0                   0                   0                     0                                    
         DR AMOUNT              0.00                0.00                0.00                  0.00                                     
         CR COUNT                  1                   0                   0                     1                                    
         CR AMOUNT              0.00                0.00                0.00                  0.00                                     
R3030-02/R1.0 A454   PEP+         R E T U R N   R E S O L U T I O N   R E P O R T (FOR RETIRES)           11-30-54 05:34 PAGE      2  
                                                                                                     FROM DATE/WINDOW  11-30-54 0530  
                                                                                                     TO   DATE/WINDOW  11-30-54 0530   
                                                                                                     PROCESS DATE      11-30-54        
------------------------------------------------------------------------------------------------------------------------------------  
COLL APPL INFO:            POINT: A454BMMTG            APPL ID: 444444444  TYPE:        SETT DATE: 334          PAR DATE: 54334        
BATCH HEADER INFO:  COMPANY NAME: BAN               COMPANY ID: C351454144  SEC: WEB   ENTRY DESC: MTGMANAGER   EFF DATE: 541125       
------------------------------------------------------------------------------------------------------------------------------------  
   RETURN   ORIG ITEM    CUSTOMER ID/                                                         <=========== DISTRIBUTION ===========>   
    PAR        PAR     CUSTOMER NAME          TC        AMOUNT    T/R    BANK ACCOUNT NO      POINT     APPL    TYPE     T/R     BR    
------------------------------------------------------------------------------------------------------------------------------------   
 ADDENDA                        ORIGINAL                                                                                     EVENT     
  TYPE    RETURN REASON         RDFI T/R   ORIGINAL TRACE   ADDENDA INFORMATION                             RETURN TRACE     DATE      
------------------------------------------------------------------------------------------------------------------------------------   
013555445 54332004551111      0000414504      24      3,105.54 044300012 0423344534        A454BKRET RETIRE           044300012       
                       KALAYYYN BILALLO                                                                RET REASON NOT DEFINED ON CA     
  44 R01-INSUFFICIENT FUNDS     25105040   044300014553544                                                   251050405445444           
                                 DDA                 SAV                USER              TOTALS                                    
         DR COUNT                  1                   0                   0                     1                                    
         DR AMOUNT          3,105.54                0.00                0.00              3,105.54                                     
         CR COUNT                  0                   0                   0                     0                                    
         CR AMOUNT              0.00                0.00                0.00                  0.00                                     
R3030-02/R1.0 A454   PEP+         R E T U R N   R E S O L U T I O N   R E P O R T (FOR RETIRES)           11-30-54 05:34 PAGE      3  
                                                                                                     FROM DATE/WINDOW  11-30-54 0530  
                                                                                                     TO   DATE/WINDOW  11-30-54 0530   
                                                                                                     PROCESS DATE      11-30-54        
------------------------------------------------------------------------------------------------------------------------------------  
                                  T O T A L   F O R   P O I N T   A454BKMTG                                                            
------------------------------------------------------------------------------------------------------------------------------------
   RETURN   ORIG ITEM    CUSTOMER ID/                                                         <=========== DISTRIBUTION ===========>   
    PAR        PAR     CUSTOMER NAME          TC        AMOUNT    T/R    BANK ACCOUNT NO      POINT     APPL    TYPE     T/R     BR    
------------------------------------------------------------------------------------------------------------------------------------   
 ADDENDA                        ORIGINAL                                                                                     EVENT     
  TYPE    RETURN REASON         RDFI T/R   ORIGINAL TRACE   ADDENDA INFORMATION                             RETURN TRACE     DATE      
------------------------------------------------------------------------------------------------------------------------------------   
                                 DDA                 SAV                USER              TOTALS                                      
         DR COUNT                  1                   0                   0                     1                                    
         DR AMOUNT          3,105.54                0.00                0.00              3,105.54                                     
         CR COUNT                  1                   0                   0                     1                                    
         CR AMOUNT              0.00                0.00                0.00                  0.00                                     
R3030-02/R1.0 A454   PEP+         R E T U R N   R E S O L U T I O N   R E P O R T (FOR RETIRES)           11-30-54 05:34 PAGE      4  
                                                                                                     FROM DATE/WINDOW  11-30-54 0530  
                                                                                                     TO   DATE/WINDOW  11-30-54 0530   
                                                                                                     PROCESS DATE      11-30-54        
------------------------------------------------------------------------------------------------------------------------------------  
COLL APPL INFO:            POINT: A454CC               APPL ID: AAAEQU01   TYPE:        SETT DATE: 334          PAR DATE: 54334        
BATCH HEADER INFO:  COMPANY NAME: CCA Equipment     COMPANY ID: 1352542554  SEC: CCD   ENTRY DESC: Focus2truc   EFF DATE: 541124       
------------------------------------------------------------------------------------------------------------------------------------  
   RETURN   ORIG ITEM    CUSTOMER ID/                                                         <=========== DISTRIBUTION ===========>   
    PAR        PAR     CUSTOMER NAME          TC        AMOUNT    T/R    BANK ACCOUNT NO      POINT     APPL    TYPE     T/R     BR    
------------------------------------------------------------------------------------------------------------------------------------   
 ADDENDA                        ORIGINAL                                                                                     EVENT     
  TYPE    RETURN REASON         RDFI T/R   ORIGINAL TRACE   ADDENDA INFORMATION                             RETURN TRACE     DATE      
------------------------------------------------------------------------------------------------------------------------------------   
001543045 54324004454545      Focus 4552 2110 24      1,550.44 044300012 443443342         A454BKRET RETIRE           044300012       
                       Focus 4552 2110                                                                *** RETURNED TOO LATE ***        
  44 R01-INSUFFICIENT FUNDS     05100001   044300014454545            INSUFFICIENT FUNDS                     021000024554404           
                                 DDA                 SAV                USER              TOTALS                                    
         DR COUNT                  1                   0                   0                     1                                    
         DR AMOUNT          1,550.44                0.00                0.00              1,550.44                                     
         CR COUNT                  0                   0                   0                     0                                    
         CR AMOUNT              0.00                0.00                0.00                  0.00                                     
R3140-02/R4.0 A454   PEP+    D I S T R I B U T I O N   E N T R I E S   L I S T   (ACH FORMAT)             11-30-54 05:44 PAGE 24,214  
                         POINT: A454DDA    APPL: ENTRIES    TYPE:         T/R: 044300012           FROM DATE/WINDOW  11-30-2054 0530  
                                                                                                   TO   DATE/WINDOW  11-30-2054 0530   
------------------------------------------------------------------------------------------------------------------------------------  
  REC   SERVICE                                            COMPANY     STD ENTRY  CO. ENTRY  CO. DESC  EFF ENTRY    ORIG      BATCH    
 TYPE  CLASS CODE    COMPANY NAME    DISCRETIONARY DATA      ID          CLASS      DESC        DATE      DATE      BANK     NUMBER    
------------------------------------------------------------------------------------------------------------------------------------   
   5      200       VCS Lendneo       14032540             14032540       CCD    VCS_SSTL     541124    541130    04401443  0025044   
------------------------------------------------------------------------------------------------------------------------------------  
  REC       TRAN/ABA                                       CUSTOMER         CUSTOMER              DISC ADDENDA                 TRACE   
 TYPE  TC    NUMBER      BANK ACCT NO.          AMOUNT        ID              NAME                DATA   IND       PAR        NUMBER   
------------------------------------------------------------------------------------------------------------------------------------   
   4   54  04430001-2    10002130               1,533.45  2144214         Client Services                 0  54333010544532  0544532  
------------------------------------------------------------------------------------------------------------------------------------  
  REC   SERVICE    ENTRY      ENTRY               TOTAL DR          TOTAL CR     COMPANY       ORIG      BATCH                         
 TYPE  CLASS CODE  COUNT      HASH               DOLLAR AMT        DOLLAR AMT      ID          BANK     NUMBER                         
------------------------------------------------------------------------------------------------------------------------------------   
   4      200     000001     0004430001                .00          1,533.45   14032540      04401443  0025044                        
R3140-02/R4.0 A454   PEP+    D I S T R I B U T I O N   E N T R I E S   L I S T   (ACH FORMAT)             11-30-54 05:44 PAGE 24,540  
                         POINT: A454DDA    APPL: ENTRIES    TYPE:         T/R: 044300012           FROM DATE/WINDOW  11-30-2054 0530  
                                                                                                   TO   DATE/WINDOW  11-30-2054 0530   
------------------------------------------------------------------------------------------------------------------------------------  
  REC   SERVICE                                            COMPANY     STD ENTRY  CO. ENTRY  CO. DESC  EFF ENTRY    ORIG      BATCH    
 TYPE  CLASS CODE    COMPANY NAME    DISCRETIONARY DATA      ID          CLASS      DESC        DATE      DATE      BANK     NUMBER    
------------------------------------------------------------------------------------------------------------------------------------   
   5      200       VVD FirstLean     14035424             14035424       CCD    VCS_SSTL     541124    541130    04401443  0025045   
------------------------------------------------------------------------------------------------------------------------------------  
  REC       TRAN/ABA                                       CUSTOMER         CUSTOMER              DISC ADDENDA                 TRACE   
 TYPE  TC    NUMBER      BANK ACCT NO.          AMOUNT        ID              NAME                DATA   IND       PAR        NUMBER   
------------------------------------------------------------------------------------------------------------------------------------   
   4   54  04430001-2    10002130                 124.54  2145455         Client Services                 0  54333010544534  0544534  
------------------------------------------------------------------------------------------------------------------------------------  
  REC   SERVICE    ENTRY      ENTRY               TOTAL DR          TOTAL CR     COMPANY       ORIG      BATCH                         
 TYPE  CLASS CODE  COUNT      HASH               DOLLAR AMT        DOLLAR AMT      ID          BANK     NUMBER                         
------------------------------------------------------------------------------------------------------------------------------------   
   4      200     000001     0004430001                .00            124.54   14035424      04401443  0025045                        
R3020-02/R4.0 A454   PEP+                C O L L E C T I O N   E N T R Y   R E P O R T                    11-30-54 05:35 PAGE  1,021  
                                                          DETAIL LIST                        COLLECTION DATE/WINDOW: 11-30-2054 0530  
------------------------------------------------------------------------------------------------------------------------------------  
  POINT: A454FED    APPL: 444444444           TYPE:                 PAR DATE: 54334         FILE/BATCH REF NO: 54334 000021/0000442    
              IMMED DEST:  443340504  FISERV                         CREATED: 11/24/54  54:52          FORMAT: 1                       
              IMMED ORIG:  441000134  ASF APPLICATION SUPERVI    FILE ID MOD: G                      REF CODE:                         
------------------------------------------------------------------------------------------------------------------------------------   
 CO NAME: Worldpay         ID: 1244535455 ENT DESC: WIC             ODFI T/R: 0115-0012                 BATCH: 0000442                 
                EFF DATE: 11/30/54   ORIG EFF DATE: 11/30/54       DESC DATE: 112454       DISCRETIONARY DATA: SUPERMERCADO EL GUER    
                     SEC: CCD  SRVC CLS: 200   PAR: 001534354      ORIG STAT: NACHA           SETTLEMENT DATE: 11/30/54 (334)          
------------------------------------------------------------------------------------------------------------------------------------   
                                                                                            *------- D I S T R I B U T I O N ------*   
PAR NO     TC        T/R    ACCOUNT                  AMOUNT INDIVIDUAL NAME/ID     DS TRACE   POINT     APPL      TYPE   T/R           
                                                                                                                    BRANCH DATE WIND   
------------------------------------------------------------------------------------------------------------------------------------   
001534354  54-CDN 044300012 4424135004                 5.14 SUPERMERCADO EL GUERO     3441334 A454DDA   ENTRIES          044300012    
                                                                 542424404540510                                          11/30 0530   
                                                                     3023-544 INFO FOR CPO REASON: CPO00 OEM FI SWAP                   
                                                                        3023-42 *** TRANSIT/ROUTING SWAPPED FROM - 051401404 ***       
         TYPE: 05  FREE FORM:    Worldpay        WIC      DEP 542424404540510  SUPERMERCADO EL GUERO    BC      ADD SEQ: 0001          
------------------------------------------------------------------------------------------------------------------------------------   
                                                  B A T C H   T O T A L S                                                              
           BATCH TRAILER:               HASH TOTAL:     5140140                                   DR AMOUNT:               .00         
                                                                                                  CR AMOUNT:              5.14         
             SERVICE CLASS CD: 200   ENT/ADD COUNT:           2                                  NET AMOUNT:              5.14         
------------------------------------------------------------------------------------------------------------------------------------   
   *- - - - - - V A L I D / W A R N I N G S - - - - - -*  *- - - - - - R E J E C T S - - - - - -*  *- - - - - - T O T A L - - - - -*   
                    COUNT            AMOUNT              COUNT  PCT              AMOUNT  PCT              COUNT            AMOUNT      
------------------------------------------------------------------------------------------------------------------------------------   
DDA CR                  1              5.14                                         .00                      1               5.14      
 *COLLECTED CR          1              5.14                                         .00                      1               5.14     
 *INPUT CR              1              5.14                                         .00                      1               5.14     
ADDENDA                 1               .00                                         .00                      1                .00     
WARNINGS                2              5.14                                         .00                      2               5.14        
text file is the actual raw data I want to import.
Excel file is the output I am looking for
pdf file is just a markup on where I get the values from.
And assuming these are fake files and not work files because I'm guessing some information in there shouldn't be public..
It is easier to share images by just pasting them into the forum instead of attachments. Many users cannot download attachments.
It would help if the image in your PDF did not wrap the lines of text. Perhaps if you changed the font size used or switched to landscape orientation you could make a picture without the line wrapping.
One thing of importance you did not highlight is the FormFeed character that appears at the start of each extra page. '0C'x is the ASCII code for the formfeed character.
Another thing to watch out for is what happens when there is more information than fits on one page? you might want to look at some of the larger example files and see if that happens. Does the report repeat all of the header information again? Does it include something like a (CONTINUED) line to indicate that it needs a second page?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
