BookmarkSubscribeRSS Feed
Nyioves
Calcite | Level 5

How can I read raw data with varying lines in headers in a hierarchical program?

So Buyer code and Vendor will be the headers and need to read in all the other stuff but, if you note in 764809 there are 3 detail lines compared to only 1 detail lines for the others


Buyer Code: BLS Joe SM ITH        

Vendor: 8444 DAT EX- OHMEDA

762693           62246385      /      03/06/15      42099      1075.19      264568       P

  1 Las t Recvd Dat e:      12/19/14  1 7 7 1      3971.88      1075.19      Q        

762694           62246384      /      03/06/15      42099      1115.38      264568       P

  1 Las t Recvd Dat e:      12/19/14  1 7 7 1      3971.88 1115.38      Q

763572           62253066      /      03/18/15      42111      890.38      264568       P   

  1 Las t Recvd Dat e: 12/19/14  1 7 7 1 3971.88 890.38 Q

764809 62260713 / 04/01/15 42125 8190.48 264568       P

  1 Las t Recvd Dat e: 12/19/14  1 7 7 1 3971.88 3971.88 Q

  2 Las t Recvd Dat e: 12/19/14  2 7 7 1 3740 3740 Q

  3 Las t Recvd Dat e: 12/19/14  3 7 7 1 478.59 478.6 Q

765293      62257450      /           03/26/15      42119      540           264568       P

1 Las t Recvd Dat e: 01/08/14  1 3 3 1 10750 10750 Q

Here is the code I working with ...

data myfile3;
  length buyer_code vendor $200  ;
  retain buyer_code vendor '';

  infile '/home/nyioves/Nikfold/Invoice.txt' truncover obs=400 ;
  input @@;

              if _infile_ =: 'Buyer Code' then do;
                buyer_code = cat(scan(_infile_,2,':'));
                input;
              end;
              else if _infile_ =: 'Vendor:' then do;
                     vendor = cat(scan(_infile_,2,':'));
                     input;
              end;
             else if index(linenum,1) then do;
                input @1  Vouchnum  6. 
                      @7  Invno   9.
                      @16 seperat1  $1.
                      @18 Vndinvdate  $9.
                      @27 Number1  6.
                      @33 InvAmt   $8.
                    @41 Ponum    $7.
                    @48 Status1  $25.  /
                    @3  linenum  $1.
                    @5  txt1            $20.
                    @24 Invdate         $8.
                    @34 linenum2        1.
                    @36 recd            1.
                    @38 POdis  1.
                    @40 Invd  1.
                    @41 price1          $8.
                    @49 price2          $8. 
                    @55 status2         $1.
                         ;
                     x = _infile_;    

                 output;
              end;
   
  end;
  else do;
    input; * MOVE INPUT CURSOR TO NEXT LINE WITHOUT ASSIGNING ANYTHING;
  end;
run;

13 REPLIES 13
Ksharp
Super User

Your code looks good.

data myfile3;

infile cards truncover;

  input @;

  retain  buyer_code vendor Vouchnum   

                        Invno  

                      seperat1

                       Vndinvdate 

                       Number1 

                       InvAmt 

                     Ponum   

                    Status1 ;

  if _infile_ =: 'Buyer Code' then do;

                buyer_code = cat(scan(_infile_,2,':'));

              end;

              else if _infile_ =: 'Vendor:' then do;

                     vendor = cat(scan(_infile_,2,':'));

              end;

   else if missing(_infile_) then delete;

             else if prxmatch('/^\d{4,}/',_infile_) then do;

                input (Vouchnum   

                        Invno  

                      seperat1

                       Vndinvdate 

                       Number1 

                       InvAmt 

                     Ponum   

                    Status1 ) ( : $20.)   ;

  end;

  else if prxmatch('/^\s*\d\s/',_infile_) then do;

              input     ( linenum 

                      txt1           

                     Invdate       

                     linenum2      

                     recd         

                     POdis 

                     Invd 

                     price1         

                     price2          

                     status2 ) (: $20.)      

                         ;   

                 output;

              end;  

cards;

Buyer Code: BLS Joe SM ITH       

Vendor: 8444 DAT EX- OHMEDA

762693           62246385      /      03/06/15      42099      1075.19      264568       P

  1 Las t Recvd Dat e:      12/19/14  1 7 7 1      3971.88      1075.19      Q       

762694           62246384      /      03/06/15      42099      1115.38      264568       P

  1 Las t Recvd Dat e:      12/19/14  1 7 7 1      3971.88 1115.38      Q

763572           62253066      /      03/18/15      42111      890.38      264568       P  

  1 Las t Recvd Dat e: 12/19/14  1 7 7 1 3971.88 890.38 Q

764809 62260713 / 04/01/15 42125 8190.48 264568       P

  1 Las t Recvd Dat e: 12/19/14  1 7 7 1 3971.88 3971.88 Q

  2 Las t Recvd Dat e: 12/19/14  2 7 7 1 3740 3740 Q

  3 Las t Recvd Dat e: 12/19/14  3 7 7 1 478.59 478.6 Q

765293      62257450      /           03/26/15      42119      540           264568       P

1 Las t Recvd Dat e: 01/08/14  1 3 3 1 10750 10750 Q

;

run;

Xia Keshan

Nyioves
Calcite | Level 5

thx !That works but I don't know what you did lol

Nyioves
Calcite | Level 5

I am really looking for the if/then statement after the index statement to help me process the detail lines

Ksharp
Super User

OK. Since you are not familiar with Perl Regular Expression, Try this one :

data myfile3;

infile cards truncover;

  input @;

  retain  buyer_code vendor Vouchnum  

                        Invno 

                      seperat1

                       Vndinvdate

                       Number1

                       InvAmt

                     Ponum  

                    Status1 ;

  if _infile_ =: 'Buyer Code' then do;

                buyer_code = cat(scan(_infile_,2,':'));

              end;

              else if _infile_ =: 'Vendor:' then do;

                     vendor = cat(scan(_infile_,2,':'));

              end;

   else if missing(_infile_) then delete;

             else if length(scan(_infile_,1)) gt 2 then do;

                input (Vouchnum  

                        Invno 

                      seperat1

                       Vndinvdate

                       Number1

                       InvAmt

                     Ponum  

                    Status1 ) ( : $20.)   ;

  end;

  else if length(scan(_infile_,1)) eq 1 then do;

              input     ( linenum

                      txt1          

                     Invdate      

                     linenum2     

                     recd        

                     POdis

                     Invd

                     price1        

                     price2         

                     status2 ) (: $20.)     

                         ;  

                 output;

              end; 

cards;

Buyer Code: BLS Joe SM ITH      

Vendor: 8444 DAT EX- OHMEDA

762693           62246385      /      03/06/15      42099      1075.19      264568       P

  1 Las t Recvd Dat e:      12/19/14  1 7 7 1      3971.88      1075.19      Q      

762694           62246384      /      03/06/15      42099      1115.38      264568       P

  1 Las t Recvd Dat e:      12/19/14  1 7 7 1      3971.88 1115.38      Q

763572           62253066      /      03/18/15      42111      890.38      264568       P 

  1 Las t Recvd Dat e: 12/19/14  1 7 7 1 3971.88 890.38 Q

764809 62260713 / 04/01/15 42125 8190.48 264568       P

  1 Las t Recvd Dat e: 12/19/14  1 7 7 1 3971.88 3971.88 Q

  2 Las t Recvd Dat e: 12/19/14  2 7 7 1 3740 3740 Q

  3 Las t Recvd Dat e: 12/19/14  3 7 7 1 478.59 478.6 Q

765293      62257450      /           03/26/15      42119      540           264568       P

1 Las t Recvd Dat e: 01/08/14  1 3 3 1 10750 10750 Q

;

run;

Xia Keshan

Nyioves
Calcite | Level 5

doesn't work. you really need 2 hierarchical programs. The inner one for that ID number as headers and it's various detail lines, and outer which has the Buyer and Vendor as headers

Nyioves
Calcite | Level 5

i am trying to do something like this ....

data myfile3 (drop=chkcol linechk);

  infile '/home/nyioves/Nikfold/Invoice.txt' truncover obs=400 ;

  length Buyer_Code Vendor $200;

    input chkcol $ 5 @1 @;

    if chkol='Vouch' then delete;

    if chkol='Nbr I' then delete;

    if chkol='Price' then delete;

    if chkol='-----' then delete;

    if chkol='Date:' then delete;

    if chkol='Time:' then delete;

    if chkol='     ' then delete;

    if chkcol="Buyer" then

    input @12   Buyer_Code  $50 @

  ;

    retain Buyer_Code;

    else if chkcol="Vendo" then

    input @9 Vendor $50     @

  ;

  retain Vendor;

  else do;

    input @1 Vouchnum 6. 

              @7 Invno 9.

              @16 seperat1 $1.

              @18   Vndinvdate $9.

              @27   Number1 6.

              @33   InvAmt $8.

           @41   Ponum   $7.

           @48   Status1 $25. @

  ; 

  retain Vouchnum Invno seperat1 Vndinvdate Number1 InvAmt Ponum Status1;

  input @3   linenum $1.

           @5   txt1        $20.

           @24 Invdate     $8.

           @34 linenum2    1.

           @36 recd        1.

           @38 POdis 1.

           @40 Invd 1.

           @41 price1      $8.

           @49 price2      $8. 

           @55 status2     $1.    @

            ;

  retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;

  input linechk $1. @3 @;

  if linechk = '2' then

     input @3   linenum $1.

           @5   txt1        $20.

           @24 Invdate     $8.

           @34 linenum2    1.

           @36 recd        1.

           @38 POdis 1.

           @40 Invd 1.

           @41 price1      $8.

           @49 price2      $8. 

           @55 status2     $1.    @

            ;

  retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;

  else if linechk = '3' then

  input @3   linenum $1.

           @5   txt1        $20.

           @24 Invdate     $8.

           @34 linenum2    1.

           @36 recd        1.

           @38 POdis 1.

           @40 Invd 1.

           @41 price1      $8.

           @49 price2      $8. 

           @55 status2     $1.   

            ;

     retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;

  end;

run;

proc print data=myfile3;

run;

Nyioves
Calcite | Level 5

i tried to break it down by just the 'trouble areas ' but i can this is just not possible

here' s the data and there are thousands of this throughout....

oucher Vendor Vendor Payment Po Nb Total Total PO Inv
Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St
------------ ---------- --- ----------- ----------- -------------- -------- - --------------- -------- --------- -------------- ------------ ---
Buyer Code: BLS BETTIE SM ITH 
Vendor: 8444 DAT EX- OHMEDA 
762100 62245094 / 03/04/15 42097 2142.46 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 2142.46 Q
762101 62243752 / 03/01/15 42094 16001.6 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 16001.6 B
762690 62247150 / 03/07/15 42100 1790.38 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1790.38 Q
762691 62246387 / 03/06/15 42099 1520.38 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q
762692 62246386 / 03/06/15 42099 1070.38 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1070.38 Q
762693 62246385 / 03/06/15 42099 1075.19 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q
762694 62246384 / 03/06/15 42099 1115.38 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q
763572 62253066 / 03/18/15 42111 890.38 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q
763574 62253064 / 03/18/15 42111 540 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 540 Q
763805 62255278 / 03/21/15 42114 1520.38 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q
764809 62260713 / 04/01/15 42125 8190.48 264568 P
  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q
  2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q
  3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q
765293 62257450 / 03/26/15 42119 540 264568 P
  1 Las t Recvd Dat e: 6 1 574.31 540 Q
765294 62257449 / 03/26/15 42119 720 264568 P
  1 Las t Recvd Dat e: 6 1 574.31 720 B
Date: 05/01/ 2015 ST. B ARNABAS O SPITAL P age : 2
Time: 8:47am Pay ables Status E xception R pt For 05/01/20 15 R eport: GARPS ESB
Voucher Vendor Vendor Payment Po Nb Total Total PO Inv
Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St
------------ ---------- --- ----------- ----------- -------------- -------- - 

Nyioves
Calcite | Level 5

here is the code i am working with ... i always get data errors when the detail line numbers change from one line to 3 lines in 764809.

i can't seem to hold the line and grab those guys in a typical hierarchy program...

ata myfile3 (drop=chkcol linechk);

  infile '/home/nyioves/Nikfold/Invoice.txt' truncover obs=400 ;

  length Buyer_Code Vendor $200;

    input chkcol $ 5 @1 @;

    if chkol='Vouch' then delete;

    if chkol='Nbr I' then delete;

    if chkol='Price' then delete;

    if chkol='-----' then delete;

    if chkol='Date:' then delete;

    if chkol='Time:' then delete;

    if chkol='     ' then delete;

    if chkcol="Buyer" then

    input @12   Buyer_Code  $50 @

  ;

    retain Buyer_Code;

    else if chkcol="Vendo" then

    input @9 Vendor $50     @

  ;

  retain Vendor;

  else do;

    input @1 Vouchnum 6. 

              @7 Invno 9.

              @16 seperat1 $1.

              @18   Vndinvdate $9.

              @27   Number1 6.

              @33   InvAmt $8.

           @41   Ponum   $7.

           @48   Status1 $25. @

  ; 

  retain Vouchnum Invno seperat1 Vndinvdate Number1 InvAmt Ponum Status1;

  input @3   linenum $1.

           @5   txt1        $20.

           @24 Invdate     $8.

           @34 linenum2    1.

           @36 recd        1.

           @38 POdis 1.

           @40 Invd 1.

           @41 price1      $8.

           @49 price2      $8. 

           @55 status2     $1.    @

            ;

  retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;

  input linechk $1. @3 @;

  if linechk = '2' then

     input @3   linenum $1.

           @5   txt1        $20.

           @24 Invdate     $8.

           @34 linenum2    1.

           @36 recd        1.

           @38 POdis 1.

           @40 Invd 1.

           @41 price1      $8.

           @49 price2      $8. 

           @55 status2     $1.    @

            ;

  retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;

  else if linechk = '3' then

  input @3   linenum $1.

           @5   txt1        $20.

           @24 Invdate     $8.

           @34 linenum2    1.

           @36 recd        1.

           @38 POdis 1.

           @40 Invd 1.

           @41 price1      $8.

           @49 price2      $8. 

           @55 status2     $1.   

            ;

     retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;

  end;

run;

proc print data=myfile3;

run;

Cynthia_sas
SAS Super FREQ

Hi:

  Not much help,. but some comments.

  RETAIN is a compile time statement. It looks like you think that the retain will be different for different INPUT statements, but it won't. Imagine that SAS is going through your program at COMPILE time and handling all the RETAIN, LENGTH and FORMAT statements. It's like the RETAIN statements are NOT there at execution time.

  Next, I notice in the above code that sometimes you have chkcol and then other times you have chkol -- those are 2 different variable names -- this kind of thing will mess up your program logic.

  It almost looks like an old mainframe (TEXT) report that you're trying to scrape (with the dashes under the column headers). Things looks sort of weird with the data, too, almost like it's been edited or compressed. What's odd are the stray spaces in some of the lines/fields like:

BETTIE SM ITH  (space between the M and the I)

Las t Recvd Dat e:  (space between the s and the t and the t and the e)

E xception:  (space between the E and the x)

B ARNABAS O SPITAL (missing H in HOSPITAL and space between O and S and space between B and A)

  To use column input (where you have @49, @55, etc), generally means that the columns "line up" -- which seems to be true for some but not all of the the example data you've posted. Is this the way the file is being sent to you? Or did it get messed up when you pasted it into the Forum window?

cynthia

Nyioves
Calcite | Level 5

@ cynthia

it is data from an old (very old!) mckesson but yes the mainframe does the same thing. Apparently someone i work with used a delimiter software, which i don't know what it is, to get it like that.

i guess it more or less does what text to columns / delimiter does in excel

they apparently think they did something -roll eyes-

Ksharp
Super User

There are some data that I never saw .

data myfile3;

infile cards truncover;

  input @;

  retain  buyer_code vendor Vouchnum 

                        Invno

                      seperat1

                       Vndinvdate

                       Number1

                       InvAmt

                     Ponum 

                    Status1 ;

  if _infile_ =: 'Buyer Code' then do;

                buyer_code = cat(scan(_infile_,2,':'));

              end;

              else if _infile_ =: 'Vendor:' then do;

                     vendor = cat(scan(_infile_,2,':'));

              end;

   else if missing(_infile_) then delete;

             else if length(scan(_infile_,1)) gt 2 and not notdigit(scan(_infile_,1)) then do;

                input (Vouchnum 

                        Invno

                      seperat1

                       Vndinvdate

                       Number1

                       InvAmt

                     Ponum 

                    Status1 ) ( : $20.)   ;

  end;

  else if length(scan(_infile_,1)) lt 3 and not notdigit(scan(_infile_,1)) then do;

              input     ( linenum

                      txt1         

                     Invdate     

                     linenum2    

                     recd       

                     POdis

                     Invd

                     price1       

                     price2        

                     status2 ) (: $20.)    

                         ; 

                 output;

              end;

cards;

oucher Vendor Vendor Payment Po Nb Total Total PO Inv

Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St

------------ ---------- --- ----------- ----------- -------------- -------- - --------------- -------- --------- -------------- ------------ ---

Buyer Code: BLS BETTIE SM ITH 

Vendor: 8444 DAT EX- OHMEDA 

762100 62245094 / 03/04/15 42097 2142.46 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 2142.46 Q

762101 62243752 / 03/01/15 42094 16001.6 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 16001.6 B

762690 62247150 / 03/07/15 42100 1790.38 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1790.38 Q

762691 62246387 / 03/06/15 42099 1520.38 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q

762692 62246386 / 03/06/15 42099 1070.38 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1070.38 Q

762693 62246385 / 03/06/15 42099 1075.19 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q

762694 62246384 / 03/06/15 42099 1115.38 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q

763572 62253066 / 03/18/15 42111 890.38 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q

763574 62253064 / 03/18/15 42111 540 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 540 Q

763805 62255278 / 03/21/15 42114 1520.38 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q

764809 62260713 / 04/01/15 42125 8190.48 264568 P

  1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q

  2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q

  3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q

765293 62257450 / 03/26/15 42119 540 264568 P

  1 Las t Recvd Dat e: 6 1 574.31 540 Q

765294 62257449 / 03/26/15 42119 720 264568 P

  1 Las t Recvd Dat e: 6 1 574.31 720 B

Date: 05/01/ 2015 ST. B ARNABAS O SPITAL P age : 2

Time: 8:47am Pay ables Status E xception R pt For 05/01/20 15 R eport: GARPS ESB

Voucher Vendor Vendor Payment Po Nb Total Total PO Inv

Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St

------------ ---------- --- ----------- ----------- -------------- -------- -

;

run;

Xia Keshan

Nyioves
Calcite | Level 5

that is really great Xia! i know it works with the cards statement however when i sub in my text file in the infile statement I get no observations in the output data set.

I can't figure out how to do it to get it to repeat. i tried ... else do; input; end; at the end of the program so but no avail. i figured it finds something that isn't the data we want it will delete but then move to the next observation

Ksharp
Super User

It is hard to give you some advice . Since there is not a sample file to test .

One thing I can image is adding an other option .

infile cards truncover expandtabs ;

Message was edited by: xia keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1504 views
  • 2 likes
  • 3 in conversation