How can I read raw data with varying lines in headers in a hierarchical program?
So Buyer code and Vendor will be the headers and need to read in all the other stuff but, if you note in 764809 there are 3 detail lines compared to only 1 detail lines for the others
Buyer Code: BLS Joe SM ITH
Vendor: 8444 DAT EX- OHMEDA
762693 62246385 / 03/06/15 42099 1075.19 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q
762694 62246384 / 03/06/15 42099 1115.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q
763572 62253066 / 03/18/15 42111 890.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q
764809 62260713 / 04/01/15 42125 8190.48 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q
2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q
3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q
765293 62257450 / 03/26/15 42119 540 264568 P
1 Las t Recvd Dat e: 01/08/14 1 3 3 1 10750 10750 Q
Here is the code I working with ...
data myfile3;
length buyer_code vendor $200 ;
retain buyer_code vendor '';
infile '/home/nyioves/Nikfold/Invoice.txt' truncover obs=400 ;
input @@;
if _infile_ =: 'Buyer Code' then do;
buyer_code = cat(scan(_infile_,2,':'));
input;
end;
else if _infile_ =: 'Vendor:' then do;
vendor = cat(scan(_infile_,2,':'));
input;
end;
else if index(linenum,1) then do;
input @1 Vouchnum 6.
@7 Invno 9.
@16 seperat1 $1.
@18 Vndinvdate $9.
@27 Number1 6.
@33 InvAmt $8.
@41 Ponum $7.
@48 Status1 $25. /
@3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1.
;
x = _infile_;
output;
end;
end;
else do;
input; * MOVE INPUT CURSOR TO NEXT LINE WITHOUT ASSIGNING ANYTHING;
end;
run;
Your code looks good.
data myfile3;
infile cards truncover;
input @;
retain buyer_code vendor Vouchnum
Invno
seperat1
Vndinvdate
Number1
InvAmt
Ponum
Status1 ;
if _infile_ =: 'Buyer Code' then do;
buyer_code = cat(scan(_infile_,2,':'));
end;
else if _infile_ =: 'Vendor:' then do;
vendor = cat(scan(_infile_,2,':'));
end;
else if missing(_infile_) then delete;
else if prxmatch('/^\d{4,}/',_infile_) then do;
input (Vouchnum
Invno
seperat1
Vndinvdate
Number1
InvAmt
Ponum
Status1 ) ( : $20.) ;
end;
else if prxmatch('/^\s*\d\s/',_infile_) then do;
input ( linenum
txt1
Invdate
linenum2
recd
POdis
Invd
price1
price2
status2 ) (: $20.)
;
output;
end;
cards;
Buyer Code: BLS Joe SM ITH
Vendor: 8444 DAT EX- OHMEDA
762693 62246385 / 03/06/15 42099 1075.19 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q
762694 62246384 / 03/06/15 42099 1115.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q
763572 62253066 / 03/18/15 42111 890.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q
764809 62260713 / 04/01/15 42125 8190.48 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q
2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q
3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q
765293 62257450 / 03/26/15 42119 540 264568 P
1 Las t Recvd Dat e: 01/08/14 1 3 3 1 10750 10750 Q
;
run;
Xia Keshan
thx !That works but I don't know what you did lol
I am really looking for the if/then statement after the index statement to help me process the detail lines
OK. Since you are not familiar with Perl Regular Expression, Try this one :
data myfile3;
infile cards truncover;
input @;
retain buyer_code vendor Vouchnum
Invno
seperat1
Vndinvdate
Number1
InvAmt
Ponum
Status1 ;
if _infile_ =: 'Buyer Code' then do;
buyer_code = cat(scan(_infile_,2,':'));
end;
else if _infile_ =: 'Vendor:' then do;
vendor = cat(scan(_infile_,2,':'));
end;
else if missing(_infile_) then delete;
else if length(scan(_infile_,1)) gt 2 then do;
input (Vouchnum
Invno
seperat1
Vndinvdate
Number1
InvAmt
Ponum
Status1 ) ( : $20.) ;
end;
else if length(scan(_infile_,1)) eq 1 then do;
input ( linenum
txt1
Invdate
linenum2
recd
POdis
Invd
price1
price2
status2 ) (: $20.)
;
output;
end;
cards;
Buyer Code: BLS Joe SM ITH
Vendor: 8444 DAT EX- OHMEDA
762693 62246385 / 03/06/15 42099 1075.19 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q
762694 62246384 / 03/06/15 42099 1115.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q
763572 62253066 / 03/18/15 42111 890.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q
764809 62260713 / 04/01/15 42125 8190.48 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q
2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q
3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q
765293 62257450 / 03/26/15 42119 540 264568 P
1 Las t Recvd Dat e: 01/08/14 1 3 3 1 10750 10750 Q
;
run;
Xia Keshan
doesn't work. you really need 2 hierarchical programs. The inner one for that ID number as headers and it's various detail lines, and outer which has the Buyer and Vendor as headers
i am trying to do something like this ....
data myfile3 (drop=chkcol linechk);
infile '/home/nyioves/Nikfold/Invoice.txt' truncover obs=400 ;
length Buyer_Code Vendor $200;
input chkcol $ 5 @1 @;
if chkol='Vouch' then delete;
if chkol='Nbr I' then delete;
if chkol='Price' then delete;
if chkol='-----' then delete;
if chkol='Date:' then delete;
if chkol='Time:' then delete;
if chkol=' ' then delete;
if chkcol="Buyer" then
input @12 Buyer_Code $50 @
;
retain Buyer_Code;
else if chkcol="Vendo" then
input @9 Vendor $50 @
;
retain Vendor;
else do;
input @1 Vouchnum 6.
@7 Invno 9.
@16 seperat1 $1.
@18 Vndinvdate $9.
@27 Number1 6.
@33 InvAmt $8.
@41 Ponum $7.
@48 Status1 $25. @
;
retain Vouchnum Invno seperat1 Vndinvdate Number1 InvAmt Ponum Status1;
input @3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1. @
;
retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;
input linechk $1. @3 @;
if linechk = '2' then
input @3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1. @
;
retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;
else if linechk = '3' then
input @3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1.
;
retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;
end;
run;
proc print data=myfile3;
run;
i tried to break it down by just the 'trouble areas ' but i can this is just not possible
here' s the data and there are thousands of this throughout....
oucher Vendor Vendor Payment Po Nb Total Total PO Inv
Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St
------------ ---------- --- ----------- ----------- -------------- -------- - --------------- -------- --------- -------------- ------------ ---
Buyer Code: BLS BETTIE SM ITH
Vendor: 8444 DAT EX- OHMEDA
762100 62245094 / 03/04/15 42097 2142.46 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 2142.46 Q
762101 62243752 / 03/01/15 42094 16001.6 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 16001.6 B
762690 62247150 / 03/07/15 42100 1790.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1790.38 Q
762691 62246387 / 03/06/15 42099 1520.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q
762692 62246386 / 03/06/15 42099 1070.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1070.38 Q
762693 62246385 / 03/06/15 42099 1075.19 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q
762694 62246384 / 03/06/15 42099 1115.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q
763572 62253066 / 03/18/15 42111 890.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q
763574 62253064 / 03/18/15 42111 540 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 540 Q
763805 62255278 / 03/21/15 42114 1520.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q
764809 62260713 / 04/01/15 42125 8190.48 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q
2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q
3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q
765293 62257450 / 03/26/15 42119 540 264568 P
1 Las t Recvd Dat e: 6 1 574.31 540 Q
765294 62257449 / 03/26/15 42119 720 264568 P
1 Las t Recvd Dat e: 6 1 574.31 720 B
Date: 05/01/ 2015 ST. B ARNABAS O SPITAL P age : 2
Time: 8:47am Pay ables Status E xception R pt For 05/01/20 15 R eport: GARPS ESB
Voucher Vendor Vendor Payment Po Nb Total Total PO Inv
Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St
------------ ---------- --- ----------- ----------- -------------- -------- -
here is the code i am working with ... i always get data errors when the detail line numbers change from one line to 3 lines in 764809.
i can't seem to hold the line and grab those guys in a typical hierarchy program...
ata myfile3 (drop=chkcol linechk);
infile '/home/nyioves/Nikfold/Invoice.txt' truncover obs=400 ;
length Buyer_Code Vendor $200;
input chkcol $ 5 @1 @;
if chkol='Vouch' then delete;
if chkol='Nbr I' then delete;
if chkol='Price' then delete;
if chkol='-----' then delete;
if chkol='Date:' then delete;
if chkol='Time:' then delete;
if chkol=' ' then delete;
if chkcol="Buyer" then
input @12 Buyer_Code $50 @
;
retain Buyer_Code;
else if chkcol="Vendo" then
input @9 Vendor $50 @
;
retain Vendor;
else do;
input @1 Vouchnum 6.
@7 Invno 9.
@16 seperat1 $1.
@18 Vndinvdate $9.
@27 Number1 6.
@33 InvAmt $8.
@41 Ponum $7.
@48 Status1 $25. @
;
retain Vouchnum Invno seperat1 Vndinvdate Number1 InvAmt Ponum Status1;
input @3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1. @
;
retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;
input linechk $1. @3 @;
if linechk = '2' then
input @3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1. @
;
retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;
else if linechk = '3' then
input @3 linenum $1.
@5 txt1 $20.
@24 Invdate $8.
@34 linenum2 1.
@36 recd 1.
@38 POdis 1.
@40 Invd 1.
@41 price1 $8.
@49 price2 $8.
@55 status2 $1.
;
retain linenum txt1 Invdate linenum2 recd POdis Invd price1 price2 status2;
end;
run;
proc print data=myfile3;
run;
Hi:
Not much help,. but some comments.
RETAIN is a compile time statement. It looks like you think that the retain will be different for different INPUT statements, but it won't. Imagine that SAS is going through your program at COMPILE time and handling all the RETAIN, LENGTH and FORMAT statements. It's like the RETAIN statements are NOT there at execution time.
Next, I notice in the above code that sometimes you have chkcol and then other times you have chkol -- those are 2 different variable names -- this kind of thing will mess up your program logic.
It almost looks like an old mainframe (TEXT) report that you're trying to scrape (with the dashes under the column headers). Things looks sort of weird with the data, too, almost like it's been edited or compressed. What's odd are the stray spaces in some of the lines/fields like:
BETTIE SM ITH (space between the M and the I)
Las t Recvd Dat e: (space between the s and the t and the t and the e)
E xception: (space between the E and the x)
B ARNABAS O SPITAL (missing H in HOSPITAL and space between O and S and space between B and A)
To use column input (where you have @49, @55, etc), generally means that the columns "line up" -- which seems to be true for some but not all of the the example data you've posted. Is this the way the file is being sent to you? Or did it get messed up when you pasted it into the Forum window?
cynthia
@ cynthia
it is data from an old (very old!) mckesson but yes the mainframe does the same thing. Apparently someone i work with used a delimiter software, which i don't know what it is, to get it like that.
i guess it more or less does what text to columns / delimiter does in excel
they apparently think they did something -roll eyes-
There are some data that I never saw .
data myfile3;
infile cards truncover;
input @;
retain buyer_code vendor Vouchnum
Invno
seperat1
Vndinvdate
Number1
InvAmt
Ponum
Status1 ;
if _infile_ =: 'Buyer Code' then do;
buyer_code = cat(scan(_infile_,2,':'));
end;
else if _infile_ =: 'Vendor:' then do;
vendor = cat(scan(_infile_,2,':'));
end;
else if missing(_infile_) then delete;
else if length(scan(_infile_,1)) gt 2 and not notdigit(scan(_infile_,1)) then do;
input (Vouchnum
Invno
seperat1
Vndinvdate
Number1
InvAmt
Ponum
Status1 ) ( : $20.) ;
end;
else if length(scan(_infile_,1)) lt 3 and not notdigit(scan(_infile_,1)) then do;
input ( linenum
txt1
Invdate
linenum2
recd
POdis
Invd
price1
price2
status2 ) (: $20.)
;
output;
end;
cards;
oucher Vendor Vendor Payment Po Nb Total Total PO Inv
Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St
------------ ---------- --- ----------- ----------- -------------- -------- - --------------- -------- --------- -------------- ------------ ---
Buyer Code: BLS BETTIE SM ITH
Vendor: 8444 DAT EX- OHMEDA
762100 62245094 / 03/04/15 42097 2142.46 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 2142.46 Q
762101 62243752 / 03/01/15 42094 16001.6 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 16001.6 B
762690 62247150 / 03/07/15 42100 1790.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1790.38 Q
762691 62246387 / 03/06/15 42099 1520.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q
762692 62246386 / 03/06/15 42099 1070.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1070.38 Q
762693 62246385 / 03/06/15 42099 1075.19 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1075.19 Q
762694 62246384 / 03/06/15 42099 1115.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1115.38 Q
763572 62253066 / 03/18/15 42111 890.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 890.38 Q
763574 62253064 / 03/18/15 42111 540 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 540 Q
763805 62255278 / 03/21/15 42114 1520.38 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 1520.38 Q
764809 62260713 / 04/01/15 42125 8190.48 264568 P
1 Las t Recvd Dat e: 12/19/14 1 7 7 1 3971.88 3971.88 Q
2 Las t Recvd Dat e: 12/19/14 2 7 7 1 3740 3740 Q
3 Las t Recvd Dat e: 12/19/14 3 7 7 1 478.59 478.6 Q
765293 62257450 / 03/26/15 42119 540 264568 P
1 Las t Recvd Dat e: 6 1 574.31 540 Q
765294 62257449 / 03/26/15 42119 720 264568 P
1 Las t Recvd Dat e: 6 1 574.31 720 B
Date: 05/01/ 2015 ST. B ARNABAS O SPITAL P age : 2
Time: 8:47am Pay ables Status E xception R pt For 05/01/20 15 R eport: GARPS ESB
Voucher Vendor Vendor Payment Po Nb Total Total PO Inv
Nbr I nv Nbr/Ln Inv Dat e Due Date Inv Amt Ln Nb Rec'd Dist Inv'd Price Price St
------------ ---------- --- ----------- ----------- -------------- -------- -
;
run;
Xia Keshan
that is really great Xia! i know it works with the cards statement however when i sub in my text file in the infile statement I get no observations in the output data set.
I can't figure out how to do it to get it to repeat. i tried ... else do; input; end; at the end of the program so but no avail. i figured it finds something that isn't the data we want it will delete but then move to the next observation
It is hard to give you some advice . Since there is not a sample file to test .
One thing I can image is adding an other option .
infile cards truncover expandtabs ;
Message was edited by: xia keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.