Hello Again SAS Friends,
This is a followup to an earlier question.
Need to import a .TXT file that has a header that appears irregularly. The header appears at the top of the first page, and then reappears after 87, or 88, 89, 90, or more rows. Sometimes the header is inserted into the data subunits (see below), and splits them into two.
The basic data subunit of interest is either a 6 or 7 rows. The first component of the subunit is a 12 digit number #####-####-##, followed by other variables that are connected to that number.
The code to capture the data from the subunits is below. What is not included is code to work around the unwanted header information.
Attached is a sample of the .TXT file with the header inserted into inconvenient places to demonstrate the TXT file structure. The actual dataset is ~17000 pages, so is not attached.
One option tried was to import the entire .TXT file so that there is a single variable and one row for each row of the TXT file. However I could not get all the text (characters and spaces) from one row into a single variable field. If that is possible, then I think SAS character editing can be used to take care of the rest of the import.
Or if its possible to somehow remove the header information , that would work too. Should I parse this in another program?
////////////////////////////////////////////////////
DATA Cost;
infile datalines truncover;
INPUT @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if missing(PKG) then
input
@60 PKG 4.0
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
/ @1 NN_7 $12;
else
input
@1 NN_2 $1.
/ @1 NN_3 $1.
/ @1 NN_4 $1.
/ @29 AWP DOLLAR8.2
/ @1 NN_6 $1.
;
FORMAT AWP Dollar8.2 ;
datalines; 20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
AWP $0.00 Retail $0.00 Cost $0.00 Price Schedule Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
AWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 Drug Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
;
Thanks
R
The key is to check a row before anyinput.
Like this?
/**********************************************************************************************************/
data COST;
infile datalines truncover;
link skiprows; input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if PKG =. then do;
link skiprows; input @60 PKG 4.0;
end;
link skiprows; input @29 TWP dollar8.2 ;
link skiprows; input @29 MAC dollar8.2 ;
return;
skiprows:
do until(^SKIPROW);
input @;
if left(_infile_) in: ( ''
'Min'
'0 '
'Rpt# '
'9/6/2018'
'/From'
'NDC'
'Pkg'
'OnHan'
'Supplie'
'QtySold'
'DE PTD'
'Equiva') then do; input; SKIPROW=1; end;
else SKIPROW=0;
end;
datalines;
Rpt# 100 Purchasing Summary Report Page 1
9/6/2018 Tops Inventory Report 8:36:07 PM
/From SEA: 0 to SEA: 2/Sort By Tops Name ,at location: Main
NDC Tops Type
Pkg
OnHan
Supplie
QtySold Order Code
DE PTD Order Qty
Equiva
20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
TWP $0.00 Retail $0.00 Cost $0.00 Price Schedule tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
TWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
20000-0000-00 Product8_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
TWP $0.00 Retail $0.00 Cost $0.00 Price Schedule tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product9_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
Rpt# 100 PurchasingSummary Report Page 1
9/6/2018 Tops Inventory Report 8:36:07 PM
/From SEA: 0 to SEA: 2/Sort By Tops Name ,at location: Main
NDC Tops Type
Pkg
OnHan
Supplie
QtySold Order Code
DE PTD Order Qty
Equiva
0 Lot No Last Sold Manufacture BAXA CORP.
TWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
run;
ABC | DRUG | PKG | TWP | MAC |
---|---|---|---|---|
20000-0000-00 | Product6_________________ | 1 | 0.00 | 0 |
100000-0000-00 | Product7_________________ | 10 | 158.04 | 0 |
20000-0000-00 | Product8_________________ | 1 | 0.00 | 0 |
100000-0000-00 | Product9_________________ | 10 | 158.04 | 0 |
The data you provided does not match the input statements. Please always run the code you have pasted before posting.
Here is a simplified version that does what you want:
data COST;
infile datalines truncover;
link skipheader; input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
link skipheader; input @29 TWP dollar8.2 ;
link skipheader; input @29 MAC dollar8.2 ;
return;
skipheader:
do until(NOHEADER);
input @;
if left(_infile_) in: ( ' '
'Rpt# '
'9/6/2018'
'/From'
'NDC'
'Pkg'
'OnHan'
'Supplie'
'QtySold'
'DE PTD'
'Equiva') then do; input; NOHEADER=0; end;
else NOHEADER=1;
end;
datalines;
20000-0000-00 Product6_________________ 1 0 0 0 0
TWP $3.00 Retail $0.00 Cost $0.00 Price Schedule tops Expire
MAC $4.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
Rpt# 100 PurchasingSummary Report Page 1
9/6/2018 Tops Inventory Report 8:36:07 PM
/From SEA: 0 to SEA: 2/Sort By Tops Name ,at location: Main
NDC Tops Type
Pkg
OnHan
Supplie
QtySold Order Code
DE PTD Order Qty
Equiva
TWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
run;
ABC | DRUG | PKG | TWP | MAC |
---|---|---|---|---|
20000-0000-00 | Product6_________________ | 1 | 3.00 | 4 |
100000-0000-0 | Product7________________ | . | 158.04 | 0 |
Hi ChrisNZ,
Wow, Thank you and very cool. Will have to spend some time with this learning what the "Link" statement does.
Your code with sample data inserted is attached. Sorry for the disconnect in the prior attachment.
There is one aspect that still doesn't work quite right and it's beyond my capabilities.
For example
"Product7_________________" has 7 lines of related data. There is a letter "N" (@47) just after it then the rows ends. PKG is in the next row , at position @60 PKG 4.0
"Product6_________________" has 6 lines of related data. Just after "Product6_________________" There is no letter, and
PKG is the same row , at position @60 PKG 4.0
In your Output (below), that PKG was not read into the SAS file for "Product7_________________". It does read it in for "Product6_________________"
Can that part be updated? Many thanks for your help with this,
R
The key is to check a row before anyinput.
Like this?
/**********************************************************************************************************/
data COST;
infile datalines truncover;
link skiprows; input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if PKG =. then do;
link skiprows; input @60 PKG 4.0;
end;
link skiprows; input @29 TWP dollar8.2 ;
link skiprows; input @29 MAC dollar8.2 ;
return;
skiprows:
do until(^SKIPROW);
input @;
if left(_infile_) in: ( ''
'Min'
'0 '
'Rpt# '
'9/6/2018'
'/From'
'NDC'
'Pkg'
'OnHan'
'Supplie'
'QtySold'
'DE PTD'
'Equiva') then do; input; SKIPROW=1; end;
else SKIPROW=0;
end;
datalines;
Rpt# 100 Purchasing Summary Report Page 1
9/6/2018 Tops Inventory Report 8:36:07 PM
/From SEA: 0 to SEA: 2/Sort By Tops Name ,at location: Main
NDC Tops Type
Pkg
OnHan
Supplie
QtySold Order Code
DE PTD Order Qty
Equiva
20000-0000-00 Product6_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
TWP $0.00 Retail $0.00 Cost $0.00 Price Schedule tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product7_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture BAXA CORP.
TWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
20000-0000-00 Product8_________________ 1 0 0 0 0
Min Onhand 0 Max Onhand
0 Lot No Last Sold Manufacture
TWP $0.00 Retail $0.00 Cost $0.00 Price Schedule tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
100000-0000-00 Product9_________________ N
10 0 0 0 0 0
Min Onhand 0 Max Onhand
Rpt# 100 PurchasingSummary Report Page 1
9/6/2018 Tops Inventory Report 8:36:07 PM
/From SEA: 0 to SEA: 2/Sort By Tops Name ,at location: Main
NDC Tops Type
Pkg
OnHan
Supplie
QtySold Order Code
DE PTD Order Qty
Equiva
0 Lot No Last Sold Manufacture BAXA CORP.
TWP $158.04 Retail $0.00 Cost $0.00 Price Schedule 0 tops Expire
MAC $0.00 WAC Retail Value $0.00 Cost Value $0.00
run;
ABC | DRUG | PKG | TWP | MAC |
---|---|---|---|---|
20000-0000-00 | Product6_________________ | 1 | 0.00 | 0 |
100000-0000-00 | Product7_________________ | 10 | 158.04 | 0 |
20000-0000-00 | Product8_________________ | 1 | 0.00 | 0 |
100000-0000-00 | Product9_________________ | 10 | 158.04 | 0 |
ChrisNZ,
Thank you, this is very helpful and I've learned a lot
Robert
No that I know more about your problem, I would probably read the data differently.
I'd try to positively identify each useful record.
Something like:
data COST;
retain ABC DRUG PKG TWP ;
infile datalines truncover;
input @;
if length(_INFILE_);
if prxmatch('/\d{5}-\d{4}-\d{2}/',_INFILE_) then input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0; *5 then 4 then 2 digits separated by -;
if prxmatch('/^\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+$/',_INFILE_) then input @60 PKG 4.0; *6 numbers;
if left(_INFILE_)=:'TWP' then input @29 TWP dollar8.2 ;
if left(_INFILE_)=:'MAC' then do; input @29 MAC dollar8.2; output; call missing (ABC, DRUG, PKG, TWP); end;
Hi Chris,
Again, many thanks.
This works prefectly, after making one minor change, related to a text pattern that was not present in the TXT sample file I uploaded here.
Changed one line as follows :
if prxmatch('/^\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+/',_INFILE_) then input @60 PKG_2 4.0; *begins with 5 numbers, then anything else to end of line ;
Have now learned how to use PERL characters,
Much appreciated,
Robert
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.