BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rmacarthur
Pyrite | Level 9

Hello Again SAS Friends, 

 

This is a followup to an earlier question.

 

Need to import a .TXT file that has a header that appears irregularly.  The header appears at the top of the first page, and then reappears after 87, or 88, 89, 90, or more rows.  Sometimes the header is inserted into the data subunits (see below), and splits them into two.

 

The basic data subunit of interest is either a 6 or 7 rows.  The first component of the subunit is a 12 digit number #####-####-##, followed by other variables that are connected to that number. 

 

The code to capture the data from the subunits is below.  What is not included is code to work around the unwanted header information.

 

Attached is a sample of the .TXT file with the header inserted into inconvenient places to demonstrate the TXT file structure.  The actual dataset is ~17000 pages, so is not attached.

 

 

One option tried was to import the entire .TXT file so that there is a single variable and one row for each row of the TXT file.  However I could not get all the text (characters and spaces) from one row into a single variable field.  If that is possible, then I think SAS character editing can be used to take care of the rest of the import.

 

Or if its possible to somehow remove the header information , that would work too.  Should I parse this in another program?

 

////////////////////////////////////////////////////

DATA Cost;
infile datalines truncover; 
INPUT 	@5 ABC $14. @20 DRUG $26. @60 PKG 4.0;
if missing(PKG) then 
    input		
          @60 	PKG 4.0				
		/ @1  	NN_3 $1.	
		/ @1  	NN_4 $1.	
		/ @29 	AWP DOLLAR8.2			
		/ @1  	NN_6 $1.	 
		/ @1  	NN_7 $12;  
else 
    input 	
		  @1 	NN_2 $1. 
		/ @1  	NN_3 $1.	
		/ @1  	NN_4 $1.	
		/ @29 	AWP DOLLAR8.2			
		/ @1  	NN_6 $1.	 
;  
FORMAT	AWP	Dollar8.2	;	
datalines;     20000-0000-00  Product6_________________               1            0                    0                0                     0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture
                    AWP     $0.00             Retail    $0.00              Cost       $0.00  Price Schedule            Drug Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

    100000-0000-00  Product7_________________ N
                                                            10            0                    0                0    0                0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture       BAXA CORP.
                    AWP     $158.04           Retail    $0.00              Cost       $0.00  Price Schedule    0       Drug Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

;

 

Thanks

R

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

The key is to check a row before anyinput.

Like this?


/**********************************************************************************************************/
data COST;
  infile datalines truncover; 
  link skiprows; input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;   
  if PKG =. then do;
    link skiprows; input @60 PKG 4.0;            
  end; 
  link skiprows; input @29 TWP dollar8.2  ;                 
  link skiprows; input @29 MAC dollar8.2  ;                 
return;

skiprows:
  do until(^SKIPROW);
    input @;                          
    if left(_infile_) in: ( ''
                            'Min'
                            '0 '
                            'Rpt# '         
                            '9/6/2018'
                            '/From'
                            'NDC'
                            'Pkg'
                            'OnHan'
                            'Supplie'
                            'QtySold'
                            'DE PTD'
                            'Equiva') then do; input; SKIPROW=1; end;
    else SKIPROW=0;       
  end;
datalines;    
            
Rpt# 100                                                      Purchasing Summary Report                                                         Page     1

9/6/2018                                                        Tops Inventory Report                                                           8:36:07 PM

                                             /From SEA: 0 to SEA: 2/Sort By Tops Name ,at location:  Main


                                                       
              NDC                    Tops          Type
                                                                  
                                                              Pkg
                                                                              
                                                                         OnHan
                                                                                                     
                                                                                              Supplie
                                                                                                         QtySold     Order Code
                                                                                                      DE PTD                     Order Qty                   
                                                                                                                                                       Equiva



     20000-0000-00  Product6_________________               1            0                    0                0                     0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture
                    TWP     $0.00             Retail    $0.00              Cost       $0.00  Price Schedule            tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

    100000-0000-00  Product7_________________ N
                                                            10            0                    0                0    0                0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture       BAXA CORP.
                    TWP     $158.04           Retail    $0.00              Cost       $0.00  Price Schedule    0       tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

     20000-0000-00  Product8_________________               1            0                    0                0                     0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture
                    TWP     $0.00             Retail    $0.00              Cost       $0.00  Price Schedule            tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

    100000-0000-00  Product9_________________ N
                                                            10            0                    0                0    0                0
                    Min Onhand      0         Max Onhand


Rpt# 100                                                       PurchasingSummary Report                                                         Page     1

9/6/2018                                                        Tops Inventory Report                                                           8:36:07 PM

                                             /From SEA: 0 to SEA: 2/Sort By Tops Name ,at location:  Main


                                                       
              NDC                    Tops          Type
                                                                  
                                                              Pkg
                                                                              
                                                                         OnHan
                                                                                                     
                                                                                              Supplie
                                                                                                         QtySold     Order Code
                                                                                                      DE PTD                     Order Qty                   
                                                                                                                                                       Equiva

                                                               0           Lot No            Last Sold                 Manufacture       BAXA CORP.
                    TWP     $158.04           Retail    $0.00              Cost       $0.00  Price Schedule    0       tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

run;



ABC DRUG PKG TWP MAC
20000-0000-00 Product6_________________ 1 0.00 0
100000-0000-00 Product7_________________ 10 158.04 0
20000-0000-00 Product8_________________ 1 0.00 0
100000-0000-00 Product9_________________ 10 158.04 0

 

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

The data you provided does not match the input statements. Please always run the code you have pasted before posting.

Here is a simplified version that does what you want:

data COST;
  infile datalines truncover; 
  link skipheader; input  @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;  
  link skipheader; input @29 TWP dollar8.2  ;                 
  link skipheader; input @29 MAC dollar8.2  ;                 
return;

skipheader:
  do until(NOHEADER);
    input @;
    if left(_infile_) in: ( ' '
                            'Rpt# '         
                            '9/6/2018'
                            '/From'
                            'NDC'
                            'Pkg'
                            'OnHan'
                            'Supplie'
                            'QtySold'
                            'DE PTD'
                            'Equiva') then do; input; NOHEADER=0; end;
    else NOHEADER=1;
  end;
datalines;    
     20000-0000-00  Product6_________________               1            0                    0                0                     0
                    TWP     $3.00             Retail    $0.00              Cost       $0.00  Price Schedule            tops Expire
                    MAC     $4.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00
     100000-0000-00  Product7_________________ N

Rpt# 100                                                       PurchasingSummary Report                                                         Page     1

9/6/2018                                                        Tops Inventory Report                                                           8:36:07 PM

                                             /From SEA: 0 to SEA: 2/Sort By Tops Name ,at location:  Main


                                                       
              NDC                    Tops          Type
                                                                  
                                                              Pkg
                                                                              
                                                                         OnHan
                                                                                                     
                                                                                              Supplie
                                                                                                         QtySold     Order Code
                                                                                                      DE PTD                     Order Qty                   
                                                                                                                                                       Equiva

                    TWP     $158.04           Retail    $0.00              Cost       $0.00  Price Schedule    0       tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00
run;



ABC DRUG PKG TWP MAC
20000-0000-00 Product6_________________ 1 3.00 4
100000-0000-0 Product7________________ . 158.04 0

 

 

rmacarthur
Pyrite | Level 9

Hi ChrisNZ,

 

Wow, Thank you and very cool.  Will have to spend some time with this learning what the "Link" statement does.  

Your code with sample data inserted is attached.  Sorry for the disconnect in the prior attachment.

There is one aspect that still doesn't work quite right and it's beyond my capabilities.

 

For example

"Product7_________________" has 7 lines of related data.  There is a letter "N" (@47) just after it  then the rows ends.  PKG is in the next row , at position @60 PKG 4.0

 

"Product6_________________" has 6 lines of related data.  Just after "Product6_________________" There is no letter, and 

PKG is the same row , at position @60 PKG 4.0 

 

In your Output (below), that PKG was not read into the SAS file for "Product7_________________".  It does read it in for "Product6_________________"

 

Can that part be updated?  Many thanks for your help with this, 

R

 

 

 

 

ChrisNZ
Tourmaline | Level 20

The key is to check a row before anyinput.

Like this?


/**********************************************************************************************************/
data COST;
  infile datalines truncover; 
  link skiprows; input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0;   
  if PKG =. then do;
    link skiprows; input @60 PKG 4.0;            
  end; 
  link skiprows; input @29 TWP dollar8.2  ;                 
  link skiprows; input @29 MAC dollar8.2  ;                 
return;

skiprows:
  do until(^SKIPROW);
    input @;                          
    if left(_infile_) in: ( ''
                            'Min'
                            '0 '
                            'Rpt# '         
                            '9/6/2018'
                            '/From'
                            'NDC'
                            'Pkg'
                            'OnHan'
                            'Supplie'
                            'QtySold'
                            'DE PTD'
                            'Equiva') then do; input; SKIPROW=1; end;
    else SKIPROW=0;       
  end;
datalines;    
            
Rpt# 100                                                      Purchasing Summary Report                                                         Page     1

9/6/2018                                                        Tops Inventory Report                                                           8:36:07 PM

                                             /From SEA: 0 to SEA: 2/Sort By Tops Name ,at location:  Main


                                                       
              NDC                    Tops          Type
                                                                  
                                                              Pkg
                                                                              
                                                                         OnHan
                                                                                                     
                                                                                              Supplie
                                                                                                         QtySold     Order Code
                                                                                                      DE PTD                     Order Qty                   
                                                                                                                                                       Equiva



     20000-0000-00  Product6_________________               1            0                    0                0                     0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture
                    TWP     $0.00             Retail    $0.00              Cost       $0.00  Price Schedule            tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

    100000-0000-00  Product7_________________ N
                                                            10            0                    0                0    0                0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture       BAXA CORP.
                    TWP     $158.04           Retail    $0.00              Cost       $0.00  Price Schedule    0       tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

     20000-0000-00  Product8_________________               1            0                    0                0                     0
                    Min Onhand      0         Max Onhand
                                                               0           Lot No            Last Sold                 Manufacture
                    TWP     $0.00             Retail    $0.00              Cost       $0.00  Price Schedule            tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

    100000-0000-00  Product9_________________ N
                                                            10            0                    0                0    0                0
                    Min Onhand      0         Max Onhand


Rpt# 100                                                       PurchasingSummary Report                                                         Page     1

9/6/2018                                                        Tops Inventory Report                                                           8:36:07 PM

                                             /From SEA: 0 to SEA: 2/Sort By Tops Name ,at location:  Main


                                                       
              NDC                    Tops          Type
                                                                  
                                                              Pkg
                                                                              
                                                                         OnHan
                                                                                                     
                                                                                              Supplie
                                                                                                         QtySold     Order Code
                                                                                                      DE PTD                     Order Qty                   
                                                                                                                                                       Equiva

                                                               0           Lot No            Last Sold                 Manufacture       BAXA CORP.
                    TWP     $158.04           Retail    $0.00              Cost       $0.00  Price Schedule    0       tops Expire
                    MAC     $0.00             WAC                          Retail Value      $0.00                     Cost Value        $0.00

run;



ABC DRUG PKG TWP MAC
20000-0000-00 Product6_________________ 1 0.00 0
100000-0000-00 Product7_________________ 10 158.04 0
20000-0000-00 Product8_________________ 1 0.00 0
100000-0000-00 Product9_________________ 10 158.04 0

 

 

rmacarthur
Pyrite | Level 9

ChrisNZ, 

Thank you, this is very helpful and I've learned a lot 

Robert

ChrisNZ
Tourmaline | Level 20

No that I know more about your problem, I would probably read the data differently.

I'd try to positively identify each useful record.

Something like:

data COST;
  retain ABC DRUG PKG TWP ;
  infile datalines truncover; 
  input @;
  if length(_INFILE_);
  if prxmatch('/\d{5}-\d{4}-\d{2}/',_INFILE_) then input @5 ABC $14. @20 DRUG $26. @60 PKG 4.0; *5 then 4 then 2 digits separated by -;
  if prxmatch('/^\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+$/',_INFILE_) then input @60 PKG 4.0; *6 numbers;           
  if left(_INFILE_)=:'TWP' then input @29 TWP dollar8.2  ;                 
  if left(_INFILE_)=:'MAC' then do; input @29 MAC dollar8.2; output; call missing (ABC, DRUG, PKG, TWP); end;                
rmacarthur
Pyrite | Level 9

Hi Chris, 

Again, many thanks.

This works prefectly, after making one minor change, related to a text pattern that was not present in the TXT sample file I uploaded here.

Changed one line as follows :

 

      if prxmatch('/^\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+/',_INFILE_) then input @60 PKG_2 4.0; *begins with 5 numbers, then anything else to end of line ;          

 

Have now learned how to use PERL characters, 

 

Much appreciated, 

Robert

 

 

 

rmacarthur
Pyrite | Level 9
Very interesting , thank you again , will try that out later today and get back to you . Makes perfect sense .

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1881 views
  • 0 likes
  • 2 in conversation