Convert PDF to TXT

Contributor
Posts: 30

Does anyone have experience using a DOS command within SAS to convert .PDF files to .TXT files so that they can be read back into SAS? I have heard that you have to put SAS to "sleep" during the DOS command, then use an X statement. Thank you for any help!


All Replies
Solution
‎10-15-2014 04:01 PM
Super User
Posts: 19,772

Re: Convert PDF to TXT

I don't know of any SAS resource to do this, which doesn't mean it doesn't exist, but here's a blog post I recently came across that covers some tools that do:

Tools for Extracting Data From PDFs — Scott Murray — alignedleft

Contributor
Posts: 30

Re: Convert PDF to TXT

I was hoping for a SAS executable program that runs start to finish, with a libname pointing to the PDFs in question, executing a conversion, then (part 2) pulling the text items into a SAS dataset. Part 2 is manageable. Because of the restricted environment, no outside software is allowed.

Super User
Posts: 19,772

Re: Convert PDF to TXT

If you're in an enterprise environment you're more likely to have access to Adobe Professional though. What does your PDF look like?

Adobe has some scripting tools that allow you to batch process some things relatively painlessly. It helps if you know some JavaScript though.

Contributor
Posts: 30

Re: Convert PDF to TXT

I have code to pull the PDF from the following website. It seems the PDF was created directly from Excel.

http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf

Super User
Posts: 19,772

Re: Convert PDF to TXT

If it were me:

1. Batch download all files

2. Use Adobe Professional to save as Excel file or XML, which it does nicely

3. Use SAS to extract information from Excel files.

PROC Star
Posts: 7,468

Re: Convert PDF to TXT

It depends on which program you are calling in DOS and how it has to interact with SAS. I've had extremely good success with the products from A-PDF.com (Batch extract PDF Form Data), and I've been able to put the calls in the process flow without having to force SAS to sleep.

PROC Star
Posts: 7,468

Re: Convert PDF to TXT

Jake: I took a closer look at the files you are trying to download, and I doubt that any PDF converter would know how to correctly convert the second page of each of the PDFs on that site.

As an alternative, one could easily write a VBScript (and have SAS run it) that: (1) opens Adobe Reader; (2) does a select all (Ctrl-A); (3) copies the text to the system clipboard (Ctrl-C); (4) opens Notepad; (5) pastes the clipboard into Notepad (Ctrl-V); (6) goes back to Adobe Reader, selects the next page (down arrow), and repeats the copy/paste steps; (7) saves the Notepad file; and (8) has SAS open the resulting .txt file and parse its contents.

The first 75% of the file would be easy to parse as all of the desired text starts with the headers:

Auction Date: September 04, 2014

LOT NO. SAMPLE DESCRIPTION MOISTURE PROTEIN RFV CUTTING LOAD SIZE PRICE

and the data that follows the headers is rather straightforward:

869 Large Round 14.96 20.48 82.78 1 15.48 75.00
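To make the parsing idea concrete, here is a minimal sketch of how one such line could be split into the eight header fields. It is written in Python purely to illustrate the logic (the same approach could be coded in a SAS DATA step), and it assumes the last six tokens are always the numeric columns, with everything between the lot number and those tokens being the description. The names `HayLot` and `parse_lot_line` are made up for this example.

```python
from typing import NamedTuple


class HayLot(NamedTuple):
    """One regular data line; field names mirror the PDF header (hypothetical names)."""
    lot_no: int
    description: str
    moisture: float
    protein: float
    rfv: float
    cutting: int
    load_size: float
    price: float


def parse_lot_line(line: str) -> HayLot:
    """Split a line such as '869 Large Round 14.96 20.48 82.78 1 15.48 75.00'.

    Assumption: the description may contain several words, so the six numeric
    columns are taken from the right-hand end of the line.
    """
    tokens = line.split()
    if len(tokens) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(tokens)}: {line!r}")
    moisture, protein, rfv, cutting, load_size, price = tokens[-6:]
    return HayLot(
        lot_no=int(tokens[0]),
        description=" ".join(tokens[1:-6]),
        moisture=float(moisture),
        protein=float(protein),
        rfv=float(rfv),
        cutting=int(cutting),
        load_size=float(load_size),
        price=float(price),
    )


lot = parse_lot_line("869 Large Round 14.96 20.48 82.78 1 15.48 75.00")
print(lot.description, lot.price)
```

Counting fields from the right is a design choice: it copes with descriptions of different lengths without needing fixed column positions, which may not survive a copy/paste out of the PDF.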

However, the last approximately 25% of the file didn't make sense to me given the header variables:

872 Medium Square STRAW 78 Bales $ 2 5.00

If those latter lines are all irrelevant, then the problem would be easy to solve.
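If it helps, here is a rough sketch of how the regular lines could be separated from the odd ones before parsing. It is in Python only to show the logic (a SAS DATA step with a similar pattern test should work too), and it assumes a "regular" line starts with a lot number and ends with exactly six numeric columns. The pattern and the name `split_lot_lines` are my own, not anything from the PDF.

```python
import re

# Assumed shape of a regular line: lot number, free-text description,
# then six numeric columns (MOISTURE ... PRICE) at the end of the line.
LOT_LINE = re.compile(r"^\d+\s+\S.*?(?:\s+\d+(?:\.\d+)?){6}$")


def split_lot_lines(lines):
    """Return (regular, irregular) lot lines; headers and blank lines are skipped."""
    regular, irregular = [], []
    for raw in lines:
        line = raw.strip()
        if not line or not line[0].isdigit():
            continue  # e.g. 'Auction Date: ...' or the column header row
        if LOT_LINE.match(line):
            regular.append(line)
        else:
            irregular.append(line)
    return regular, irregular


sample = [
    "Auction Date: September 04, 2014",
    "LOT NO. SAMPLE DESCRIPTION MOISTURE PROTEIN RFV CUTTING LOAD SIZE PRICE",
    "869 Large Round 14.96 20.48 82.78 1 15.48 75.00",
    "872 Medium Square STRAW 78 Bales $ 2 5.00",
]
regular, irregular = split_lot_lines(sample)
print(len(regular), "regular,", len(irregular), "irregular")
```

Keeping the irregular lines in their own list, rather than dropping them silently, makes it easy to review them later and decide whether they really are irrelevant.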

Occasional Contributor
Posts: 7

Re: Convert PDF to TXT

I only have experience extracting text from PDFs or converting PDFs to Word to get at the text, but I have no idea how to convert PDF to TXT directly.

I'm also looking forward to learning a solution for it.

Any other ideas?

🔒 This topic is solved and locked.