jakestat
Obsidian | Level 7

Does anyone have experience using a DOS command within SAS to convert .PDF files to .TXT files so that they can be read back into SAS? I have heard that you have to put SAS to "sleep" during the DOS command, then use an X statement. Thank you for any help!

Accepted Solutions
Reeza
Super User

I don't know of any SAS resource to do this, which doesn't mean it doesn't exist, but here's a blog post I recently came across that covers some tools that do:

Tools for Extracting Data From PDFs — Scott Murray — alignedleft

jakestat
Obsidian | Level 7

I was hoping for a SAS executable program that runs start to finish, with a libname pointing to the PDFs in question, executing a conversion and then (part 2) pulling the text items into a SAS dataset. Part 2 is manageable. Because of the restricted environment, no outside software is allowed.

Reeza
Super User

If you're in an enterprise environment you're more likely to have access to Adobe Professional though. What does your PDF look like?

Adobe has some scripting tools that allow you to batch process things relatively painlessly. It helps if you know some JavaScript though.

jakestat
Obsidian | Level 7

I have code to pull the PDF from the following website. It seems the PDF was created directly from Excel.

http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf

Reeza
Super User

If it was me:

1. Batch download all files

2. Use Adobe Professional to save as Excel file or XML, which it does nicely

3. Use SAS to extract information from Excel files.
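
Step 1 could be sketched roughly as below. This is only an illustration under assumptions: it uses Python rather than SAS (so it applies only where outside tools are permitted), the helper name `batch_download` is hypothetical, and the list of URLs is left to the caller because only one file URL has been posted in this thread.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def batch_download(urls, dest_dir):
    """Save each URL into dest_dir and return the list of saved paths.

    Hypothetical helper: the file name is taken from the last part of
    the URL path, so two URLs with the same file name would overwrite
    each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / Path(urlparse(url).path).name
        # Read the whole response and write it out as binary.
        with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
            out.write(resp.read())
        saved.append(target)
    return saved
```

For example, `batch_download(["http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf"], "pdfs")` would save that one file as `pdfs/Sep1Hay.pdf`. Steps 2 and 3 would still happen in Adobe and SAS.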

art297
Opal | Level 21

It depends on what program you are calling in DOS and how it has to interact with SAS. I've had extremely good success with the products from A-PDF.com (Batch extract PDF Form Data), and I've been able to put the calls in the process flow without having to force SAS to sleep.

art297
Opal | Level 21

Jake: I took a closer look at the files you are trying to download and I doubt if any pdf converter would know how to correctly convert the second page of each of the pdfs on that site.

Instead, one could easily write a VBScript (for SAS to run) that (1) opened Adobe Reader; (2) did a select all (i.e., ctrl-A); (3) copied the text to the clipboard (ctrl-C); (4) opened Notepad; (5) pasted the clipboard into Notepad (ctrl-V); (6) went back to Adobe and selected the next page (down arrow); (7) repeated the copy/paste steps; (8) saved the Notepad file; and (9) had SAS open the txt file that was created and parse its contents.

The first 75% of the file would be easy to parse as all of the desired text starts with the headers:

Auction Date: September 04, 2014

LOT NO. SAMPLE DESCRIPTION MOISTURE PROTEIN RFV CUTTING LOAD SIZE PRICE

and the data that follows the headers is rather straightforward:

869 Large Round 14.96 20.48 82.78 1 15.48 75.00

However, the last approximately 25% of the file didn't make sense to me given the header variables:

872 Medium Square STRAW 78 Bales $ 2 5.00

If those latter lines are all irrelevant, then the problem would be easy to solve.
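
To make the parsing idea concrete, here is a minimal sketch of the logic in Python (an assumption on my part; the same pattern matching could be ported to a SAS DATA step). The function name `parse_auction_text` and the field names are hypothetical, and I am assuming "SAMPLE DESCRIPTION" is a single column so that each regular row is a lot number, a text description, and six numeric values. Lines that do not fit, like the STRAW line above, are collected separately rather than dropped, so they can be reviewed.

```python
import re

DATE_RE = re.compile(r"Auction Date:\s*(.+)")
NUM = r"\d+(?:\.\d+)?"  # integer or decimal number
# lot no., description, moisture, protein, RFV, cutting, load size, price
ROW_RE = re.compile(
    rf"(\d+)\s+(.+?)\s+({NUM})\s+({NUM})\s+({NUM})\s+(\d+)\s+({NUM})\s+({NUM})"
)


def parse_auction_text(text):
    """Return (rows, unparsed) for text copied out of one auction PDF."""
    rows, unparsed = [], []
    auction_date = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("LOT NO."):
            continue  # skip blank lines and the column header
        m = DATE_RE.match(line)
        if m:
            auction_date = m.group(1).strip()  # kept as text, not a date
            continue
        m = ROW_RE.fullmatch(line)
        if m is None:
            unparsed.append(line)  # e.g. the straw lines; review by hand
            continue
        lot, desc, moisture, protein, rfv, cutting, load, price = m.groups()
        rows.append({
            "auction_date": auction_date,
            "lot_no": int(lot),
            "description": desc,
            "moisture": float(moisture),
            "protein": float(protein),
            "rfv": float(rfv),
            "cutting": int(cutting),
            "load_size": float(load),
            "price": float(price),
        })
    return rows, unparsed
```

Whether the unparsed lines matter is still the open question; the sketch only separates them from the rows that match the header layout.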

longwest
Calcite | Level 5

I only have experience extracting text from a PDF or converting a PDF to Word to get the text, but I have no idea how to convert a PDF to TXT directly.

I'm also looking forward to learning a solution for it.

Any other ideas?
