SAS Programming

DATA Step, Macro, Functions and more
jakestat
Obsidian | Level 7

Does anyone have experience using a DOS command within SAS to convert .PDF files to .TXT files so that they can be read back into SAS? I have heard that you have to put SAS to "sleep" during the DOS command, then use an X statement. Thank you for any help!

1 ACCEPTED SOLUTION
Reeza
Super User

I don't know of any SAS resource to do this, which doesn't mean it doesn't exist, but here's a blog post I recently came across that covers some tools that do:

Tools for Extracting Data From PDFs — Scott Murray — alignedleft

8 REPLIES
jakestat
Obsidian | Level 7

I was hoping for a SAS executable program that runs start to finish, with a libname pointing to the .pdf files in question, executing a conversion, then (part 2) pulling the text items into a SAS dataset. Part 2 is manageable. Because of the restricted environment, no outside software is allowed.

Reeza
Super User

If you're in an enterprise environment you're more likely to have access to Adobe Professional though. What does your PDF look like?

Adobe has some scripting tools that allow you to batch process things relatively painlessly. It helps if you know some JavaScript though.

jakestat
Obsidian | Level 7

I have code to pull the PDF from the following website. It seems the PDF was created directly from Excel.

http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf

Reeza
Super User

If it was me:

1. Batch download all files

2. Use Adobe Professional to save as Excel file or XML, which it does nicely

3. Use SAS to extract information from Excel files.
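
Steps 1 and 3 above could be sketched in SAS roughly as follows. This is only a sketch under assumptions: the C:\temp paths and the work.hay_raw dataset name are hypothetical, PROC HTTP needs SAS 9.4 or later, DBMS=XLSX needs SAS/ACCESS Interface to PC Files, and step 2 (saving the PDF as Excel) is still done in Acrobat outside SAS.

```sas
/* Step 1: download one PDF (repeat or wrap in a macro loop for a batch). */
/* The local path is a hypothetical example.                              */
filename pdfout "C:\temp\Sep1Hay.pdf";

proc http
    url="http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf"
    method="GET"
    out=pdfout;
run;

/* Step 2 happens outside SAS: save the PDF as C:\temp\Sep1Hay.xlsx. */

/* Step 3: read the converted workbook. GETNAMES=NO is used because   */
/* the first row is likely the "Auction Date" line, not column names. */
proc import datafile="C:\temp\Sep1Hay.xlsx"
            out=work.hay_raw
            dbms=xlsx
            replace;
    getnames=no;
run;
```

With GETNAMES=NO the columns arrive as A, B, C and so on, so the header rows would still need to be filtered out and the columns renamed in a later DATA step.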

art297
Opal | Level 21

It depends on which program you are calling in DOS and how it has to interact with SAS. I've had extremely good success with the products from A-PDF.com (Batch extract PDF Form Data), and I've been able to put the calls in the process flow without having to force SAS to sleep.
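
For illustration, here is a minimal sketch of calling a command-line converter from SAS on Windows without a sleep step. It assumes things the thread does not confirm: that a converter such as pdftotext (from Xpdf or Poppler) is installed and on the PATH, that the XCMD system option is enabled for the session (restricted environments often disable it), and that the C:\temp paths exist. With XSYNC set, SAS waits for the command to finish before running the next step.

```sas
/* NOXWAIT: the command window closes on its own.              */
/* XSYNC:   SAS waits until the command has finished.          */
options noxwait xsync;

/* Assumption: pdftotext is installed; the paths are hypothetical. */
x 'pdftotext -layout "C:\temp\Sep1Hay.pdf" "C:\temp\Sep1Hay.txt"';

/* Confirm the text file exists before trying to read it. */
data _null_;
    if fileexist('C:\temp\Sep1Hay.txt') then
        put 'NOTE: text file was created.';
    else
        put 'WARNING: text file was not found.';
run;
```

Any other converter that can be run from the command line could be substituted in the X statement; the options are what remove the need to pause SAS.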

art297
Opal | Level 21

Jake: I took a closer look at the files you are trying to download, and I doubt that any PDF converter would know how to correctly convert the second page of each of the PDFs on that site.

Instead, one could easily write a VB script (for SAS to run) that (1) opened Adobe Reader; (2) did a select all (i.e., Ctrl-A); (3) copied the text to the clipboard (Ctrl-C); (4) opened Notepad; (5) pasted the clipboard into Notepad (Ctrl-V), went back to Adobe, selected the next page (down arrow), and repeated the copy/paste steps; (6) saved the Notepad file; and (7) had SAS open the txt file that was created and parse its contents.

The first 75% of the file would be easy to parse as all of the desired text starts with the headers:

Auction Date: September 04, 2014

LOT NO. SAMPLE DESCRIPTION MOISTURE PROTEIN RFV CUTTING LOAD SIZE PRICE

and the data that follows the headers is rather straightforward:

869 Large Round 14.96 20.48 82.78 1 15.48 75.00

However, the last approximately 25% of the file didn't make sense to me given the header variables:

872 Medium Square STRAW 78 Bales $ 2 5.00

If those latter lines are all irrelevant, then the problem would be easy to solve.
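
If so, the parsing could look something like the DATA step below. It is a sketch rather than a tested solution: the variable names are made up to match the headers, and the sample lines quoted above are read from DATALINES in place of the real text file. Because the description can contain several words, the six numeric fields are read from the right-hand end of each line and whatever sits between the lot number and those fields becomes the description.

```sas
data hay (drop=line n i);
    infile datalines truncover;
    length description $40;
    input line $char200.;

    /* Keep only rows that begin with a numeric lot number. */
    if not prxmatch('/^\s*\d+\s/', line) then delete;

    /* Expect lot number + description + six numeric fields. */
    n = countw(line, ' ');
    if n < 8 then delete;

    lot_no    = input(scan(line,  1, ' '), ?? best12.);
    price     = input(scan(line, -1, ' '), ?? best12.);
    load_size = input(scan(line, -2, ' '), ?? best12.);
    cutting   = input(scan(line, -3, ' '), ?? best12.);
    rfv       = input(scan(line, -4, ' '), ?? best12.);
    protein   = input(scan(line, -5, ' '), ?? best12.);
    moisture  = input(scan(line, -6, ' '), ?? best12.);

    /* Rows like the straw lots do not fit the header layout; drop them. */
    if nmiss(moisture, protein, rfv, cutting, load_size, price) > 0 then delete;

    /* Everything between the lot number and the numeric fields. */
    do i = 2 to n - 6;
        description = catx(' ', description, scan(line, i, ' '));
    end;
datalines;
Auction Date: September 04, 2014
LOT NO. SAMPLE DESCRIPTION MOISTURE PROTEIN RFV CUTTING LOAD SIZE PRICE
869 Large Round 14.96 20.48 82.78 1 15.48 75.00
872 Medium Square STRAW 78 Bales $ 2 5.00
;
run;
```

With these four sample lines, only lot 869 should survive, with the description "Large Round". To read a real file, the INFILE DATALINES statement would be replaced with an INFILE pointing at the saved .txt file.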

longwest
Calcite | Level 5

I only have experience extracting text from PDFs or converting PDF to Word to get the text, but I have no idea how to convert PDF to TXT directly.

I'm also looking forward to learning a solution for it.

Any other ideas?
