Hello,
I need to extract data from PDF documents. Is there a way to do it using a SAS procedure or SAS code?
I saw a case where R was required. Unfortunately, this is not an option for me; my company would not allow the use of this software.
I saw a module called SAS® Text Miner 14.2. It seems to handle PDF files, but I am not sure whether it requires a separate license.
Does anyone know?
Thank you,
Marcelo
Yes, Text Miner is a licensed product; contact SAS for pricing.
Why is R not an option? It's free, and if it does the job, use it.
In normal SAS, no, there is no simple way of reading a PDF. Extracting data from PDFs is a very complex and tricky process, and I highly recommend not going down that route. Go back to the source data or, if that is not possible, arrange for manual data entry.
Thank you for the quick response.
R is not available at my company. I will need an alternative solution.
Kind regards
My work has all USB connections blocked. 😞
IT are not your enemy; there will be a way of getting the required software, just ask them. It's much the same as getting Adobe, Text Miner, or anything else.
Thank you for your help. In my case, getting something outside the "official list" is discouraged. In any case, I will see what can be done. Kind regards
@RW9 wrote: IT are not your enemy, there will be a way of getting the required software, just ask them. Much the same as you would need to get Adobe, or Text Miner or something else.
Ahh, then that is easy. Your response would be:
PDF is not a data source; it is next to impossible to extract anything from it. Therefore there are three options:
1) Go back to the source and get the appropriate data
2) Assign or hire someone to enter all the data from the PDF manually
3) Acquire tools to do such a task
It's up to your company which one they choose, but saying none of them is possible makes your task impossible.
My work has all USB connections blocked too! 😞
@RW9 wrote: Put it on a pen drive; it can be portable:
https://sourceforge.net/projects/rportable/
You don't need to install R on your PC. You can use it directly on a cloud platform. The following link might be helpful:
https://rstudio.cloud/
Adobe Professional can export the text/data, and that's the easiest and most accurate method I've found, apart from using NVivo or a text mining tool.
Using Adobe Professional seems like a possible solution. Thank you!
Besides PDF files, there are several other types of files that SAS Text Miner can read in. You can see the full list at the following URL:
http://go.documentation.sas.com/?docsetId=tmref&docsetTarget=n1f1hnf1pk8w3in1i2h4v94rty2m.htm&docset...