Hello,
I need to extract data from PDF documents. Is there a way to do it using a SAS procedure or SAS code?
I saw a case where R was required. Unfortunately, that is not an option for me: my company would not allow the use of this software.
I saw a module called SAS® Text Miner 14.2. It seems to handle PDFs, but I am not sure whether it requires a separate license.
Does anyone know?
Thank you,
Marcelo
Yes, Text Miner is a licensed product; contact SAS for pricing.
Why is R not an option? It's free, and if it does the job, use it.
In base SAS, no, there is no simple way of reading a PDF. Extracting data from PDFs is a complex and tricky process, and I highly recommend not going down that route. Go back to the source data or, if that is not possible, arrange for the data to be entered manually.
Thank you for the quick response.
R is not available at my company. I will need an alternative solution.
Kind regards
My work has all USB connections blocked.
IT are not your enemy; there will be a way of getting the required software, so just ask them. It is much the same as getting Adobe, Text Miner, or anything else.
Thank you for your help. In my case, getting something that is not on the "official list" is discouraged. In any case, I will see what can be done. Kind regards
Ahh, then that is easy. Your response would be:
PDF is not a data source; it is next to impossible to extract anything from it. Therefore there are three options:
1) Go back to the source and get appropriate data
2) Assign or hire someone to enter all the data from the PDF manually
3) Acquire tools to do such a task
It's up to your company which one they choose, but saying none of those is possible makes your end of the job impossible.
My work has all USB connections blocked too!
@RW9 wrote: Put it on a pen drive; it can be portable:
https://sourceforge.net/projects/rportable/
You don't need to install R on your PC. You can use it directly on a cloud platform. The following link might be helpful:
link: https://rstudio.cloud/
Adobe Professional can export the text/data, and apart from using NVivo or a text-mining tool, that's the easiest and most accurate method I've found.
Using Adobe Professional seems like a possible solution. Thank you!
As far as I am aware, the PDF conversion in Text Miner is based on Apache Tika (https://tika.apache.org/). You can think of Tika as a set of (Java-based) programs that help extract data from many document formats: PDFs, PPTs, DOC files, and so on.
You do not need Text Miner specifically to access Tika. If you explore your licences and notice "Document Conversion Server" among your registered products, you may still be able to call Tika at the location/port where the document conversion server is running.
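As a rough sketch of what calling Tika over a port can look like, the Python below sends a PDF to the `/tika` endpoint of a standalone Apache Tika server (`tika-server`) with an HTTP PUT and asks for plain text back. The host and port (`localhost:9998`, the standalone server's usual default) are assumptions, and it is not confirmed here that SAS Document Conversion Server exposes this same REST interface, so check with your SAS administrator before relying on it.

```python
import urllib.request
from pathlib import Path

# Assumption: a standalone Apache Tika server on its usual default port.
# Whether SAS Document Conversion Server exposes this endpoint is NOT confirmed.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(pdf_path: str, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that sends the raw PDF bytes and asks for plain text."""
    data = Path(pdf_path).read_bytes()
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def extract_text_via_server(pdf_path: str, url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """Send the PDF to the Tika server and return the extracted text."""
    req = build_tika_request(pdf_path, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `extract_text_via_server("report.pdf")` (a hypothetical file name) returns the document body as a string, which you can write to a .txt file for SAS to read.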
In any case, you always have the option of installing Tika and calling it from the command line. (It is a pretty lightweight utility.)
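For the command-line route, a minimal Python sketch that drives the Tika app jar might look like the block below. It runs `java -jar <jar> --text <file>` (the `--text` option tells tika-app to print the document body as plain text) and saves the output to a .txt file that an ordinary SAS DATA step can then read. The jar name `tika-app.jar` is a placeholder, since the downloaded file carries a version number, and Java must be installed on the machine.

```python
import subprocess
from pathlib import Path

# Placeholder: the real download is named like tika-app-<version>.jar.
TIKA_JAR = "tika-app.jar"


def tika_command(pdf_path: str, jar: str = TIKA_JAR) -> list[str]:
    """Build the command asking tika-app to print the document's plain text."""
    return ["java", "-jar", jar, "--text", str(pdf_path)]


def pdf_to_text_file(pdf_path: str, out_path: str, jar: str = TIKA_JAR) -> None:
    """Run tika-app and save its plain-text output so SAS can read it afterwards."""
    result = subprocess.run(tika_command(pdf_path, jar), capture_output=True, check=True)
    Path(out_path).write_bytes(result.stdout)
```

For example, `pdf_to_text_file("report.pdf", "report.txt")` (hypothetical file names) produces a text file you can read with INFILE in a DATA step. Keeping command construction separate from execution makes it easy to check or log the exact command before running it.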
Besides PDF files, there are several other file types that SAS Text Miner can read. You can see the full list at the following URL:
http://go.documentation.sas.com/?docsetId=tmref&docsetTarget=n1f1hnf1pk8w3in1i2h4v94rty2m.htm&docset...