Rucstat_huadli
Fluorite | Level 6
How can I read *.pdf documents using SAS?
Cynthia_sas
SAS Super FREQ
Hi!
This paper describes a process in which you must first convert a PDF file into an ASCII text file before you can read it with SAS. Since PDF is a proprietary format, the process he describes makes sense. SAS creates PDF files, but it does not read them in their native binary format:
http://www8.sas.com/scholars/05/SESUG_05/Proceedings/2005/Serendipity/SER10_05.PDF
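As a rough illustration of the second half of that process (reading the text once the PDF has been converted), here is a minimal Python sketch. It assumes the converter preserved the page layout, so that table columns in the extracted text are separated by runs of two or more spaces; the function name `parse_layout_text` and the sample data are hypothetical and not taken from the paper.

```python
import re

# Two or more consecutive spaces are treated as a column boundary. This is an
# assumption about how the PDF-to-text converter laid out the table.
COLUMN_GAP = re.compile(r" {2,}")


def parse_layout_text(text: str) -> list[list[str]]:
    """Split each non-blank line of extracted PDF text into column values."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines, e.g. left over from page breaks
        rows.append(COLUMN_GAP.split(stripped))
    return rows


# Hypothetical extracted text; single spaces inside a value are kept.
sample = """
Region      Sales     Units
North       1200.50   40
South Wing  980.00    35
"""

for row in parse_layout_text(sample):
    print(row)
```

A single space is not treated as a delimiter so that values such as "South Wing" stay in one column; real extracted text will usually need more cleanup (page headers, wrapped lines) than this.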

Another possibility is that you want to read data collected through a PDF form (saved as an FDF file or an XFDF file), as described in this paper:
http://www2.sas.com/proceedings/sugi27/p032-27.pdf
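XFDF is an XML format, so its field values can be pulled out with any XML parser and then written to a delimited file for SAS to read. The Python sketch below assumes the usual XFDF layout: a `<fields>` element containing (possibly nested) `<field name="...">` elements with `<value>` children, in the `http://ns.adobe.com/xfdf/` namespace. The function name `read_xfdf_fields` and the sample form are hypothetical, and binary FDF files are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by XFDF documents, in ElementTree's {uri}tag notation.
XFDF_NS = "{http://ns.adobe.com/xfdf/}"


def read_xfdf_fields(xml_text: str) -> dict[str, str]:
    """Return {full_field_name: value} for every field that has a value."""
    root = ET.fromstring(xml_text)
    result: dict[str, str] = {}

    def walk(field: ET.Element, prefix: str) -> None:
        # Nested fields get dotted names, e.g. "Address.City".
        name = field.get("name", "")
        full_name = f"{prefix}.{name}" if prefix else name
        value = field.find(f"{XFDF_NS}value")
        if value is not None:
            result[full_name] = value.text or ""
        for child in field.findall(f"{XFDF_NS}field"):
            walk(child, full_name)

    fields = root.find(f"{XFDF_NS}fields")
    if fields is not None:
        for field in fields.findall(f"{XFDF_NS}field"):
            walk(field, "")
    return result


# Hypothetical exported form data.
sample = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="FirstName"><value>Jane</value></field>
    <field name="Address">
      <field name="City"><value>Cary</value></field>
    </field>
  </fields>
</xfdf>"""

print(read_xfdf_fields(sample))
```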

A third possibility involves printing the PDF document, scanning it with OCR software, saving the output of the OCR scan as a file, and then reading that file with SAS (this is a variation of the first possibility).

Good luck!
cynthia
gfjump
Calcite | Level 5

There are a variety of online PDF viewers for VB.NET on the web that you can use to read a PDF in full. They also offer processing features such as zoom, crop, and scale. Most importantly, you can convert a PDF to various image formats, so reading a PDF should not be a problem.

cathyhill
Calcite | Level 5

So the process of reading a PDF document is, in essence, the process of decoding the PDF document to a bitmap? By the way, without using Adobe Acrobat Reader, is there any free source code we can use to view documents in a web application?

arronlee
Calcite | Level 5

Hi, Cathyhill.

I use a different PDF reader instead of Adobe Acrobat Reader to read PDF documents. Writing code to handle PDF reading is too complicated for me, so you might choose a manual toolkit that lets you customize its features to your own preferences. If possible, check its free trial first. I hope you succeed. Good luck.

Best regards,

Arron

Rucstat_huadli
Fluorite | Level 6
Thank you very much!
mannimanoj
Calcite | Level 5

The easiest and fastest way by far is to use the full version of Adobe Acrobat. Yes, it's expensive (around $800 for the license), but most companies will find at some point that they need to edit PDFs. You can also try online PDF-to-Excel converters (search for them), but most handle only a small number of pages. There may also be cheaper PDF editors available.

So basically, open the PDF in the full version of Adobe Acrobat, choose File > Save As, and select Excel. From there it is plain sailing. All the other methods I've looked into are very complicated and require a lot of messing around.

jthy
Calcite | Level 5

I don't know which environment you are working in, but if it is Windows you might find PDF-text-extractor useful. There is a free client-based version and a command-line version for USD 35. I went all in, paid the 35 dollars, and built a routine that creates .txt copies of all PDFs in a directory structure, so users can run text searches and link back to the original PDF. Maybe that can serve as a starting point for you? Take a look at http://www.a-pdf.com/text/index.htm.
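The routine itself isn't shown, but the search-and-link-back half of it might look roughly like the Python sketch below. It assumes each PDF already has a text copy beside it with the same name and a `.txt` extension (produced by whichever extractor you use); the function name `find_pdfs_containing` and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path


def find_pdfs_containing(root: str, term: str) -> list[Path]:
    """Return PDFs under `root` whose .txt copy contains `term` (case-insensitive)."""
    needle = term.lower()
    hits = []
    for txt in sorted(Path(root).rglob("*.txt")):
        pdf = txt.with_suffix(".pdf")
        if not pdf.exists():
            continue  # ignore text files that are not copies of a PDF
        text = txt.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(pdf)  # link back to the original PDF, not the .txt
    return hits


# Demo: build a tiny directory tree in a temporary folder and search it.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp, "reports")
    sub.mkdir()
    for name, text in [("a", "Quarterly SALES report"), ("b", "Staff list")]:
        (sub / f"{name}.pdf").write_bytes(b"%PDF-1.4\n")  # placeholder PDF
        (sub / f"{name}.txt").write_text(text, encoding="utf-8")
    # A stray .txt with no matching PDF is skipped.
    (sub / "notes.txt").write_text("sales notes", encoding="utf-8")
    demo_hits = [p.name for p in find_pdfs_containing(tmp, "sales")]

print(demo_hits)
```

Returning the PDF path rather than the text file is what lets users jump straight to the original document from a search hit.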
