Hi Experts,
I have a requirement to read a PDF file in SAS. The PDF file contains screenshots of some bills. Can we import this data into SAS as an image? If that is not possible, can anyone tell me how to read tabular data stored in a PDF into SAS?
Thanks
Hi:
You might be able to find a program that translates the document to Word or plain-text format. The challenge I think you'll run into is that PDF, although its specification is published (ISO 32000), is not a plain-text file type. My understanding is that most of the page content is stored in binary, usually compressed, form. For example, if I create a little table in a PDF file (just a table, no other text) from the first 5 rows of SASHELP.CLASS and then open the PDF file in Notepad, it is unreadable, although in Adobe Reader it is clearly just a little table.
The XML metadata describing the document does not contain any of the table; the table itself is stored in the unreadable section at the top of the file. So I think your task is going to be hard without third-party tools to translate the PDF into text.
Cynthia
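One likely reason the table is unreadable in Notepad is that PDF writers usually compress page content streams, commonly with the Flate (zlib) filter. The Python sketch below, standard library only, scans raw PDF bytes for `stream ... endstream` sections and tries to inflate each one. The function name `inflate_streams` and the hand-built miniature "PDF" are assumptions made for illustration, not output from SAS or Adobe. Real files can use other filters, object streams, or encryption, and what comes out is page-drawing operators rather than a table, so a dedicated PDF library remains the practical route.

```python
import re
import zlib

# Matches the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the zlib-decompressed contents of every stream that inflates cleanly."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            results.append(zlib.decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed text, another filter, etc.); skip it.
            continue
    return results


# Hypothetical miniature example: one compressed content stream that draws a name.
content = b"BT /F1 12 Tf 72 720 Td (Alfred) Tj ET"
sample_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(content)
    + b"\nendstream\nendobj\n%%EOF\n"
)

decoded = inflate_streams(sample_pdf)
print(decoded[0].decode("ascii"))
```

Even after inflating, the text arrives as operators such as `(Alfred) Tj` with positioning commands around it, so rebuilding rows and columns still takes extra work. Screenshots embedded as images would need OCR on top of that.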
I would look to other languages (e.g., Python) for this sort of task. But, surprisingly (to me), there is a paper that uses SAS to read a PDF file:
https://support.sas.com/resources/papers/proceedings16/9320-2016.pdf
Do you need to take the PDF file apart and pick out the image, or do you just need to store the PDF file?
Either way, it is not something you would store in a SAS dataset. Store the files in a file system. You could then store a filename or a URL path to each file in a character variable.
An image is not something that Foundation SAS would have any tools to deal with.
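As a rough illustration of the "store the path, not the file" idea, here is a Python sketch that lists the PDFs in a folder and writes their names and full paths to a CSV manifest, which could then be imported into SAS as two character variables. The function name `write_pdf_manifest`, the column names, and the `bill_*.pdf` file names are all hypothetical. The demo writes throwaway files to a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def write_pdf_manifest(pdf_dir: str, manifest_csv: str) -> int:
    """Write one CSV row (filename, full path) per PDF in pdf_dir; return the row count."""
    pdf_paths = sorted(
        p for p in Path(pdf_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    with open(manifest_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "path"])
        for path in pdf_paths:
            writer.writerow([path.name, str(path.resolve())])
    return len(pdf_paths)


# Demo with throwaway files; the bill_* names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("bill_002.pdf", "bill_001.PDF", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"%PDF-1.4\n")
    manifest = Path(tmp) / "manifest.csv"
    count = write_pdf_manifest(tmp, str(manifest))
    with open(manifest, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
```

The dataset then holds only short character values, and the PDFs themselves stay in the file system where a viewer or another tool can open them.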
1) First, import the PDF file into Excel as a data source, selecting the page you need.
2) This creates a worksheet in Excel with structured data.
3) You can then read that worksheet into SAS. (This approach is suitable when there are not many tables.)