saswiki
Obsidian | Level 7

Hi Experts, 

 

I have a requirement to read a PDF file in SAS; the PDF file contains screenshots of some bills. Can we import this data into SAS as an image? If that is not possible, can anyone tell me how to read readable, table-format data stored in a PDF into SAS?

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
SAS Super FREQ

Hi:

  You might be able to find a program that translates the document to Word or plain-text format. The challenge I think you'll run into is that a PDF is not a plain-text file: my understanding is that its content is stored in binary (typically compressed) form. For example, if I create a little table in a PDF file -- just a table, no other text -- from the first 5 rows of SASHELP.CLASS and then try to open the PDF file in Notepad, it is unreadable, although in Adobe Reader it is clearly just a little table:

Cynthia_sas_0-1673618692265.png

The XML metadata describing the document does not contain any of the table; the table itself is stored in the unreadable section at the top of the file. So I think your task is going to be hard unless you find third-party tools to translate the PDF into text.

Cynthia
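To illustrate why the table looks unreadable in Notepad, here is a minimal Python sketch using only the standard library. It assumes the common case in which page content is compressed with the FlateDecode (zlib) filter. The stream object below is a hand-built, hypothetical miniature rather than bytes taken from a real PDF, and `inflate_streams` is an illustrative helper name, not part of any library.

```python
import re
import zlib

# Hypothetical page content: PDF text operators drawing one row of a table.
page_text = b"BT /F1 12 Tf 72 720 Td (Alfred M 14 69.0 112.5) Tj ET"
compressed = zlib.compress(page_text)

# Hand-built miniature of a single PDF stream object (an assumption for
# illustration; a real file contains many objects plus a cross-reference table).
pdf_object = (
    b"4 0 obj\n<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)


def inflate_streams(pdf_bytes):
    """Return the decompressed bodies of FlateDecode streams in pdf_bytes.

    Simplification: only handles a direct integer /Length. Real PDFs may
    store /Length as an indirect reference, which this sketch ignores.
    """
    bodies = []
    header = re.compile(rb"<<(.*?)>>\s*stream\r?\n", re.DOTALL)
    for match in header.finditer(pdf_bytes):
        dictionary = match.group(1)
        length = re.search(rb"/Length\s+(\d+)", dictionary)
        if length and b"/FlateDecode" in dictionary:
            start = match.end()
            body = pdf_bytes[start:start + int(length.group(1))]
            bodies.append(zlib.decompress(body))
    return bodies


streams = inflate_streams(pdf_object)
print(pdf_object)                    # what a text editor sees: mostly binary
print(streams[0].decode("latin-1"))  # the readable text operators
```

Even after decompression, the result is a sequence of drawing operators rather than a table, which is why a dedicated PDF library or converter is usually the more practical route.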

saswiki
Obsidian | Level 7
Thanks, Cynthia. I understand that we can't directly import the images into SAS. The requirement is to import the PDF, which contains screenshots, and load the screenshots into Azure. I think we will have to use VBA or other tools. Thanks for your time on this.
Quentin
Super User

I would look to other languages (e.g. Python) for this sort of task. But surprisingly (to me), there is a paper that uses SAS to read a PDF file:

https://support.sas.com/resources/papers/proceedings16/9320-2016.pdf

 

Tom
Super User

Do you need to take the PDF file apart and pick out the images, or do you just need to store the PDF file?

 

Either way, it is not something you would store in a SAS dataset. Store the files in a file system. You could then store a filename or a URL path to each file in a character variable.

 

An image is not something that Foundation SAS would have any tools to deal with.  
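The idea of keeping the PDFs on disk and recording only their locations can be sketched as follows. This is a minimal Python example under stated assumptions: `write_manifest`, the column names, and the demo file names are all hypothetical, and the resulting CSV is simply one convenient format that SAS could read into character variables.

```python
import csv
import tempfile
from pathlib import Path


def write_manifest(pdf_dir, manifest_path):
    """Write one CSV row per PDF in pdf_dir: file name, full path, size in bytes."""
    rows = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        rows.append({
            "filename": pdf.name,
            "path": str(pdf.resolve()),
            "bytes": pdf.stat().st_size,
        })
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "path", "bytes"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Demo with placeholder files in a temporary folder (hypothetical names).
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "bill_001.pdf").write_bytes(b"%PDF-1.4 placeholder")
    (Path(folder) / "bill_002.pdf").write_bytes(b"%PDF-1.4 placeholder two")
    (Path(folder) / "notes.txt").write_text("not a PDF")
    manifest = Path(folder) / "manifest.csv"
    demo_rows = write_manifest(folder, manifest)
    manifest_text = manifest.read_text(encoding="utf-8")

print(manifest_text)
```

The dataset then holds only small text fields, while the PDFs themselves stay wherever they are stored.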

saswiki
Obsidian | Level 7
I understand that I am not able to read the images stored in the PDF file. If the PDF contains data in table format, is there any option for importing the PDF file into SAS?
AlanC
Barite | Level 11
OK, do not use VBA for anything. I am converting thousands of lines of VBA because of its age and the fact that it is deprecated. No need to go backwards.

I read PDFs using C# and GemBox software. You should be able to find a NuGet package that can read them using C#. Python should also be able to do it.
You can also use an RPA tool like UiPath or Power Automate.

There are also tools that can export PDF to HTML.
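If a PDF-to-HTML export is available, the table can then be pulled out of the HTML and written as CSV for SAS to read. The Python sketch below uses only the standard library and rests on a strong assumption: that the exporter emits real `<table>`, `<tr>`, and `<td>` markup, which not every tool does. The `TableRows` class and the sample HTML are hypothetical and for illustration only.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical output of a PDF-to-HTML exporter.
exported_html = """<table>
<tr><th>Name</th><th>Sex</th><th>Age</th></tr>
<tr><td>Alfred</td><td>M</td><td>14</td></tr>
<tr><td>Alice</td><td>F</td><td>13</td></tr>
</table>"""

parser = TableRows()
parser.feed(exported_html)
parser.close()

# Write the rows as CSV text, a format SAS can import directly.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If the exporter positions text with absolutely placed elements instead of table markup, this approach will not find any rows and a different extraction method would be needed.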



https://github.com/savian-net
srk_2023
Calcite | Level 5

1) First, read the PDF file as a data source in Excel, selecting the page you need.

2) This creates a tab in Excel with structured data.

3) Then you can read that tab into SAS. (Suitable when there are not many tables.)
