SASUserRocks
Calcite | Level 5

Dear Friends,

Is it possible to read the contents of a signed, scanned PDF copy into SAS and query them with PROC SQL? Please help.

Patrick
Opal | Level 21

First you need to convert your PDF image to text (for a scanned document, that means OCR). You can then parse this text with a SAS data step, extract what you need, and store it in a SAS table. You need a table before you can use SQL.

Apache Tika is open-source software that can do such a conversion to text. I believe SAS uses Tika (or at least used to) for this as part of Text Analytics.
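As a rough illustration of the "parse the text, store it in a table, then use SQL" workflow, here is a minimal sketch in Python rather than SAS: the regex loop stands in for a data step and the in-memory `sqlite3` table stands in for a SAS table queried with PROC SQL. It assumes the converted text has one `Label: value` pair per line; the sample text, field names, and the `forms` table are hypothetical, and real OCR output will almost certainly need its own parsing rules.

```python
import re
import sqlite3

# Hypothetical text, standing in for what a PDF-to-text/OCR step might produce.
EXTRACTED_TEXT = """\
Name: Jane Smith
Date: 2019-04-21
Amount: 1250.00
Signed: yes
"""

# Assumed layout: one "Label: value" pair per line.
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.*?)\s*$")


def parse_fields(text):
    """Return a dict of lower-cased label -> value for every line matching the assumed layout."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def load_table(records):
    """Store parsed records in an in-memory SQLite table so they can be queried with SQL."""
    conn = sqlite3.connect(":memory:")
    # The REAL column makes SQLite convert numeric-looking text such as "1250.00".
    conn.execute("CREATE TABLE forms (name TEXT, date TEXT, amount REAL, signed TEXT)")
    conn.executemany(
        "INSERT INTO forms VALUES (:name, :date, :amount, :signed)",
        records,
    )
    return conn


fields = parse_fields(EXTRACTED_TEXT)
conn = load_table([fields])
rows = conn.execute("SELECT name, amount FROM forms WHERE signed = 'yes'").fetchall()
print(rows)  # [('Jane Smith', 1250.0)]
```

In SAS the same idea would typically be an INFILE/INPUT data step writing a dataset that PROC SQL then reads; the hard part in either language is the parsing rules, not the SQL.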

RW9
Diamond | Level 26

PDF is a complex binary file format. SAS cannot read much other than the plain binary code, which is of no use unless you understand it and can parse it. PDF is not a data format; it's a render destination. Whilst there are certain packages in Python/R and other languages to do certain things, even if you do manage to get anything useful out of it, the result would require full QC and probably a lot of work/processing. I would highly recommend either returning to the source data, which is the preferred method, or, worst case, typing things in manually. Ultimately PDF is a dreadful format to get data out of, so avoid it as much as possible.

SASUserRocks
Calcite | Level 5

Thanks for the feedback. Let's say I can convert the PDF to text. How can I bring a checkbox tick into the text file?

RW9
Diamond | Level 26

Exactly. How can you bring in things which aren't text? A checkbox could be a PDF object or a picture. At the end of the day, you could learn JavaScript in PDF and export it, or possibly find a third-party library to get it (from a quick search: https://stackoverflow.com/questions/55777812/python-pdf-how-to-read-from-form-elements-like-checkbox), but it's going to take a fair bit of effort. Going back to the source data is really the best option.
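The only case where a tick survives into a text file is when the conversion tool happens to emit it as a character. The sketch below assumes exactly that: each option sits on its own line starting with a glyph such as `☒`/`☑` (ticked) or `☐` (empty), or an ASCII `[X]`/`[ ]`. Those glyphs and the `read_checkboxes` helper are hypothetical; whether your converter produces anything like them is something you would have to check. If the tick is a picture or a form object, it never reaches the text and this approach finds nothing.

```python
# Assumed glyphs a text-conversion/OCR tool *might* emit, paired with checked/unchecked.
# No mark is a prefix of another, so the first match is unambiguous.
CHECKBOX_MARKS = (
    ("☒", True),
    ("☑", True),
    ("[X]", True),
    ("[x]", True),
    ("☐", False),
    ("[ ]", False),
)


def read_checkboxes(text):
    """Map each option label to True/False when its line starts with a known checkbox mark.

    Lines without a recognised mark are skipped, so a tick that was rendered as an
    image (and therefore never reached the text) is silently missing from the result.
    """
    result = {}
    for line in text.splitlines():
        stripped = line.strip()
        for mark, checked in CHECKBOX_MARKS:
            if stripped.startswith(mark):
                label = stripped[len(mark):].strip()
                result[label] = checked
                break
    return result


# Hypothetical converted text.
SAMPLE = """\
☒ I agree to the terms
☐ Send me newsletters
[X] Signed by applicant
No checkbox on this line
"""

print(read_checkboxes(SAMPLE))
```

Because missing marks are indistinguishable from missing options, any output like this would still need the full QC mentioned above.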
