Defense
Obsidian | Level 7

I have 500 PDF files that I need to convert to SAS datasets. The code I have found for this is pretty complex and converts only one PDF file to one SAS dataset at a time. Is there a simple proc that can quickly convert all 500 PDF files?

Thanks

Reeza
Super User

No.

PDF files are not easily readable by any system 😞

 

EDIT:

To clarify, there's no simple proc. Your best bet, as indicated, is to save the data to a text file or another machine-readable format.

Personally, I would purchase a one-month subscription to Adobe and use Adobe Pro to convert the files. If you have Adobe Professional (most big corps do), you can batch process all 500 in a script. Adobe has an Automator feature that works well IMO.

AhmedAl_Attar
Ammonite | Level 13

You may want to look into converting the PDFs into a file format that SAS can read, such as Excel.

Here is a link describing one such option: https://wagda.lib.washington.edu/gishelp/tutorial/excel.html

 

Hope this helps,

Ahmed

Patrick
Opal | Level 21

To add to what @AhmedAl_Attar posted:

PDF files as such are not "tabular", so there is not really a direct conversion path. Apache Tika would allow you to convert your PDFs into text-based documents (I've done that myself; it works really well and is simple to use), which you could then read into SAS.

There is also Apache PDFBox, which apparently can do PDF-to-CSV conversions, though I've never used it.

 

https://tika.apache.org/ 

http://pdfbox.apache.org/ 
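
For a batch of files, the Tika route could be scripted roughly as follows. This is only a sketch, not a tested recipe: it assumes Java is installed and that a Tika app jar has been downloaded as `tika-app.jar` (a placeholder file name), and it uses the Tika app's `--text` option to print each PDF's plain text. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: a downloaded Tika app jar; the file name here is a placeholder.
TIKA_JAR = Path("tika-app.jar")


def build_command(pdf_path, jar=TIKA_JAR):
    """Return the command that asks the Tika app to print a file's plain text."""
    return ["java", "-jar", str(jar), "--text", str(pdf_path)]


def output_path(pdf_path, out_dir):
    """Map some/dir/report.pdf to out_dir/report.txt."""
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def convert_folder(pdf_dir, out_dir):
    """Run Tika once per *.pdf in pdf_dir and save the text in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # check=True stops the loop if Tika fails on a file.
        result = subprocess.run(build_command(pdf), capture_output=True, check=True)
        target = output_path(pdf, out_dir)
        target.write_bytes(result.stdout)
        written.append(target)
    return written
```

Calling `convert_folder("pdfs", "txt")` (both folder names hypothetical) would leave one `.txt` file per PDF. The text is unstructured, so each layout still needs its own parsing logic when it is read into SAS, and scanned, image-only PDFs will not yield usable text this way.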

Kurt_Bremser
Super User

Such a conversion will only make sense if the PDFs in question contain usable data. Since a PDF could also be one big graphical image (like a scan), it is one of the least suited formats for business intelligence data transfer.

I'd rather ask the originator to provide the data in a format that makes sense, along with metadata (column descriptions).

ballardw
Super User

If the issue has to do with fillable PDF forms and the data they contain, then use a proper PDF tool like Adobe Acrobat Pro to export the data. That will usually result in some form of data file that can be imported into SAS.

Defense
Obsidian | Level 7
This is how I decided to handle my data:

1. Convert the PDFs to Excel using Adobe Acrobat Professional, which lets me convert hundreds of PDFs to Excel with just one "click".

2. Read the Excel files into SAS using a macro.
