Defense
Obsidian | Level 7

I have 500 PDF files that I need to convert to SAS data sets. I found some code that shows how to do one conversion (one PDF file to one SAS data set), but it is pretty complex and only handles one file at a time. Is there a simple procedure that can quickly convert all 500 PDF files?

Thanks

Reeza
Super User

No.

PDF files are not easily readable by any system 😞

EDIT:

To clarify, there's no simple proc. Your best bet, as indicated, is to save the data to a text or other machine-readable file.

Personally, I would purchase a one-month subscription to Adobe and use Adobe Acrobat Pro to convert the files. If you have Adobe Acrobat Professional (most big corporations do), you can batch-process all 500 files with a script. Adobe has an Automator feature that works well, IMO.

AhmedAl_Attar
Rhodochrosite | Level 12

You may want to look into converting the PDF into a file format that SAS can read, such as Excel.

Here is a link describing one such option: https://wagda.lib.washington.edu/gishelp/tutorial/excel.html
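Once a PDF has been converted to an Excel workbook, a single PROC IMPORT step can read it. This is a minimal sketch, assuming SAS/ACCESS Interface to PC Files is licensed (DBMS=XLSX needs it). The file name converted_report.xlsx is hypothetical, and the workbook is generated from SASHELP.CLASS first only so the example runs on its own.

```sas
/* Demo setup only: write a small workbook to the WORK folder.              */
/* In practice, point DATAFILE= at your converted PDF (hypothetical name).  */
%let demo_dir = %sysfunc(pathname(work));

proc export data=sashelp.class
    outfile="&demo_dir./converted_report.xlsx"
    dbms=xlsx replace;
run;

/* Import the workbook; assumes SAS/ACCESS Interface to PC Files */
proc import datafile="&demo_dir./converted_report.xlsx"
    out=work.report
    dbms=xlsx
    replace;
    getnames=yes;   /* first row holds the column names */
run;
```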

Hope this helps,

Ahmed

Patrick
Opal | Level 21

To add to what @AhmedAl_Attar posted:

PDF files as such are not "tabular", so there is no real direct conversion path. Apache Tika lets you convert a PDF into a text-based document (I've done that myself; it works really well and is simple to use), which you could then read into SAS.

There is also Apache PDFBox, which apparently can do PDF-to-CSV conversions. I've never used it, though.

https://tika.apache.org/ 

http://pdfbox.apache.org/ 
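A sketch of the text route: Tika's command-line app can write the extracted text to a file (for example `java -jar tika-app.jar --text report_001.pdf > report_001.txt`, where the jar and file names are placeholders), and one DATA step can then read all of the text files at once through a wildcard INFILE. The example assumes each text file starts with one header line followed by space-delimited ID and Amount values. Real extracted text is usually messier and will need its own INPUT logic. The demo files written to WORK are hypothetical stand-ins for Tika output.

```sas
/* Demo setup only: two small text files standing in for Tika output. */
/* File names, columns and values are hypothetical.                  */
%let txt_dir = %sysfunc(pathname(work));

data _null_;
    file "&txt_dir./report_001.txt";
    put 'ID Amount';
    put '1 100';
    put '2 250';
run;

data _null_;
    file "&txt_dir./report_002.txt";
    put 'ID Amount';
    put '3 75';
run;

/* Read every matching .txt file in one DATA step.                    */
/* FILENAME= holds the current file; EOV= is set to 1 when the first  */
/* line of the second or later file is read.                          */
data work.all_reports;
    length fname source_file $ 260;
    infile "&txt_dir./report_*.txt" filename=fname eov=new_file truncover;
    input @;                          /* hold the line before deciding   */
    if _n_ = 1 or new_file then do;   /* header line of each file        */
        new_file = 0;                 /* EOV= is not reset automatically */
        delete;
    end;
    input id amount;
    source_file = fname;              /* FILENAME= var is not kept, so copy it */
run;
```

Reading all files in one step avoids running a separate conversion 500 times; SOURCE_FILE keeps track of which PDF each row came from.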

Kurt_Bremser
Super User

Such a conversion will only make sense if the PDFs in question contain usable data. Since a PDF could also be one big graphical image (like a scan), it is one of the least suited formats for business intelligence data transfer.

I'd rather ask the originator to provide the data in a format that makes sense, along with metadata (column descriptions).

ballardw
Super User

If the issue has to do with PDF fillable forms and the data contained therein, then use a proper PDF tool like Adobe Acrobat Pro to export the data. That will usually result in some form of data file that can be imported into SAS.
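If the exported form data ends up as a CSV file (for example, via Acrobat's option to merge form data files into a spreadsheet), PROC IMPORT with DBMS=CSV reads it without any extra license. This is a minimal sketch, assuming a CSV export. The file name form_export.csv and its columns are hypothetical, and the file is written to WORK first so the example runs on its own.

```sas
/* Demo setup only: a hypothetical CSV as exported from filled-in forms */
%let csv_dir = %sysfunc(pathname(work));

data _null_;
    file "&csv_dir./form_export.csv";
    put 'Name,Age,Department';
    put 'Alice,34,Finance';
    put 'Bob,41,Operations';
run;

/* DBMS=CSV is part of Base SAS */
proc import datafile="&csv_dir./form_export.csv"
    out=work.form_data
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=max;   /* scan all rows before choosing types and lengths */
run;
```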

Defense
Obsidian | Level 7
This is how I decided to handle my data:

1. Convert the PDFs to Excel using Adobe Acrobat Professional, which lets me convert hundreds of PDFs to Excel with just one “click”.

2. Read the Excel files into SAS using a macro.
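The macro from step 2 isn't shown in the thread. Below is a hypothetical sketch of one way to write it. The macro name %import_all_xlsx and its parameters are made up for illustration. It uses the DOPEN/DNUM/DREAD directory functions to list the .xlsx files in a folder, imports each one with PROC IMPORT (again assuming SAS/ACCESS Interface to PC Files), tags the rows with the source file name, and stacks the results with PROC APPEND. The two demo workbooks are built from SASHELP.CLASS only so the example runs on its own.

```sas
/* Demo setup only: two workbooks in a subfolder of WORK */
%let xlsx_dir = %sysfunc(pathname(work))/xlsx_demo;
%let rc = %sysfunc(dcreate(xlsx_demo, %sysfunc(pathname(work))));

proc export data=sashelp.class(where=(sex='F'))
    outfile="&xlsx_dir./girls.xlsx" dbms=xlsx replace;
run;
proc export data=sashelp.class(where=(sex='M'))
    outfile="&xlsx_dir./boys.xlsx" dbms=xlsx replace;
run;

/* Hypothetical macro: import every .xlsx file in DIR and stack into OUT */
%macro import_all_xlsx(dir=, out=);
    %local dirref rc did n i fname;
    %let dirref = xlsdir;
    %let rc  = %sysfunc(filename(dirref, &dir));
    %let did = %sysfunc(dopen(&dirref));
    %if &did = 0 %then %do;
        %put ERROR: Cannot open directory &dir..;
        %return;
    %end;

    /* Start from an empty result so a rerun does not append twice */
    %if %sysfunc(exist(&out)) %then %do;
        proc delete data=&out;
        run;
    %end;

    %let n = %sysfunc(dnum(&did));
    %do i = 1 %to &n;
        %let fname = %qsysfunc(dread(&did, &i));
        %if %upcase(%qscan(&fname, -1, .)) = XLSX %then %do;
            proc import datafile="&dir/%unquote(&fname)"
                out=work._one dbms=xlsx replace;
                getnames=yes;
            run;

            /* Remember which workbook (and so which PDF) each row came from */
            data work._one;
                length source_file $ 260;
                set work._one;
                source_file = "%unquote(&fname)";
            run;

            /* FORCE lets the append proceed when column lengths or types */
            /* differ; values may be truncated or set to missing         */
            proc append base=&out data=work._one force;
            run;
        %end;
    %end;

    %let rc = %sysfunc(dclose(&did));
    %let rc = %sysfunc(filename(dirref));
%mend import_all_xlsx;

%import_all_xlsx(dir=&xlsx_dir, out=work.all_sheets)
```

With 500 converted files, it is worth checking the log for FORCE warnings: PROC IMPORT guesses column types per file, so the same column can come in as character in one workbook and numeric in another.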
