Emma2021
Quartz | Level 8
I have a Windows folder with many subfolders, and those subfolders contain many PDF files. I would like to search for a keyword in the contents of all those PDF files. Can I use SAS for this? I am looking for something other than the Windows built-in search.
7 REPLIES
ballardw
Super User

To use the text search tools in Base SAS, you would first need to convert every file to a SAS data set before searching. PDF is not a "nice" data format, so that is likely not practical.

I am not sure whether SAS Text Miner, if you have access to it, would have more luck.

A secondary issue with PDF files is that the text is not necessarily stored as the plain letters that make up the keyword you want to search for.

SASKiwi
PROC Star

If you have Adobe Acrobat, it provides this capability: https://helpx.adobe.com/nz/acrobat/using/searching-pdfs.html

A Google search will find other tools and techniques.

SAS really isn't a good option for searching documents for key words unless you are doing analytics.

Emma2021
Quartz | Level 8
I don’t want to open each PDF to search for the word. I thought SAS had a tool for this.
AhmedAl_Attar
Rhodochrosite | Level 12

Hi @Emma2021 ,

You may want to invest in a search indexing tool for your Windows documents. These tools are purpose-built, and they probably do the job much better than hand-coded SAS.

Here is an example of such a tool: https://docfetcher.sourceforge.io/en/index.html (Disclaimer: I've never used this one, but a long time ago I used a similar indexing tool to search through my saved documents.)

https://en.wikipedia.org/wiki/Desktop_search

https://sourceforge.net/software/desktop-search/

 

Hope this helps,

Ahmed

Patrick
Opal | Level 21

@Emma2021 wrote:
I don’t want to open each PDF to search for the word. I thought SAS had a tool for this.

To search for words in a .pdf, you need to open it and scan the text, whether you code that explicitly yourself or some "tool" does it for you in the background.

SAS Text Miner/Text Analytics supports .pdf sources. If you have it licensed, look into it.

If not, then using SAS you would need to create a directory listing of all .pdf files (path and name), call a third-party tool such as Apache Tika to convert each .pdf to text, and then use SAS to search through the text for specific terms.
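
To make the three steps concrete, here is a rough sketch of that flow (list the files, convert each to text, search the text). It is written in Python rather than SAS, purely as an illustration. It assumes Java is installed and that the Apache Tika app jar has been downloaded to the hypothetical path `tika-app.jar`; the function names are made up for this example.

```python
import os
import subprocess
from typing import Callable, Iterator, List

# Assumption: location of a locally downloaded Apache Tika app jar (hypothetical path).
TIKA_JAR = "tika-app.jar"


def list_pdfs(root: str) -> Iterator[str]:
    """Yield the full path of every .pdf under root, including all subfolders."""
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                yield os.path.join(folder, name)


def tika_to_text(pdf_path: str) -> str:
    """Convert one PDF to plain text by running the Tika app (needs Java)."""
    result = subprocess.run(
        ["java", "-jar", TIKA_JAR, "--text", pdf_path],
        capture_output=True,
        text=True,
        errors="replace",  # do not fail on bytes that cannot be decoded
        check=True,
    )
    return result.stdout


def search_pdfs(
    root: str,
    keyword: str,
    to_text: Callable[[str], str] = tika_to_text,
) -> List[str]:
    """Return the sorted paths of PDFs whose extracted text contains keyword."""
    needle = keyword.lower()
    hits = []
    for path in list_pdfs(root):
        try:
            text = to_text(path)
        except (OSError, subprocess.CalledProcessError):
            continue  # skip files that cannot be converted
        if needle in text.lower():  # simple case-insensitive substring match
            hits.append(path)
    return sorted(hits)
```

Something like `search_pdfs(r"C:\docs", "keyword")` would then return the list of matching files. The converter is passed in as a parameter so that a different text extractor could be swapped in without changing the search logic. The match is a plain substring test, so it will only find words that the converter manages to extract as ordinary text.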

SASKiwi
PROC Star

You don't have to. That link I posted describes how you can catalog multiple PDFs to search all of them at the same time.

FreelanceReinh
Jade | Level 19

Hi @Emma2021,

 

@SASKiwi wrote:

If you have Adobe Acrobat, it provides this capability: https://helpx.adobe.com/nz/acrobat/using/searching-pdfs.html


Note that this capability is also available in the free Acrobat Reader, i.e., you don't need the paid Acrobat Pro software. Just open the "Advanced Search" dialog with Ctrl-Shift-F and you'll see the option to select a folder. The search will automatically include all subfolders of the selected folder.
