I followed this blog post: https://blogs.sas.com/content/sgf/2023/11/08/extract-text-from-a-pdf-file-using-sas-viya/
but it is not working for me. @pstyliadis
I am running this program:
%let path = /Projects/Extract text from PDFs and create tables/mypdfs;
%put &=path;
proc cas;
file log;
table.dropCaslib /
caslib='ac_pdf' quiet = true;
run;
proc cas;
session mySession;
table.addCaslib /
caslib="ac_pdf"
description="pdf files"
dataSource={srctype="path"}
path="&path" subdirs=true ;
run;
proc casutil;
list files incaslib='ac_pdf';
quit;
proc casutil;
load casdata='' /* To read in all files use an empty string. For a single file specify the file name */
incaslib='ac_pdf' /* The location of the files to load */
importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True) /* Specify document import options */
casout='pdf_data' outcaslib='casuser' replace; /* Specify the output cas table info */
quit;
I get the following errors in the log:
When I upload my PDFs to a caslib via SFTP, the following code works.
It does produce the following problem note, but I think that will go away once the admin resolves it.
proc cas ;
session mySession;
table.dropCaslib / caslib='_TMPCAS_' quiet=true;
table.dropCaslib / caslib='_LOADTMP' quiet=true;
run;
/*** Macro variable setup ***/
/* Specify the file path to your PDF files */
%let imagePath = /caslibs/akaike/my_pdf/;
/* Specify the caslib and table name for your image data table */
%let imageCaslibName = casuser;
%let imageTableName = images;
/* Specify the caslib and table name for the augmented training image data table */
%let imageTrainingCaslibName = &imageCaslibName;
%let imageTrainingTableName = &imageTableName.Augmented;
proc cas;
file log;
table.dropCaslib /
caslib='loadPDFTempCaslib' quiet = true;
run;
/*** Load and list the PDF files ***/
/* Create temporary caslib and libref for loading the PDFs */
caslib loadPDFTempCaslib datasource=(srctype="path") path="&imagePath"
subdirs notactive sessref=mySession;
libname _loadtmp cas caslib="loadPDFTempCaslib";
libname _tmpcas_ cas caslib="CASUSER";
proc casutil;
list files incaslib='loadPDFTempCaslib';
quit;
proc casutil;
load casdata='' /* To read in all files use an empty string. For a single file specify the file name */
incaslib='loadPDFTempCaslib' /* The location of the files to load */
importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True) /* Specify document import options */
casout='pdf_data' outcaslib='casuser' replace; /* Specify the output cas table info */
quit;