acordes
Rhodochrosite | Level 12

I followed this blog post: https://blogs.sas.com/content/sgf/2023/11/08/extract-text-from-a-pdf-file-using-sas-viya/

but it is not working for me. @pstyliadis

 


 

I am running this program:

%let path = /Projects/Extract text from PDFs and create tables/mypdfs;
 
%put &=path;


proc cas;
   file log;
   table.dropCaslib /
   caslib='ac_pdf' quiet = true;
 run;

proc cas;
   session mySession;

   table.addCaslib /
     caslib="ac_pdf"
     description="pdf files"
     dataSource={srctype="path"}
     path="&path" subdirs=true ;
run;


proc casutil;
	list files incaslib='ac_pdf'; 
quit;

proc casutil;
    load casdata=''                                                              /* To read in all files use an empty string. For a single file specify the file name */
         incaslib='ac_pdf'                                                       /* The location of the files to load */
         importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True)   /* Specify document import options   */
         casout='pdf_data' outcaslib='casuser' replace;                          /* Specify the output cas table info */
quit;

I get the following errors in the log:

 

1 %studio_hide_wrapper;
83 %let path = /Projects/Extract text from PDFs and create tables/mypdfs;
84
85 %put &=path;
PATH=/Projects/Extract text from PDFs and create tables/mypdfs
86
87
88 proc cas;
89 file log;
90 table.dropCaslib /
91 caslib='ac_pdf' quiet = true;
92 run;
NOTE: Active Session now MYSESSION.
NOTE: 'CASUSER(DKXEVO0)' is now the active caslib.
NOTE: Cloud Analytic Services removed the caslib 'ac_pdf'.
93
NOTE: PROCEDURE CAS used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
94 proc cas;
95 session mySession;
96
97 table.addCaslib /
98 caslib="ac_pdf"
99 description="pdf files"
100 dataSource={srctype="path"}
101 path="&path" subdirs=true ;
102 run;
NOTE: Active Session now mySession.
NOTE: Failed to resolve path /Projects/Extract text from PDFs and create tables/mypdfs/ for caslib ac_pdf.
NOTE: 'ac_pdf' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'ac_pdf'.
103
104
NOTE: The PROCEDURE CAS printed page 5.
NOTE: PROCEDURE CAS used (Total process time):
real time 0.02 seconds
cpu time 0.05 seconds
 
105 proc casutil;
NOTE: The UUID '668012bb-0288-034f-9927-9c76fa3a3263' is connected using session MYSESSION.
106
106! list files incaslib='ac_pdf';
Caslib Information
Library ac_pdf
Source Type PATH
Description pdf files
Path /Projects/Extract text from PDFs and create tables/mypdfs/
Session local Yes
Active Yes
Personal No
Hidden No
Transient No
ERROR: The file or path '/Projects/Extract text from PDFs and create tables/mypdfs' is not available in the file system.
ERROR: The action stopped due to errors.
NOTE: Cloud Analytic Services processed the combined requests in 0.001124 seconds.
107 quit;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
 
108
109 proc casutil;
NOTE: The UUID '668012bb-0288-034f-9927-9c76fa3a3263' is connected using session MYSESSION.
110 load casdata='' /* To read in all files use an empty string.
110! For a single file specify the file name */
111 incaslib='ac_pdf' /* The location of the files to load */
112 importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True) /* Specify document import options */
113 casout='pdf_data' outcaslib='casuser' replace; /* Specify the output cas table info */
ERROR: When loading a document table the path value must be a directory. You can specify path="" to load documents from the root
directory of the caslib. You can specify common file name extensions in the fileExtList parameter to restrict the documents
to load.
ERROR: The action stopped due to errors.
NOTE: The Cloud Analytic Services server processed the request in 0.000582 seconds.
114 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
 
115
116 %studio_hide_wrapper;
127
128

 

acordes
Rhodochrosite | Level 12

When I upload my PDFs to a caslib directory via SFTP, the following code works.

The only remaining issue is the error described in the problem note below, and I think it should work once the admin resolves that.

Problem Note 69063: "ERROR: Failed to initialize a Java virtual machine in TKJNL" occurs with SAS® Cloud Analytic Services (CAS) actions that use Java

 


proc cas ;
session mySession;
   table.dropCaslib / caslib='_TMPCAS_' quiet=true; 
   table.dropCaslib / caslib='_LOADTMP' quiet=true; 
run;

/*** Macro variable setup ***/
/* Specify the file path to your PDF files */
%let imagePath = /caslibs/akaike/my_pdf/;

/* Specify the caslib and table name for your image data table */
%let imageCaslibName = casuser;
%let imageTableName = images;

/* Specify the caslib and table name for the augmented training image data table */
%let imageTrainingCaslibName = &imageCaslibName;
%let imageTrainingTableName = &imageTableName.Augmented;


proc cas;
   file log;
   table.dropCaslib /
   caslib='loadPDFTempCaslib' quiet = true;
 run;


/*** Load the PDF files ***/
/* Create temporary caslib and libref for loading the PDF files */
caslib loadPDFTempCaslib datasource=(srctype="path") path="&imagePath"
    subdirs notactive sessref=mySession;
 
libname _loadtmp cas caslib="loadPDFTempCaslib"; 
libname _tmpcas_ cas caslib="CASUSER"; 

proc casutil;
	list files incaslib='loadPDFTempCaslib'; 
quit;


proc casutil;
    load casdata=''                                                              /* To read in all files use an empty string. For a single file specify the file name */
         incaslib='loadPDFTempCaslib'                                            /* The location of the files to load */
         importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True)   /* Specify document import options   */
         casout='pdf_data' outcaslib='casuser' replace;                          /* Specify the output cas table info */
quit;
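
Once the load step runs without the Java error, something like the following could be used to check what ended up in the output table. This is only a sketch: it assumes the load created pdf_data in casuser as specified in the casout= and outcaslib= options above, and it reuses the mySession session name from the code above. I have not verified which columns the document import produces, so the sketch just lists them.

```sas
/* Sketch only: assumes the load above succeeded and created casuser.pdf_data */
proc cas;
   session mySession;

   /* Confirm the table exists and see how many rows were loaded */
   table.tableInfo / caslib="casuser" name="pdf_data";

   /* List the columns that the document import produced */
   table.columnInfo / table={caslib="casuser", name="pdf_data"};

   /* Look at the first few rows */
   table.fetch / table={caslib="casuser", name="pdf_data"} to=5;
run;
quit;
```

If tableInfo reports that the table is not found, the load did not complete and the log from the PROC CASUTIL step is the place to look.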
