acordes
Rhodochrosite | Level 12

I followed this blog post: https://blogs.sas.com/content/sgf/2023/11/08/extract-text-from-a-pdf-file-using-sas-viya/

but it's not working for me. @pstyliadis


I'm running this program:

%let path = /Projects/Extract text from PDFs and create tables/mypdfs;
 
%put &=path;


proc cas;
   file log;
   table.dropCaslib /
   caslib='ac_pdf' quiet = true;
 run;

proc cas;
   session mySession;

   table.addCaslib /
     caslib="ac_pdf"
     description="pdf files"
     dataSource={srctype="path"}
     path="&path" subdirs=true ;
run;


proc casutil;
	list files incaslib='ac_pdf'; 
quit;

proc casutil;
    load casdata=''                                                              /* To read in all files use an empty string. For a single file specify the file name */
         incaslib='ac_pdf'                                                       /* The location of the files to load */
         importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True)   /* Specify document import options   */
         casout='pdf_data' outcaslib='casuser' replace;                          /* Specify the output cas table info */
quit;
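For reference, here is a minimal sketch of the same load written as a direct CASL table.loadTable call instead of PROC CASUTIL. It makes explicit the path="" that the ERROR further down in the log refers to. It assumes the session-local ac_pdf caslib and the mySession session created above, and passes fileExtList="PDF" as a single string, the same way the PROC CASUTIL step does. It can only succeed once the CAS server can actually resolve the caslib path; with the current path it is expected to fail the same way.

```sas
/* Sketch: the PROC CASUTIL LOAD step above as a table.loadTable call. */
/* Assumes the session-local caslib ac_pdf exists in mySession.        */
proc cas;
   session mySession;
   table.loadTable /
      caslib="ac_pdf"
      path=""                       /* "" = root directory of the caslib */
      importOptions={fileType="DOCUMENT",
                     fileExtList="PDF",   /* only load .pdf files        */
                     tikaConv=true}       /* convert documents with Tika */
      casOut={caslib="casuser", name="pdf_data", replace=true};
run;
quit;
```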

I get the following errors in the log:


1 %studio_hide_wrapper;
83 %let path = /Projects/Extract text from PDFs and create tables/mypdfs;
84
85 %put &=path;
PATH=/Projects/Extract text from PDFs and create tables/mypdfs
86
87
88 proc cas;
89 file log;
90 table.dropCaslib /
91 caslib='ac_pdf' quiet = true;
92 run;
NOTE: Active Session now MYSESSION.
NOTE: 'CASUSER(DKXEVO0)' is now the active caslib.
NOTE: Cloud Analytic Services removed the caslib 'ac_pdf'.
93
NOTE: PROCEDURE CAS used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
94 proc cas;
95 session mySession;
96
97 table.addCaslib /
98 caslib="ac_pdf"
99 description="pdf files"
100 dataSource={srctype="path"}
101 path="&path" subdirs=true ;
102 run;
NOTE: Active Session now mySession.
NOTE: Failed to resolve path /Projects/Extract text from PDFs and create tables/mypdfs/ for caslib ac_pdf.
NOTE: 'ac_pdf' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'ac_pdf'.
103
104
NOTE: The PROCEDURE CAS printed page 5.
NOTE: PROCEDURE CAS used (Total process time):
real time 0.02 seconds
cpu time 0.05 seconds
 
105 proc casutil;
NOTE: The UUID '668012bb-0288-034f-9927-9c76fa3a3263' is connected using session MYSESSION.
106
106! list files incaslib='ac_pdf';
Caslib Information
Library ac_pdf
Source Type PATH
Description pdf files
Path /Projects/Extract text from PDFs and create tables/mypdfs/
Session local Yes
Active Yes
Personal No
Hidden No
Transient No
ERROR: The file or path '/Projects/Extract text from PDFs and create tables/mypdfs' is not available in the file system.
ERROR: The action stopped due to errors.
NOTE: Cloud Analytic Services processed the combined requests in 0.001124 seconds.
107 quit;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
 
108
109 proc casutil;
NOTE: The UUID '668012bb-0288-034f-9927-9c76fa3a3263' is connected using session MYSESSION.
110 load casdata='' /* To read in all files use an empty string.
110! For a single file specify the file name */
111 incaslib='ac_pdf' /* The location of the files to load */
112 importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True) /* Specify document import options */
113 casout='pdf_data' outcaslib='casuser' replace; /* Specify the output cas table info */
ERROR: When loading a document table the path value must be a directory. You can specify path="" to load documents from the root
directory of the caslib. You can specify common file name extensions in the fileExtList parameter to restrict the documents
to load.
ERROR: The action stopped due to errors.
NOTE: The Cloud Analytic Services server processed the request in 0.000582 seconds.
114 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
 
115
116 %studio_hide_wrapper;
127
128
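The key line is the NOTE "Failed to resolve path /Projects/Extract text from PDFs and create tables/mypdfs/ for caslib ac_pdf": the caslib gets created, but the CAS server cannot find that directory, even though the macro variable resolves correctly. In SAS Viya, the compute server (where SAS Studio runs the code) and the CAS server do not necessarily see the same file system. A minimal sketch to compare the two views, assuming &path and mySession from the program above: FILEEXIST checks the compute server side, and the table.fileInfo action checks what the ac_pdf caslib can see on the CAS side.

```sas
/* 1) Compute server view: FILEEXIST returns 1 if the directory    */
/*    exists where the SAS session runs, otherwise 0.              */
%put NOTE: Compute server sees &path: %sysfunc(fileexist(&path));

/* 2) CAS server view: list the files the ac_pdf caslib can see.   */
/*    If CAS cannot resolve the caslib path, this is expected to   */
/*    fail just like LIST FILES in PROC CASUTIL.                   */
proc cas;
   session mySession;
   table.fileInfo / caslib="ac_pdf" path="";
run;
quit;
```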


1 REPLY
acordes

When I upload my PDFs to a caslib via SFTP, it works with the following code.

It does, however, throw the error described in the following problem note, but I think it should work once the admin resolves it:

Problem Note 69063: "ERROR: Failed to initialize a Java virtual machine in TKJNL" occurs with SAS® Cloud Analytic Services (CAS) actions that use Java



proc cas ;
session mySession;
   table.dropCaslib / caslib='_TMPCAS_' quiet=true; 
   table.dropCaslib / caslib='_LOADTMP' quiet=true; 
run;

/*** Macro variable setup ***/
/* Specify the file path to your PDF files */
%let imagePath = /caslibs/akaike/my_pdf/;

/* The macro variables below are left over from the image example and are not used in this program */
/* Specify the caslib and table name for your image data table */
%let imageCaslibName = casuser;
%let imageTableName = images;

/* Specify the caslib and table name for the augmented training image data table */
%let imageTrainingCaslibName = &imageCaslibName;
%let imageTrainingTableName = &imageTableName.Augmented;


proc cas;
   file log;
   table.dropCaslib /
   caslib='loadPDFTempCaslib' quiet = true;
 run;


/*** Load PDF files ***/
/* Create a temporary caslib and libref for loading the PDF files */
caslib loadPDFTempCaslib datasource=(srctype="path") path="&imagePath"
    subdirs notactive sessref=mySession;
 
libname _loadtmp cas caslib="loadPDFTempCaslib"; 
libname _tmpcas_ cas caslib="CASUSER"; 

proc casutil;
	list files incaslib='loadPDFTempCaslib'; 
quit;


proc casutil;
    load casdata=''                                                              /* To read in all files use an empty string. For a single file specify the file name */
         incaslib='loadPDFTempCaslib'                                            /* The location of the files to load */
         importoptions=(fileType="document" fileExtList = 'PDF' tikaConv=True)   /* Specify document import options   */
         casout='pdf_data' outcaslib='casuser' replace;                          /* Specify the output cas table info */
quit;
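Once the load runs without errors, a minimal way to check the result (assuming the casuser.pdf_data table and the mySession session from the code above) is to list the table's columns and fetch its first few rows:

```sas
/* Sketch: inspect the table created by the LOAD step above. */
proc cas;
   session mySession;
   /* Column names, types, and lengths of the document table */
   table.columnInfo / table={caslib="casuser", name="pdf_data"};
   /* First five rows (one row per loaded PDF) */
   table.fetch / table={caslib="casuser", name="pdf_data"} to=5;
run;
quit;
```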
