fabdu92
Obsidian | Level 7

Hi all,

 

It's difficult to google this: the results are all PDF files about proc import with Excel or CSV...

 

I would like to import a PDF file into a SAS dataset.

Do you know how to do it?

 

Thanks for answers

7 REPLIES
ballardw
Super User

PDF is not intended as a data interchange file format so I do not believe there is any direct interface for Proc Import.

The mixture of text, images and formatting would make it worse than Excel.

If you have a tool such as Adobe Pro that will let you extract bits and save them to other file formats, that may be your best bet. Or try to find another PDF-to-text conversion tool.
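
As a rough sketch of the PDF-to-text route, assuming the pdftotext command-line tool (part of xpdf, also shipped with poppler) is installed and on the PATH, and that the SAS session is allowed to run shell commands (the XCMD option), something like the following might read the converted text straight into a data set. The path c:/temp/report.pdf and the names pdftxt, work.pdf_lines and line are placeholders, not anything from the original question.

```sas
/* Assumption: pdftotext is on the PATH and XCMD is enabled.        */
/* "-layout" keeps the column layout; the trailing "-" sends the    */
/* converted text to stdout so SAS can read it through the pipe.    */
filename pdftxt pipe 'pdftotext -layout "c:/temp/report.pdf" -';

data work.pdf_lines;
  /* truncover keeps short lines instead of flowing to the next one */
  infile pdftxt truncover lrecl=32767;
  /* one observation per line of text; $char keeps leading blanks   */
  input line $char500.;
run;
```

Each observation then holds one line of the PDF text. Splitting it into real columns still depends on how the report is laid out.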

RW9
Diamond | Level 26

Save yourself days of work. Go back to the source data. A PDF is a report, i.e. it is not meant to be used as a data medium. If you have to use it, then in all likelihood it will be quicker to type the data in by hand.

fabdu92
Obsidian | Level 7

If it was that easy I would do it. 🙂


But the provider sends the file as a PDF, and I can't ask him to change that...

And it has to be imported into a data set automatically, so I can't manually convert it to txt.

 

No solutions? 😞

Even a difficult one?

Doc_Duke
Rhodochrosite | Level 12

You could write a script to export the data. See

 

https://www.pdfscripting.com/public/Automating-Acrobat.cfm

 

for some ideas.

RW9
Diamond | Level 26

"But the provider send the file in pdf" - are you paying for this? If so, you should be telling them what to send (i.e. data import specifications). If they can't do that, withdraw from the contract, or, if you still have to use it, increase your budget/resource needs. It's funny how things suddenly can be done when you mention costs.

 

The easy option: copy and paste each part. The hard one: PDF scripting. You may be able to get something from PDFtk:

https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

Or by exporting from Adobe; however, it's still going to be a chunk of work.

Doc_Duke
Rhodochrosite | Level 12

I'm guessing that your .pdf file is some sort of report. In that case it contains information other than the data.

 

I usually just bring the .pdf file up on my screen and copy the data and then paste it into a text file for processing.  That works for one-offs that aren't too long.

 

If you have Adobe Acrobat, you can also export the .pdf content to a spreadsheet or text file.

art297
Opal | Level 21

Roger DeAngelis posted the following solution on SAS-L. I'm not sufficiently familiar with SAS/IML's interface to R, so someone else will probably have to revise the code in case I misunderstood the documentation.

 

The solution requires installing two R packages (tm and slam) and downloading some open source programs (xpdf) from: https://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=0ahUKEwjw-MeyvcTRA...

 

An example:

 

* create a pdf;
title;footnote;
ods pdf file="c:/art/class.pdf";
proc print data=sashelp.class noobs;
run;
ods pdf close;

 

Then run some code in R (this is the part where someone would have to show us how to do it using IML's interface to R):

 

First, the following line has to be run in R: getwd()

That will return the directory where the xpdf executables have to be stored.

 

library("tm");
library("slam");
file <- "c:/art/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
                  readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";

 

Then the IML/R code would have to be inserted to make the R data frame array available to SAS as work.array.

 

Finally, in this example, I used the following data step to create the desired SAS data set:

 

data class;
  set array (firstobs=5);
  if mod(_n_,2);
  name=scan(lines,1,' ');
  sex=scan(lines,2,' ');
  age=input(scan(lines,3,' '),8.);
  height=input(scan(lines,4,' '),8.);
  weight=input(scan(lines,5,' '),8.);
run;

 

I can't test the following as I've never used IML, but my best guess is that the complete code (combining the pieces above) would be:

 

* create a pdf;
title;footnote;
ods pdf file="c:/art/class.pdf";
proc print data=sashelp.class noobs;
run;
ods pdf close;

proc iml;
submit / R;
library("tm");
library("slam");
file <- "c:/art/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
                  readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";
endsubmit;
call ImportDataSetFromR("work.array","array");
quit;

data class;
  set array (firstobs=5);
  if mod(_n_,2);
  name=scan(lines,1,' ');
  sex=scan(lines,2,' ');
  age=input(scan(lines,3,' '),8.);
  height=input(scan(lines,4,' '),8.);
  weight=input(scan(lines,5,' '),8.);
run;

 

HTH,

Art, CEO, AnalystFinder.com

 
