
proc import pdf

Contributor
Posts: 41

proc import pdf

Hi all,

 

This is hard to google: the results are all PDF files about using proc import with Excel or CSV...

 

I would like to import a PDF file into a SAS data set.

Do you know how to do it?

 

Thanks for your answers.

Super User
Posts: 10,474

Re: proc import pdf

PDF is not intended as a data interchange file format so I do not believe there is any direct interface for Proc Import.

The mixture of text, images and formatting would make it worse than Excel.

If you have a tool such as Adobe Acrobat Pro that will let you extract bits and save them to other file formats, that may be your best bet. Or try to find another PDF-to-text converter.
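Once the PDF has been converted to plain text, reading it into SAS is an ordinary data step. Here is a minimal sketch, not a tested solution. It assumes the text file was produced outside SAS (for example with xpdf's pdftotext and its -layout option), that the path c:/art/report.txt is hypothetical, and that the report is a simple five-column listing (name, sex, age, height, weight). A real report will need its own filtering rules.

```sas
/* Assumption: report.txt was created beforehand, e.g. with
   pdftotext -layout report.pdf report.txt (path is hypothetical) */
filename pdftxt "c:/art/report.txt";

data report;
  infile pdftxt truncover;
  length line $200 name $8 sex $1;
  input line $char200.;
  /* keep only lines with five tokens whose third token is numeric;
     this drops blank lines, titles and the column-header line */
  if countw(line, ' ') = 5
     and not missing(input(scan(line, 3, ' '), ?? 8.));
  name   = scan(line, 1, ' ');
  sex    = scan(line, 2, ' ');
  age    = input(scan(line, 3, ' '), 8.);
  height = input(scan(line, 4, ' '), 8.);
  weight = input(scan(line, 5, ' '), 8.);
  drop line;
run;
```

The ?? modifier suppresses the invalid-data notes for lines where the third token is not a number, so header and title lines are dropped without cluttering the log.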

Super User
Posts: 7,392

Re: proc import pdf

Save yourself days of work.  Go back to the source data.  PDF is a report format, i.e. not used as a data medium.  If you have to use it, then the likelihood is that it will be quicker to type the data in by hand.

Contributor
Posts: 41

Re: proc import pdf


If it was that easy I would do it. :-)


But the provider sends the file as a PDF, and I can't ask them to change that...

And it has to be imported into a data set automatically, so... I can't manually convert it to txt.

 

No solutions? :-(

Even a difficult one?

Trusted Advisor
Posts: 2,113

Re: proc import pdf

You could write a script to export the data.  See

 

https://www.pdfscripting.com/public/Automating-Acrobat.cfm

 

for some ideas.

Super User
Posts: 7,392

Re: proc import pdf

"But the provider send the file in pdf" - are you paying for this?  If so, you should be telling them what to send (i.e. data import specifications), and if they can't, withdraw the contract; or, if you still have to use it, increase your budget/resource needs.  It's funny how things can suddenly be done when you mention costs.

 

The easy option: copy and paste each part.  The hard option: PDF scripting.  You may be able to get something from PDFtk:

https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

Or by exporting from Adobe; however, it's still going to be a chunk of work.
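If the import has to run unattended, one possible route is to let SAS call a command-line converter through a pipe. The following is only a sketch under several assumptions: it swaps in xpdf's pdftotext instead of PDFtk or Acrobat, it needs the XCMD system option enabled and pdftotext on the PATH, and the path c:/art/report.pdf is hypothetical. Passing "-" as the output file sends the text to standard output, which SAS then reads line by line.

```sas
/* Assumptions: XCMD is enabled, pdftotext (xpdf) is on the PATH,
   and the PDF path below is hypothetical */
filename pdfpipe pipe 'pdftotext -layout "c:/art/report.pdf" -';

data pdf_lines;
  infile pdfpipe truncover;
  input line $char200.;   /* one observation per line of text */
run;
```

The resulting pdf_lines data set still holds raw text lines, so a second data step is needed to pick out the data rows and split them into variables.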

Trusted Advisor
Posts: 2,113

Re: proc import pdf

I'm guessing that your .pdf file is some sort of report.  In that case it contains information other than the data.

 

I usually just bring the .pdf file up on my screen and copy the data and then paste it into a text file for processing.  That works for one-offs that aren't too long.

 

If you have Adobe Acrobat, you can also export the .pdf content to a spreadsheet or text file.

PROC Star
Posts: 7,357

Re: proc import pdf


Roger DeAngelis posted the following solution on SAS-L. I'm not sufficiently familiar with SAS/IML's interface to R, so someone else will probably have to revise the code in case I misunderstood the documentation.

 

The solution requires installing two R packages (tm and slam) and downloading some open source programs (xpdf) from: https://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=0ahUKEwjw-MeyvcTRA...

 

An example:

 

* create a pdf;
title;footnote;
ods pdf file="c:/art/class.pdf";
proc print data=sashelp.class noobs;
run;
ods pdf close;

 

Run some code in R (this is the part where someone would have to show us how to do it using IML's interface to R):

 

First, the following line has to be run in R: getwd()

That returns the directory where the xpdf executables have to be stored.

 

library("tm");
library("slam");
file <- "c:/art/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";

 

Then the IML/R code would have to be inserted to make the R data frame array available to SAS as work.array.

 

Finally, in this example, I used the following data step to create the desired SAS data set:

 

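data class;
  /* start reading at the 5th line of the extracted text */
  set array (firstobs=5);
  /* keep only every other line (odd values of _n_) */
  if mod(_n_,2);
  /* split each text line into blank-delimited fields */
  name=scan(lines,1,' ');
  sex=scan(lines,2,' ');
  age=input(scan(lines,3,' '),8.);
  height=input(scan(lines,4,' '),8.);
  weight=input(scan(lines,5,' '),8.);
run;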

 

I can't test the following as I've never used IML, but my best guess is that the complete code (combining the steps above) would be:

 

* create a pdf;
title;footnote;
ods pdf file="c:/art/class.pdf";
proc print data=sashelp.class noobs;
run;
ods pdf close;

proc iml;
submit / R;
library("tm");
library("slam");
file <- "c:/art/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";
endsubmit;
call ImportDataSetFromR("work.array","array");
quit;

data class;
  set array (firstobs=5);
  if mod(_n_,2);
  name=scan(lines,1,' ');
  sex=scan(lines,2,' ');
  age=input(scan(lines,3,' '),8.);
  height=input(scan(lines,4,' '),8.);
  weight=input(scan(lines,5,' '),8.);
run;

 

HTH,

Art, CEO, AnalystFinder.com

 
