Hi all,
This is hard to google: the results are all PDF documents about PROC IMPORT with Excel or CSV...
I would like to import a PDF file into a SAS dataset.
Do you know how to do it?
Thanks for answers
PDF is not intended as a data interchange format, so I do not believe there is any direct interface for PROC IMPORT.
The mixture of text, images and formatting would make it worse than Excel.
If you have a tool such as Adobe Pro that will let you extract parts of the document and save them to other file formats, that may be your best bet. Otherwise, try to find another PDF-to-text conversion tool.
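If a command-line converter is available, the conversion step can be driven from SAS itself. This is only a minimal sketch, assuming the xpdf pdftotext utility is installed and on the system path, that the SAS session allows external commands (XCMD), and that c:/temp/report.pdf is a placeholder file name:

```sas
* Assumptions: pdftotext (xpdf) is on the PATH, XCMD is enabled,
  and c:/temp/report.pdf is a placeholder path;
filename pdftxt pipe 'pdftotext -layout "c:/temp/report.pdf" -';

* Read each line of extracted text into one character variable;
data raw_lines;
  infile pdftxt truncover;
  input line $char200.;
run;

filename pdftxt clear;
```

The result is one observation per line of text; the columns still have to be parsed out in a later DATA step.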
Save yourself days of work. Go back to the source data. A PDF is a report, i.e. not a data medium. If you have to use it, then the likelihood is that it will be quicker to type the data in by hand.
If it was that easy I would do it. 🙂
But the provider sends the file as a PDF, and I can't ask him to change that...
And it has to be imported into a data set automatically, so I can't manually convert it to txt.
No solutions? 😞
Even a difficult one?
You could write a script to export the data. See
https://www.pdfscripting.com/public/Automating-Acrobat.cfm
for some ideas.
"But the provider send the file in pdf" - are you paying for this? If so, you should be telling them what to send (i.e. data import specifications). If they can't comply, withdraw the contract, or, if you still have to use the PDF, increase your budget/resource needs. It's funny how things can suddenly be done when you mention costs.
The easy option is to copy and paste each part. The hard option is PDF scripting. You may be able to get something from PDFtk:
https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
Or by exporting from Adobe; however, it's still going to be a chunk of work.
I'm guessing that your .pdf file is some sort of report. In that case it contains information other than the data.
I usually just bring the .pdf file up on my screen and copy the data and then paste it into a text file for processing. That works for one-offs that aren't too long.
If you have Adobe Acrobat, you can also export the .pdf content to a spreadsheet or text file.
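Once the content is in a text file, a plain DATA step can read it. A rough sketch, assuming a hypothetical export c:/art/class.txt that holds one header line followed by one blank-delimited record per line (name, sex, age, height, weight); a real export will probably need different firstobs and input settings:

```sas
* Assumption: c:/art/class.txt is a hypothetical text export of the PDF
  with one header line followed by one blank-delimited record per line;
data class;
  infile "c:/art/class.txt" truncover firstobs=2;
  length name $8 sex $1;
  input name $ sex $ age height weight;
  * drop blank lines, which read as all-missing records;
  if not missing(name);
run;
```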
Roger DeAngelis posted the following solution on SAS-L. I'm not sufficiently familiar with SAS/IML's interface to R, so someone else will probably have to revise the code in case I have misunderstood the documentation.
The solution requires installing two R packages (tm and slam) and downloading some open source programs (xpdf) from: https://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=0ahUKEwjw-MeyvcTRA...
An example:
* create a pdf;
title;footnote;
ods pdf file="c:/art/class.pdf";
proc print data=sashelp.class noobs;
run;
ods pdf close;
Then run some code in R (this is the part where someone would have to show us how to do it using IML's interface to R).
First, the following line has to be run in R: getwd()
That returns the directory where the xpdf executables have to be stored.
library("tm");
library("slam");
file <- "c:/art/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";
Then the IML/R code would have to be inserted to make the R data frame array available to SAS as work.array.
Finally, in this example, I used the following data step to create the desired SAS data set:
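data class;
* skip the first four lines of extracted text, which precede the data rows;
set array (firstobs=5);
* keep only the odd-numbered observations that remain;
if mod(_n_,2);
* split each line on blanks and convert the numeric columns;
name=scan(lines,1,' ');
sex=scan(lines,2,' ');
age=input(scan(lines,3,' '),8.);
height=input(scan(lines,4,' '),8.);
weight=input(scan(lines,5,' '),8.);
run;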
I can't test the following as I've never used IML, but my best guess is that the actual code (above) would be:
* create a pdf;
title;footnote;
ods pdf file="c:/art/class.pdf";
proc print data=sashelp.class noobs;
run;
ods pdf close;
proc iml;
submit / R;
library("tm");
library("slam");
file <- "c:/art/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";
endsubmit;
call ImportDataSetFromR("work.array","array");
quit;
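data class;
* skip the first four lines of extracted text, which precede the data rows;
set array (firstobs=5);
* keep only the odd-numbered observations that remain;
if mod(_n_,2);
* split each line on blanks and convert the numeric columns;
name=scan(lines,1,' ');
sex=scan(lines,2,' ');
age=input(scan(lines,3,' '),8.);
height=input(scan(lines,4,' '),8.);
weight=input(scan(lines,5,' '),8.);
run;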
HTH,
Art, CEO, AnalystFinder.com