<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Convert PDF to TXT in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186040#M265639</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I was hoping for an executable SAS program that runs start to finish: a libname pointing to the PDFs in question, a conversion step, then (part 2) pulling the text items into a SAS dataset. Part 2 is manageable. Because of the restricted environment, no outside software is allowed.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 15 Oct 2014 20:16:39 GMT</pubDate>
    <dc:creator>jakestat</dc:creator>
    <dc:date>2014-10-15T20:16:39Z</dc:date>
    <item>
      <title>Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186037#M265636</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Does anyone have experience using a DOS command within SAS to convert .PDF files to .TXT files so that they can be read back into SAS? I have heard that you have to put SAS to "sleep" during the DOS command, then use an X statement. Thank you for any help!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 19:56:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186037#M265636</guid>
      <dc:creator>jakestat</dc:creator>
      <dc:date>2014-10-15T19:56:38Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186038#M265637</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't know of any SAS resource to do this, which doesn't mean it doesn't exist, but here's a blog post I recently came across that covers some tools that do:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://alignedleft.com/resources/pdf-data-extraction-tools" title="http://alignedleft.com/resources/pdf-data-extraction-tools"&gt;Tools for Extracting Data From PDFs - Scott Murray - alignedleft&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 20:01:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186038#M265637</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-10-15T20:01:24Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186039#M265638</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;It depends on which program you are calling in DOS and how it has to interact with SAS. I've had extremely good success with the products from: &lt;A href="http://www.a-pdf.com/form-data-extractor/" title="http://www.a-pdf.com/form-data-extractor/"&gt;Batch extract PDF Form Data. [A-PDF.com]&lt;/A&gt; and I've been able to put the calls in the process flow without having to force SAS to sleep.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 20:11:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186039#M265638</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2014-10-15T20:11:50Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186040#M265639</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I was hoping for an executable SAS program that runs start to finish: a libname pointing to the PDFs in question, a conversion step, then (part 2) pulling the text items into a SAS dataset. Part 2 is manageable. Because of the restricted environment, no outside software is allowed.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 20:16:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186040#M265639</guid>
      <dc:creator>jakestat</dc:creator>
      <dc:date>2014-10-15T20:16:39Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186041#M265640</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you're in an enterprise environment, you're more likely to have access to Adobe Professional, though. What does your PDF look like?&lt;/P&gt;&lt;P&gt;Adobe has some scripting tools that allow you to batch process some things relatively painlessly. It helps if you know some JavaScript, though.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 20:24:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186041#M265640</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-10-15T20:24:38Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186042#M265641</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have code to pull the PDF from the following website. It seems the PDF was created directly from Excel.&lt;/P&gt;&lt;P&gt; &lt;SPAN style="background: white; color: purple; font-family: 'Courier New'; font-size: 10pt;"&gt;&lt;A class="jive-link-external-small" href="http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf"&gt;http://www.stearnsdhialab.com/css/auctions/Sep1Hay.pdf&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 21:16:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186042#M265641</guid>
      <dc:creator>jakestat</dc:creator>
      <dc:date>2014-10-15T21:16:18Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186043#M265642</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If it was me:&lt;/P&gt;&lt;P&gt;1. Batch download all files&lt;/P&gt;&lt;P&gt;2. Use Adobe Professional to save as Excel file or XML, which it does nicely&lt;/P&gt;&lt;P&gt;3. Use SAS to extract information from Excel files.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Oct 2014 21:47:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186043#M265642</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-10-15T21:47:03Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186044#M265643</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Jake: I took a closer look at the files you are trying to download, and I doubt that any PDF converter would know how to correctly convert the second page of each of the PDFs on that site.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Instead, one could easily write a VBScript (for SAS to run) that (1) opened Adobe Reader; (2) did a select all (Ctrl-A); (3) copied the text to the system clipboard (Ctrl-C); (4) opened Notepad; (5) pasted the clipboard into Notepad (Ctrl-V); (6) went back to Adobe Reader and selected the next page (down arrow); (7) repeated the copy/paste steps; (8) saved the Notepad file; and (9) had SAS open the resulting .txt file and parse its contents.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The first 75% of the file would be easy to parse, as all of the desired text starts with the headers:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Auction Date: September 04, 2014&lt;/P&gt;&lt;P&gt;LOT NO. SAMPLE DESCRIPTION MOISTURE PROTEIN RFV CUTTING LOAD SIZE PRICE&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and the data that follows the headers is rather straightforward:&lt;/P&gt;&lt;P&gt;869 Large Round 14.96 20.48 82.78 1 15.48 75.00&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, roughly the last 25% of the file didn't make sense to me given the header variables:&lt;/P&gt;&lt;P&gt;872 Medium Square STRAW 78 Bales $ 2 5.00&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If those latter lines are all irrelevant, then the problem would be easy to solve.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Oct 2014 13:23:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186044#M265643</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2014-10-17T13:23:07Z</dc:date>
    </item>
    <item>
      <title>Re: Convert PDF to TXT</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186045#M265644</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I only have experience with &lt;A href="http://www.rasteredge.com/how-to/csharp-imaging/pdf-text-extract/"&gt;&lt;SPAN style="color: #383838;"&gt;extracting text from PDF&lt;/SPAN&gt;&lt;/A&gt; or &lt;A href="http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-word/"&gt;&lt;SPAN style="color: #383838;"&gt;converting PDF to Word&lt;/SPAN&gt;&lt;/A&gt; to get the text, but I have no idea how to convert PDF to TXT directly.&lt;/P&gt;&lt;P&gt;I'm also looking forward to learning a solution for it.&lt;/P&gt;&lt;P&gt;Any other ideas?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 31 Jul 2015 07:48:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Convert-PDF-to-TXT/m-p/186045#M265644</guid>
      <dc:creator>longwest</dc:creator>
      <dc:date>2015-07-31T07:48:21Z</dc:date>
    </item>
  </channel>
</rss>

