<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Infile input - preserve linefeeds in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470809#M120499</link>
    <description>&lt;P&gt;Adobe Pro works with JavaScript and has macro-type functionality as well. I'm not suggesting point-and-click here either.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/215655"&gt;@stray_tachyon&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;We have to process thousands of files.&amp;nbsp; Using Acrobat Pro is not feasible&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
</description>
    <pubDate>Sat, 16 Jun 2018 21:52:27 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2018-06-16T21:52:27Z</dc:date>
    <item>
      <title>Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470447#M120400</link>
      <description>&lt;P&gt;Hi all. This is my first post, so please go easy on me &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Our team is using SAS Contextual Analysis to pull matching text (e.g., sick leave, wages) from a bunch of collective agreements (samples: &lt;A href="https://www.sdc.gov.on.ca/sites/mol/drs/ca/" target="_blank"&gt;https://www.sdc.gov.on.ca/sites/mol/drs/ca/&lt;/A&gt;).&lt;/P&gt;
&lt;P&gt;Our process has two steps. The first step is to create a bunch of concept rules in Contextual Analysis and process the text files to generate a number of CA datasets. The second step is to run SAS code in Enterprise Guide to extract a blob of text surrounding the matched terms. I'm currently trying to extract wage tables from the collective agreements.&lt;/P&gt;
&lt;P&gt;Here's what I would like to extract from the original text file (converted from PDF). The original spacing did not survive the forum software, so please see the attached&amp;nbsp;611-12921-14 (805-0145).pdf.txt file for the exact layout.&lt;/P&gt;
&lt;PRE&gt;SALARY GRID FOR FULL-TIME INSTRUCTORS

May 1, 2010        STEPS
                   Base     1        2        3        4        5        6        7        8        9        10
12-month contract  $44,908  $46,744  $48,580  $50,416  $52,252  $54,088  $55,924  $57,760  $59,596  $61,432  $63,268
10-month contract  $37,423  $38,953  $40,483  $42,013  $43,543  $45,073  $46,603  $48,133  $49,663  $51,193  $52,723

May 1, 2011
                   Base     1        2        3        4        5        6        7        8        9        10
12-month contract  $45,357  $47,211  $49,065  $50,919  $52,773  $54,627  $56,481  $58,335  $60,189  $62,043  $63,897
10-month contract  $37,798  $39,343  $40,888  $42,433  $43,978  $45,523  $47,068  $48,613  $50,158  $51,703  $53,248

May 1, 2012
                   Base     1        2        3        4        5        6        7        8        9        10
12-month contract  $46,264  $48,155  $50,046  $51,937  $53,828  $55,719  $57,610  $59,501  $61,392  $63,283  $65,174
10-month contract  $38,553  $40,129  $41,705  $43,281  $44,857  $46,433  $48,008  $49,584  $51,160  $52,736  $54,312

May 1, 2013
                   Base     1        2        3        4        5        6        7        8        9        10
12-month contract  $47,189  $49,118  $51,047  $52,976  $54,905  $56,834  $58,763  $60,692  $62,621  $64,550  $66,479
10-month contract  $39,324  $40,932  $42,539  $44,147  $45,754  $47,362  $48,969  $50,577  $52,184  $53,792  $55,399

Salary scale excludes 4% vacation pay.&lt;/PRE&gt;
&lt;P&gt;Here's the relevant code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%do i = 1 %to &amp;amp;counter;
  /*%put Filename &amp;amp;&amp;amp;filename&amp;amp;i;*/
  %let original_length = &amp;amp;&amp;amp;originallength&amp;amp;i;

  data snippet_&amp;amp;concept; /* opens the txt file and reads in starting at the offset position */
    infile "&amp;amp;&amp;amp;fr&amp;amp;i." lrecl=1000000 recfm=f truncover;
    length additional_provision_text $1000;
    input @&amp;amp;&amp;amp;offset&amp;amp;i additional_provision_text $&amp;amp;totchnk.. @;
    length provision_text $1000;
    input @&amp;amp;&amp;amp;originalstartoffset&amp;amp;i provision_text $&amp;amp;original_length..;
    length quantifiable_value $10;
    quantifiable_value = "&amp;amp;&amp;amp;quantifiable&amp;amp;i";
    length document_filename $256;
    document_filename = "&amp;amp;&amp;amp;filename&amp;amp;i";
    start_offset = &amp;amp;&amp;amp;originalstartoffset&amp;amp;i;
    end_offset = &amp;amp;&amp;amp;originalendoffset&amp;amp;i;
    length = &amp;amp;&amp;amp;original_length;
    document_id = &amp;amp;&amp;amp;docid&amp;amp;i;
    ROW_ID = &amp;amp;&amp;amp;ROWID&amp;amp;i;
  run;

  proc append base = &amp;amp;concept /* appends each record to a data set */
              data = snippet_&amp;amp;concept force;
  run;

%end;
%mend do_snippet;
%do_snippet;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I have attached the exported dataset to this post. As you can see, all the linefeeds from the "611-12921-14 (805-0145).pdf.txt" file have been removed in the "additional_provision_text" column:&lt;/P&gt;
&lt;P&gt;SALARY GRID FOR FULL-TIME INSTRUCTORS May 1, 2010 STEPS Base 1 2 3 4 5 6 7 8 9 10 12-month contract $44,908 $46,744 $48,580 $50,416 $52,252 $54,088 $55,924 $57,760 $59,596 $61,432 $63,268 10-month contract $37,423 $38,953 $40,483 $42,013 $43,543 $45,073 $46,603 $48,133 $49,663 $51,193 $52,723 May 1, 2011 Base 1 2 3 4 5 6 7 8 9 10 12-month contract $45,357 $47,211 $49,065 $50,919 $52,773 $54,627 $56,481 $58,335 $60,189 $62,043 $63,897 10-month contract $37,798 $39,343 $40,888 $42,433 $43,978 $45,523 $47,068 $48,613 $50,158 $51,703 $53,248&lt;/P&gt;
&lt;P&gt;Can someone please tell me how to preserve the linefeeds when the code reads in the text from the source files? Thanks a lot&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jun 2018 20:51:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470447#M120400</guid>
      <dc:creator>stray_tachyon</dc:creator>
      <dc:date>2018-06-14T20:51:59Z</dc:date>
    </item>
    <item>
      <title>Re: Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470491#M120415</link>
      <description>&lt;P&gt;You could do this, but I would recommend considering Adobe Pro or the R package &lt;A href="https://datascienceplus.com/extracting-tables-from-pdfs-in-r-using-the-tabulizer-package/" target="_self"&gt;tabulizer&lt;/A&gt; instead. This is likely a one-time initiative, and either tool is more likely to be accurate and faster.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Adobe Pro allows you to convert the document to text or extract a table relatively easily, using either the GUI or JavaScript if you're coding.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2018 01:00:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470491#M120415</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-06-15T01:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470587#M120434</link>
      <description>&lt;P&gt;We have to process thousands of files.&amp;nbsp; Using Acrobat Pro is not feasible&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2018 13:07:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470587#M120434</guid>
      <dc:creator>stray_tachyon</dc:creator>
      <dc:date>2018-06-15T13:07:46Z</dc:date>
    </item>
    <item>
      <title>Re: Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470622#M120442</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/215655"&gt;@stray_tachyon&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's what I would like to extract from the original text file (converted from PDF): (Forum software messed up the format.&amp;nbsp; Please see the attached&amp;nbsp;611-12921-14 (805-0145).pdf.txt file)&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;There are two options in this forum to reduce interference from the forum software's text formatting. The icon bar at the top of the message box has one icon, &lt;STRIKE&gt;{I}&lt;/STRIKE&gt;&amp;nbsp; &lt;FONT color="#800080"&gt;&lt;STRONG&gt;changed to &amp;lt;/&amp;gt;&lt;/STRONG&gt;&lt;/FONT&gt;, that opens a basic code box with no color highlighting; it is usually the best choice for data, though I also use it for code. The other is the "running man" to the right of the &lt;STRIKE&gt;"{I}"&lt;/STRIKE&gt;. This box will color-format SAS code and looks "prettier".&lt;/P&gt;
&lt;P&gt;Paste code or data into either one and it should appear at least somewhat cleaner. TAB characters may still look different, though, since the forum seems to display each tab as a fairly large number of spaces.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Apr 2023 16:38:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470622#M120442</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-04-13T16:38:31Z</dc:date>
    </item>
    <item>
      <title>Re: Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470634#M120445</link>
      <description>&lt;P&gt;I think you have hidden your basic question under a flurry of too much information.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Can someone please tell me how to preserve the linefeed when the code read in the text from the source files?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;What do you mean by this statement?&amp;nbsp; Are you saying you want to store multiple lines from the input text file in a single variable of one observation?&amp;nbsp; If so, then you want some variation on code like this.&amp;nbsp; You could use '0A'x to indicate a linefeed instead of '|' in the CATX() call.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test ;
  infile 'myfile.txt' ;
  input var1 $20. / var2 $20. ;
  var3 = catx('|',var1,var2);
run;&lt;/CODE&gt;&lt;/PRE&gt;
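&lt;P&gt;For anyone who wants to try the first approach without a real input file, here is a minimal self-contained sketch. The temporary file reference DEMO and the two sample lines are made up for illustration. CATX joins the two values with '0A'x instead of '|', and COUNTC should then report one linefeed stored in VAR3.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical demo input: write two short lines to a temporary file */
filename demo temp;
data _null_;
  file demo;
  put 'first line';
  put 'second line';
run;

/* Read two lines per observation and join them with a linefeed */
data test;
  infile demo truncover;           /* keep short lines from flowing into the next record */
  input var1 $20. / var2 $20.;
  length var3 $41;                 /* 20 + 1 delimiter + 20 */
  var3 = catx('0A'x, var1, var2);
  n_lf = countc(var3, '0A'x);      /* number of linefeeds stored in VAR3 */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The linefeed is an ordinary byte in VAR3; whether it shows up as a line break depends on whatever later displays or exports the value.&lt;/P&gt;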
&lt;P&gt;&lt;SPAN&gt;If instead you mean that you want SAS to treat bare linefeeds in the raw text as normal characters rather than end-of-line indicators, then you need two things.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1) The lines have to have something else to mark the true end of line: either CR+LF, as on Windows/DOS, or possibly a bare CR, as the original Mac OS used.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2) You need to tell the INFILE statement which of those to use.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test ;
  infile 'myfile.txt' termstr=crlf ;
  input var3 $50. ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
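&lt;P&gt;To check the second approach, the sketch below writes a small test file whose records end in CR+LF, with one bare LF embedded inside the first record, and then reads it back with TERMSTR=CRLF. It assumes SAS on Windows or UNIX, where both the FILE and INFILE statements accept TERMSTR=; the file reference DEMO and the sample text are made up. If things behave as described above, N_LF should be 1 for the first observation and 0 for the second.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical demo file: every record ends in CR+LF, and the first */
/* record also contains a bare LF that should be kept as data        */
filename demo temp;
data _null_;
  file demo termstr=crlf;
  length line $40;
  line = cat('SALARY GRID', '0A'x, 'May 1, 2010');
  put line;
  put 'next record';
run;

/* Only CR+LF ends a record here, so the bare LF stays inside VAR3 */
data test;
  infile demo termstr=crlf truncover;
  input var3 $50.;
  n_lf = countc(var3, '0A'x);      /* linefeeds kept as ordinary characters */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A related point for the original program: with RECFM=F, SAS reads fixed-length blocks of LRECL bytes and does not look for line terminators at all, so the CR and LF bytes should already be inside ADDITIONAL_PROVISION_TEXT. A COUNTC check like the one above can confirm that; if they are there, it may be worth checking how the exported data or viewer displays them. Two other details may matter there: the $w. informat strips leading blanks (the $CHARw. informat keeps them), and a $1000 variable will truncate any chunk longer than 1,000 characters.&lt;/P&gt;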
</description>
      <pubDate>Fri, 15 Jun 2018 15:58:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470634#M120445</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-06-15T15:58:51Z</dc:date>
    </item>
    <item>
      <title>Re: Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470671#M120453</link>
      <description>&lt;P&gt;Thanks a lot for your suggestions.&lt;/P&gt;&lt;P&gt;The code specifies a location in the text file (@&amp;amp;&amp;amp;offset&amp;amp;i) and the number of characters to read ($&amp;amp;totchnk), and places that section of text into the variable "additional_provision_text".&amp;nbsp; I would like SAS to keep all of the characters, including \r\n.&amp;nbsp; Is that doable?&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;infile "&amp;amp;&amp;amp;fr&amp;amp;i." lrecl=1000000 recfm=f truncover;
length additional_provision_text $1000;
&lt;STRONG&gt;input  @&amp;amp;&amp;amp;offset&amp;amp;i additional_provision_text $&amp;amp;totchnk&lt;/STRONG&gt;.. @;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This is what I want in the "additional_provision_text" variable, in its entirety:&lt;/P&gt;&lt;PRE&gt;                                        SALARY GRID FOR FULL-TIME INSTRUCTORS

May 1, 2010                                                  STEPS

             Base     1                 2        3        4           5        6                      7        8        9        10

12-month

contract     $44,908  $46,744           $48,580  $50,416  $52,252     $54,088  $55,924                $57,760  $59,596  $61,432  $63,268

10-month

contract     $37,423  $38,953           $40,483  $42,013  $43,543     $45,073  $46,603                $48,133  $49,663  $51,193  $52,723

May 1, 2011

             Base     1                 2        3        4           5        6                      7        8        9        10

12-month

contract     $45,357  $47,211           $49,065  $50,919  $52,773     $54,627  $56,481                $58,335  $60,189  $62,043  $63,897

10-month

contract     $37,798  $39,343           $40,888  $42,433  $43,978     $45,523  $47,068                $48,613  $50,158  $51,703  $53,248

May 1, 2012

             Base     1                 2        3        4           5        6                      7        8        9        10

12-month

contract     $46,264  $48,155           $50,046  $51,937  $53,828     $55,719  $57,610                $59,501  $61,392  $63,283  $65,174

10-month

contract     $38,553  $40,129           $41,705  $43,281  $44,857     $46,433  $48,008                $49,584  $51,160  $52,736  $54,312

May 1, 2013

             Base     1                 2        3        4           5        6                      7        8        9        10

12-month

contract     $47,189  $49,118           $51,047  $52,976  $54,905     $56,834  $58,763                $60,692  $62,621  $64,550  $66,479

10-month

contract     $39,324  $40,932           $42,539  $44,147  $45,754     $47,362  $48,969                $50,577  $52,184  $53,792  $55,399

Salary scale excludes 4% vacation pay.&lt;/PRE&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2018 18:28:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470671#M120453</guid>
      <dc:creator>stray_tachyon</dc:creator>
      <dc:date>2018-06-15T18:28:42Z</dc:date>
    </item>
    <item>
      <title>Re: Infile input - preserve linefeeds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470809#M120499</link>
      <description>&lt;P&gt;Adobe Pro works with JavaScript and has macro-type functionality as well. I'm not suggesting point-and-click here either.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/215655"&gt;@stray_tachyon&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;We have to process thousands of files.&amp;nbsp; Using Acrobat Pro is not feasible&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
</description>
      <pubDate>Sat, 16 Jun 2018 21:52:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Infile-input-preserve-linefeeds/m-p/470809#M120499</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-06-16T21:52:27Z</dc:date>
    </item>
  </channel>
</rss>

