Hi all. My first post so please go easy on me 🙂

Our team is using SAS Contextual Analysis to pull matching text (e.g. sick leave, wages, etc.) from a bunch of collective agreements (samples: https://www.sdc.gov.on.ca/sites/mol/drs/ca/). Our process takes two steps. The first step is to create a set of concept rules in Contextual Analysis and process the text files to generate a number of CA datasets. The second step is to run SAS code in Enterprise Guide to extract a blob of text surrounding the matched terms.

I'm currently trying to extract wage tables from the collective agreements. Here's what I would like to extract from the original text file (converted from PDF). (If the forum software mangles the layout, please see the attached 611-12921-14 (805-0145).pdf.txt file.)

SALARY GRID FOR FULL-TIME INSTRUCTORS

May 1, 2010
STEPS
Base 1 2 3 4 5 6 7 8 9 10
12-month contract $44,908 $46,744 $48,580 $50,416 $52,252 $54,088 $55,924 $57,760 $59,596 $61,432 $63,268
10-month contract $37,423 $38,953 $40,483 $42,013 $43,543 $45,073 $46,603 $48,133 $49,663 $51,193 $52,723

May 1, 2011
Base 1 2 3 4 5 6 7 8 9 10
12-month contract $45,357 $47,211 $49,065 $50,919 $52,773 $54,627 $56,481 $58,335 $60,189 $62,043 $63,897
10-month contract $37,798 $39,343 $40,888 $42,433 $43,978 $45,523 $47,068 $48,613 $50,158 $51,703 $53,248

May 1, 2012
Base 1 2 3 4 5 6 7 8 9 10
12-month contract $46,264 $48,155 $50,046 $51,937 $53,828 $55,719 $57,610 $59,501 $61,392 $63,283 $65,174
10-month contract $38,553 $40,129 $41,705 $43,281 $44,857 $46,433 $48,008 $49,584 $51,160 $52,736 $54,312

May 1, 2013
Base 1 2 3 4 5 6 7 8 9 10
12-month contract $47,189 $49,118 $51,047 $52,976 $54,905 $56,834 $58,763 $60,692 $62,621 $64,550 $66,479
10-month contract $39,324 $40,932 $42,539 $44,147 $45,754 $47,362 $48,969 $50,577 $52,184 $53,792 $55,399

Salary scale excludes 4% vacation pay.
Here's the relevant code:

%do i = 1 %to &counter;
    /*%put Filename &&filename&i;*/
    %let original_length = &&originallength&i;

    data snippet_&concept;
        /* opens the txt file and reads in starting at the offset position*/
        infile "&&fr&i." lrecl=1000000 recfm=f truncover;

        length additional_provision_text $1000;
        input @&&offset&i additional_provision_text $&totchnk.. @;

        length provision_text $1000;
        input @&&originalstartoffset&i provision_text $&original_length..;

        length quantifiable_value $10;
        quantifiable_value = "&&quantifiable&i";

        length document_filename $256;
        document_filename = "&&filename&i";

        start_offset = &&originalstartoffset&i;
        end_offset = &&originalendoffset&i;
        length = &&original_length;
        document_id = &&docid&i;
        ROW_ID = &&ROWID&i;
    run;

    proc append base = &concept /*appends each record to a data set*/
                data = snippet_&concept force;
    run;
%end;
%mend do_snippet;
%do_snippet;

I have attached the exported dataset to this post. As you can see, all the linefeeds are removed in the "additional_provision_text" column:

From the "611-12921-14 (805-0145).pdf.txt" file SALARY GRID FOR FULL-TIME INSTRUCTORS May 1, 2010 STEPS Base 1 2 3 4 5 6 7 8 9 10 12-month contract $44,908 $46,744 $48,580 $50,416 $52,252 $54,088 $55,924 $57,760 $59,596 $61,432 $63,268 10-month contract $37,423 $38,953 $40,483 $42,013 $43,543 $45,073 $46,603 $48,133 $49,663 $51,193 $52,723 May 1, 2011 Base 1 2 3 4 5 6 7 8 9 10 12-month contract $45,357 $47,211 $49,065 $50,919 $52,773 $54,627 $56,481 $58,335 $60,189 $62,043 $63,897 10-month contract $37,798 $39,343 $40,888 $42,433 $43,978 $45,523 $47,068 $48,613 $50,158 $51,703 $53,248

Can someone please tell me how to preserve the linefeeds when the code reads in the text from the source files?

Thanks a lot