<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Read complex text data in SAS in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352983#M82352</link>
    <description>Read complex text data in SAS</description>
    <pubDate>Mon, 24 Apr 2017 19:19:00 GMT</pubDate>
    <dc:creator>Saurabh_Amar</dc:creator>
    <dc:date>2017-04-24T19:19:00Z</dc:date>
    <item>
      <title>Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352983#M82352</link>
      <description>&lt;P&gt;I am trying to read data that comes from the AP systems at my company. The system inserts some rows that repeat at random points in the data. I need to delete those rows and then read the remaining data. In the example below, the header text from ID to Country, along with the '_____' line, is repeated multiple times. Once that is removed, the data follows a pattern of 3 rows, and each group of 3 rows should ideally become 1 row in the output SAS dataset. In some cases the 2nd and 3rd rows may be missing, but ID will always be present. I have close to 13,000 records, and there are more files as well. Can anyone please help?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;ID Name Phone extension&lt;BR /&gt;Address&lt;BR /&gt;City State Country&lt;BR /&gt;__________________________________________________________&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;11050ATR 1105ABCCORP (999) 999-9999 Ext. 0000&lt;BR /&gt;PO Box 99999&lt;BR /&gt;Los Angeles CA 9999 USA&lt;/P&gt;
&lt;P&gt;11050ATS 1105ABCCORP1 (999) 999-9998 Ext. 0000&lt;BR /&gt;PO Box 888&lt;BR /&gt;Los Angeles CA 9999 USA&lt;/P&gt;
&lt;P&gt;11050ATQ 1105ABCCORPr (999) 999-9999 Ext. 0000&lt;/P&gt;
&lt;P&gt;11050ATC 1105ABCCORPq (999) 999-9999 Ext. 0000&lt;BR /&gt;PO Box 0000&lt;BR /&gt;Los Angeles CA 9999 USA&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;ID Name Phone extension&lt;BR /&gt;Address&lt;BR /&gt;City State Country&lt;BR /&gt;___________________________________________________________&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;11050ATQ 1105ABCCORPr (999) 999-9999 Ext. 0000&lt;/P&gt;
&lt;P&gt;11050ATC 1105ABCCORPq (999) 999-9999 Ext. 0000&lt;BR /&gt;PO Box 0000&lt;BR /&gt;Los Angeles CA 9999 USA&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 19:19:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352983#M82352</guid>
      <dc:creator>Saurabh_Amar</dc:creator>
      <dc:date>2017-04-24T19:19:00Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352990#M82355</link>
      <description>&lt;P&gt;In a data step, retain the ID so that you have a column of IDs in addition to the rows of data you have below. You would then have two columns: one with the ID (repeated) and one with the data you have below. Once you do this, select the distinct responses in your data, grouped by the ID, so that each response appears only once. Then delete the rows where the IDs in your ID column match the data in your dataset.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 19:27:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352990#M82355</guid>
      <dc:creator>thomp7050</dc:creator>
      <dc:date>2017-04-24T19:27:13Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352993#M82356</link>
      <description>&lt;P&gt;Data from a file like this should be posted in a code box opened with the forum {i} icon. The main message window reformats text, which makes the actual problem harder to understand.&lt;/P&gt;
&lt;P&gt;For instance, it appears as if this bit of text repeats as a header to each record:&lt;/P&gt;
&lt;PRE&gt;ID              Name             Phone          extension
Address
City            State            Country
__________________________________________________________
&lt;/PRE&gt;
&lt;P&gt;Is that a correct understanding of the file layout?&lt;/P&gt;
&lt;P&gt;You should also provide some way to know what the desired output should actually look like. For instance, you say "where 2nd and 3rd rows may be missing". Which rows may be missing? There are multiple rows of values:&lt;/P&gt;
&lt;PRE&gt;11050ATR	1105ABCCORP	(999) 999-9999  Ext. 0000		
PO Box 99999					
Los Angeles	CA	9999	USA
11050ATS	1105ABCCORP1	(999) 999-9998  Ext. 0000		
PO Box 888					
Los Angeles	CA	9999	USA
11050ATQ	1105ABCCORPr	(999) 999-9999  Ext. 0000		


11050ATC	1105ABCCORPq	(999) 999-9999  Ext. 0000		
PO Box 0000					
Los Angeles	CA	9999	USA
&lt;/PRE&gt;
&lt;P&gt;Which of these will be present, which are the "2nd and 3rd" rows, and if they are included in the output data, what would they look like?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 19:31:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352993#M82356</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-04-24T19:31:58Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352995#M82357</link>
      <description>&lt;P&gt;Apologies for not formatting the question properly.&lt;/P&gt;
&lt;P&gt;To answer your questions:&lt;/P&gt;
&lt;P&gt;1) Yes, you are correct. This group of text appears multiple times, but not with every record; it repeats at random.&lt;/P&gt;
&lt;P&gt;2) The desired output would be 8 columns with the values of ID, Name, Phone, Extension, Address, City, State, and Country. Anything may be missing except ID, which will always be present. So records may look like:&lt;/P&gt;
&lt;PRE&gt;101C0	101ABC	(000) 000-0000  Ext. 0000
(missing)
(missing)

		
101C0	101ABC	(000) 000-0000  Ext. 0000
(missing)
(missing)

101C0	101ABC	(000) 000-0000  Ext. 0000
2/48 Newways 
Los Angeles	CA	90189	USA		
					&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 19:42:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/352995#M82357</guid>
      <dc:creator>Saurabh_Amar</dc:creator>
      <dc:date>2017-04-24T19:42:08Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353007#M82363</link>
      <description>&lt;P&gt;The "desired results" include records with an ID of 101C0, which isn't even shown in the originally posted data.&lt;/P&gt;
&lt;P&gt;I'm sure people would be better able to assist with a solution if a more complete presentation of the problem were offered.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 20:55:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353007#M82363</guid>
      <dc:creator>HB</dc:creator>
      <dc:date>2017-04-24T20:55:19Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353055#M82381</link>
      <description>&lt;P&gt;I doubt it's random. More likely, the source is a formatted report that is being converted to a text file, and the original report has this text as a page header/footer. Check whether the 'random' text is identical and repeated throughout the document. If it is, you can probably write some quick code to delete all the occurrences.&lt;/P&gt;
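&lt;P&gt;One way to run that check is to tabulate the candidate header lines and see whether each one appears with the same text and a similar count. This is only a rough sketch: the fileref apfile, the file name ap_extract.txt, and the dataset name header_lines are placeholders, and the line prefixes are taken from the sample posted in this thread.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename apfile "ap_extract.txt";  /* placeholder path, not from the thread */

data header_lines;
  infile apfile truncover;
  input line $200.;
  /* Keep only lines that look like part of the repeated header block */
  if line =: "ID" or line =: "Address" or line =: "City" or line =: "____";
run;

/* Identical header text should collapse into a few rows with similar counts */
proc freq data=header_lines order=freq;
  tables line / nocum nopercent;
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the table shows only a handful of distinct lines, the repeated text is probably a fixed page header and can be filtered by its prefix.&lt;/P&gt;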
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Apr 2017 01:50:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353055#M82381</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-04-25T01:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353082#M82387</link>
      <description>Yes, the text is always the same, but where it occurs is random. There is no fixed pattern.</description>
      <pubDate>Tue, 25 Apr 2017 03:17:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353082#M82387</guid>
      <dc:creator>Saurabh_Amar</dc:creator>
      <dc:date>2017-04-25T03:17:25Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353085#M82389</link>
      <description>&lt;P&gt;Have you considered processing the file twice?&lt;/P&gt;
&lt;P&gt;The first pass would remove the repeated text and standardize the format, and the second pass would actually read the data in the file.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Apr 2017 03:38:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353085#M82389</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-04-25T03:38:19Z</dc:date>
    </item>
    <item>
      <title>Re: Read complex text data in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353086#M82390</link>
      <description>&lt;P&gt;You could try filtering out the extra lines with:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
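&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
/* Read the raw file and write a cleaned copy */
infile "&amp;amp;sasforum\datasets\Amar_test.txt" truncover;
file "&amp;amp;sasforum\datasets\Amar_test_out.txt";
/* Load the current line into _infile_ and hold it */
input @;
/* Skip blank lines and underscore separator lines */
if _infile_ = " " then delete;
if _infile_ =: "____" then delete;
/* Read the held line and the next two as one 3-line group */
input line1 $200. / line2 $200. / line3 $200.;
/* Write the group unless it is the repeated ID/Address/City header */
if substr(line1,1,2) ne "ID" then put line1 / line2 / line3;
run; &lt;/CODE&gt;&lt;/PRE&gt;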
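&lt;P&gt;Once the cleaned file exists, a second step could read each 3-line group into one observation. The following is only a sketch under assumptions the thread does not confirm: that fields are tab-delimited as in the reposted sample, that every group in the cleaned file has exactly three lines (blank when missing), and that the third line carries a postal code between State and Country. The dataset name ap_records and all variable names and lengths are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ap_records;
  infile "&amp;amp;sasforum\datasets\Amar_test_out.txt" truncover;
  length id $20 name $50 phone $20 extension $10 address $100
         city $40 state $10 zip $10 country $20 phone_ext $40;
  /* Assumption: exactly three lines per record in the cleaned file */
  input line1 $200. / line2 $200. / line3 $200.;
  /* Assumption: tab-delimited fields; 'm' keeps empty fields in position */
  id        = scan(line1, 1, '09'x, 'm');
  name      = scan(line1, 2, '09'x, 'm');
  phone_ext = scan(line1, 3, '09'x, 'm');
  /* Split the combined phone field at the literal text Ext. */
  p = find(phone_ext, 'Ext.');
  if p gt 0 then do;
    phone     = strip(substrn(phone_ext, 1, p - 1));
    extension = strip(substrn(phone_ext, p + 4));
  end;
  else phone = strip(phone_ext);
  address = scan(line2, 1, '09'x, 'm');
  city    = scan(line3, 1, '09'x, 'm');
  state   = scan(line3, 2, '09'x, 'm');
  zip     = scan(line3, 3, '09'x, 'm');  /* hypothetical extra column */
  country = scan(line3, 4, '09'x, 'm');
  drop line1 line2 line3 phone_ext p;
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the real file is space-delimited or fixed-width instead, the scan calls would need to be replaced with column-based input.&lt;/P&gt;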
&lt;P&gt;Please provide a more elaborate test file if the filtering step doesn't work.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Apr 2017 03:39:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-complex-text-data-in-SAS/m-p/353086#M82390</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2017-04-25T03:39:10Z</dc:date>
    </item>
  </channel>
</rss>

