<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Separate words in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541846#M149689</link>
    <description>Well, in the 1st case:&lt;BR /&gt;Let's say I have a dataset have with column C1&lt;BR /&gt;C1&lt;BR /&gt;'he is walkingwalking'&lt;BR /&gt;'ron was sleeping at thatthat time'&lt;BR /&gt;&lt;BR /&gt;So basically I want to see if two identical words have got stuck together and, if yes, then try to separate them.&lt;BR /&gt;&lt;BR /&gt;So my output would be like this:&lt;BR /&gt;&lt;BR /&gt;C1&lt;BR /&gt;'he is walking walking'&lt;BR /&gt;'ron was sleeping at that that time'&lt;BR /&gt;&lt;BR /&gt;So walking and walking were stuck together but are identical, so I want them to be separated.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;In my second case:&lt;BR /&gt;I want to check if a pattern is matched and then pull out that string.&lt;BR /&gt;&lt;BR /&gt;So my pattern is 'abc' followed by any word followed by a 5-digit number.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;In dataset B, column A1 has values such as&lt;BR /&gt;A1&lt;BR /&gt;' srt abc- rty 50987'&lt;BR /&gt;' ftu ght abc trying 76543'&lt;BR /&gt;&lt;BR /&gt;So I want to extract&lt;BR /&gt;abc- rty 50987&lt;BR /&gt;abc trying 76543&lt;BR /&gt;&lt;BR /&gt;I hope this might help you.&lt;BR /&gt;&lt;BR /&gt;Thanks a ton !!!!&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Sun, 10 Mar 2019 18:14:13 GMT</pubDate>
    <dc:creator>Rohit_1990</dc:creator>
    <dc:date>2019-03-10T18:14:13Z</dc:date>
    <item>
      <title>Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541834#M149681</link>
      <description>Hi Experts,&lt;BR /&gt;&lt;BR /&gt;In the below dataset I have identical words, but they do not have any delimiter between them.&lt;BR /&gt;I want to separate them by putting a space between them.&lt;BR /&gt;For example, if the value is 'SteveSteve is readingreading' then I want the output to be 'Steve Steve is reading reading'&lt;BR /&gt;&lt;BR /&gt;Also, I want to extract values if they match a particular pattern.&lt;BR /&gt;&lt;BR /&gt;Say, if a substring is preceded by 'abc' and followed by a 5-digit number, then I want to extract that substring along with 'abc'.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;So, for example, in the string&lt;BR /&gt;' 123 srt abc wer 12345'&lt;BR /&gt;I want to extract ' abc wer' as it matches the aforementioned condition.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Rohit&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 10 Mar 2019 16:30:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541834#M149681</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-10T16:30:17Z</dc:date>
    </item>
    <item>
      <title>Re: Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541843#M149687</link>
      <description>&lt;P&gt;Please give some more examples of input and output.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 17:52:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541843#M149687</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-03-10T17:52:42Z</dc:date>
    </item>
    <item>
      <title>Re: Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541844#M149688</link>
      <description>Hi ,&lt;BR /&gt;&lt;BR /&gt;Thanks for you r</description>
      <pubDate>Sun, 10 Mar 2019 17:58:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541844#M149688</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-10T17:58:50Z</dc:date>
    </item>
    <item>
      <title>Re: Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541846#M149689</link>
      <description>Well, in the 1st case:&lt;BR /&gt;Let's say I have a dataset have with column C1&lt;BR /&gt;C1&lt;BR /&gt;'he is walkingwalking'&lt;BR /&gt;'ron was sleeping at thatthat time'&lt;BR /&gt;&lt;BR /&gt;So basically I want to see if two identical words have got stuck together and, if yes, then try to separate them.&lt;BR /&gt;&lt;BR /&gt;So my output would be like this:&lt;BR /&gt;&lt;BR /&gt;C1&lt;BR /&gt;'he is walking walking'&lt;BR /&gt;'ron was sleeping at that that time'&lt;BR /&gt;&lt;BR /&gt;So walking and walking were stuck together but are identical, so I want them to be separated.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;In my second case:&lt;BR /&gt;I want to check if a pattern is matched and then pull out that string.&lt;BR /&gt;&lt;BR /&gt;So my pattern is 'abc' followed by any word followed by a 5-digit number.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;In dataset B, column A1 has values such as&lt;BR /&gt;A1&lt;BR /&gt;' srt abc- rty 50987'&lt;BR /&gt;' ftu ght abc trying 76543'&lt;BR /&gt;&lt;BR /&gt;So I want to extract&lt;BR /&gt;abc- rty 50987&lt;BR /&gt;abc trying 76543&lt;BR /&gt;&lt;BR /&gt;I hope this might help you.&lt;BR /&gt;&lt;BR /&gt;Thanks a ton !!!!&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 10 Mar 2019 18:14:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541846#M149689</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-10T18:14:13Z</dc:date>
    </item>
    <item>
      <title>Re: Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541851#M149690</link>
      <description>&lt;P&gt;You can loop over the words like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  c1 = "he is walkingwalking";

  /* print each word (split on the default delimiters) with its position */
  do i=1 to countw(c1);
     word = scan(c1,i);
     put i= word=;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then check whether each word has an even number of characters and, if so, whether its first half is the same as its second half, using the length() and substr() functions.&lt;/P&gt;
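&lt;P&gt;The half-comparison described above could be sketched roughly as follows. This is a minimal, untested sketch: the variable names (fixed, half) and the minimum of two characters per half are assumptions made for illustration, not part of the original request.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length c1 fixed $200 word $50;
  c1 = "he is walkingwalking";
  fixed = ' ';
  do i = 1 to countw(c1, ' ');
    word = scan(c1, i, ' ');
    len = length(word);
    /* assumption: only even-length words of 4 or more characters are candidates */
    if len ge 4 and mod(len, 2) = 0 then do;
      half = len / 2;
      /* identical halves: treat as a doubled word and insert a blank */
      if substr(word, 1, half) = substr(word, half + 1) then
        word = catx(' ', substr(word, 1, half), substr(word, half + 1));
    end;
    /* rebuild the sentence word by word */
    fixed = catx(' ', fixed, word);
  end;
  put fixed=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Requiring at least two characters per half leaves a value such as "aa" untouched; a doubled word containing a typo would still be missed by this exact comparison.&lt;/P&gt;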
&lt;P&gt;But what about "aa": is it a double "a"? Or "walki&lt;STRONG&gt;n&lt;/STRONG&gt;gwalki&lt;STRONG&gt;m&lt;/STRONG&gt;g": is it a double word with a typo?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the second request, you need a check like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;c1 = 'srt abc- rty 50987';
do i=1 to countw(c1);
   word = scan(c1,i);
   if word='abc' then do;
      tmp = compress(c1,' ','kd');  /* keep only digits and blanks */
      if tmp ne ' ' then
         do j=1 to countw(tmp);
            numstr = scan(tmp,j);
            /* strip() drops trailing blanks so index() can find the number */
            if length(numstr) = 5 and
               index(c1,strip(numstr)) &amp;gt; index(c1,'abc')
            then wanted = substr(c1,index(c1,'abc'));
         end;
   end;
end;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Try to build test code from the above. If you run into issues, post your code and the log, and point out the issues you have.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 18:59:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541851#M149690</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2019-03-10T18:59:57Z</dc:date>
    </item>
    <item>
      <title>Re: Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541852#M149691</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/261497"&gt;@Rohit_1990&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regular Expressions are what you're looking for when it comes to dealing with text patterns.&lt;/P&gt;
&lt;P&gt;I wasn't really sure about your 2nd pattern: does it require the exact word &lt;EM&gt;abc&lt;/EM&gt; as its first word, or any string that starts with &lt;EM&gt;abc&lt;/EM&gt;? Given the dash in your first sample, I went for the 2nd option: any string starting with &lt;EM&gt;abc&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  infile datalines truncover;
  input str $255.;
  datalines;
he is walkingwalking ww walkingwalkings
ron was sleeping at thatthat time
srt abc- rty 50987
ftu ght abc trying 76543
;
run;

data want;
  set have;
  str_new=str;
  /* add blank between repeated string of at least two characters per repetition */
  str_new=prxchange('s/\b(\w{2,99})(\1)\b/\1 \2/oi',-1,strip(str_new));
  /* extract pattern */
  str_new=prxchange('s/^.*(\babc\S*\s+\w+\s+\d{5}\b).*$/\1/oi',-1,strip(str_new));
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The metacharacters used in the RegEx are documented here:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0s9ilagexmjl8n1u7e1t1jfnzlk.htm" target="_blank" rel="noopener"&gt;http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0s9ilagexmjl8n1u7e1t1jfnzlk.htm&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 19:10:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541852#M149691</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-03-10T19:10:28Z</dc:date>
    </item>
    <item>
      <title>Re: Separate words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541878#M149705</link>
      <description>&lt;P&gt;I would simplify to:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set have;
  /* add blank between repeated string of at least two characters per repetition */
  str_new=prxchange('s/\b(\w{2,})\1\b/\1 \1/oi', -1, str);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 11 Mar 2019 03:47:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Separate-words/m-p/541878#M149705</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-03-11T03:47:28Z</dc:date>
    </item>
  </channel>
</rss>

