<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extract substribg in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540542#M149124</link>
    <description>Hi ,&lt;BR /&gt;&lt;BR /&gt;Basically I am trying to pull substring from column C1 based on matching&lt;BR /&gt;values in column C2&lt;BR /&gt;. Column C2 contains single word having distinct values but have more than&lt;BR /&gt;ten thousand records</description>
    <pubDate>Tue, 05 Mar 2019 18:53:00 GMT</pubDate>
    <dc:creator>Rohit_1990</dc:creator>
    <dc:date>2019-03-05T18:53:00Z</dc:date>
    <item>
      <title>Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540523#M149116</link>
      <description>&lt;P&gt;Hi Experts,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to extract sub string from a string&amp;nbsp; by matching sub string to column values in a another table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;for ex.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;say in table A having column C1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Values is stored as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;c1&lt;/P&gt;&lt;P&gt;jack is walking.&lt;/P&gt;&lt;P&gt;jack walks.&lt;/P&gt;&lt;P&gt;jack is running.&lt;/P&gt;&lt;P&gt;Jack runs daily.&lt;/P&gt;&lt;P&gt;Jack is jumping&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now we have another table B having column C2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;c2&lt;/P&gt;&lt;P&gt;walking&lt;/P&gt;&lt;P&gt;walks&lt;/P&gt;&lt;P&gt;running&lt;/P&gt;&lt;P&gt;runs&lt;/P&gt;&lt;P&gt;jump&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I want to extract from table A column C1 values as matched&amp;nbsp; based on&amp;nbsp; values from table B column c2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;c1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Extracted_value&lt;/P&gt;&lt;P&gt;jack is walking.&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Walking&lt;/P&gt;&lt;P&gt;jack walks.&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Walks&lt;/P&gt;&lt;P&gt;jack is running .&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Running&lt;/P&gt;&lt;P&gt;Jack runs daily.&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;runs&lt;/P&gt;&lt;P&gt;Jack is jumping&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also As seen in last case "Jack is jumping" we have value similar value&amp;nbsp; "Jump" in table B&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;so can we use fuzzy here to extract values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Table A and Table B contain more than million records so not being able to use regex to subs-tr the values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2019 18:00:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540523#M149116</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-05T18:00:34Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540531#M149119</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/261497"&gt;@Rohit_1990&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Have you looked at using the scan function ?&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data Want;
     set Have; 
     FirstStr = scan(c1,1,' '); 
     SecondStr = scan(c1,2,' '); 
Run; &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 05 Mar 2019 18:12:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540531#M149119</guid>
      <dc:creator>UdayGuntupalli</dc:creator>
      <dc:date>2019-03-05T18:12:04Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540534#M149120</link>
      <description>Hi ,&lt;BR /&gt;&lt;BR /&gt;Thanks for your suggestion but as I mentioned in my question the data A and&lt;BR /&gt;dataset B contains million records and the pattern to be extracted has to&lt;BR /&gt;be matched with column C2 from table B and besides column C1 can be quite&lt;BR /&gt;complex.&lt;BR /&gt;&lt;BR /&gt;Here I have illustrated a very simple case to explain my issue. But column&lt;BR /&gt;C1 in table A can be like this also&lt;BR /&gt;C1&lt;BR /&gt;He is running over the bridge.&lt;BR /&gt;He doesn't talk while running&lt;BR /&gt;&lt;BR /&gt;And so forth so as you can see pattern to be extracted is not fixed at any&lt;BR /&gt;fixed position so scan does not work here.&lt;BR /&gt;&lt;BR /&gt;Also table B contain say ten thousand distinct value so Even prxmatch does&lt;BR /&gt;not work in this case.&lt;BR /&gt;&lt;BR /&gt;Please revert in case of any concerns.&lt;BR /&gt;&lt;BR /&gt;Regards</description>
      <pubDate>Tue, 05 Mar 2019 18:27:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540534#M149120</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-05T18:27:00Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540537#M149121</link>
      <description>&lt;P&gt;As you have two tables, it is not clear what do you want to do.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Shall the program check each table-2 C2 value in all table-1 observations ?&lt;/P&gt;
&lt;P&gt;Then how many observations are in table-2?&lt;/P&gt;
&lt;P&gt;Can an observation of table-1 match more then one table-2 C2 value ?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anywhere, you can check is a given string exist in a longer string by, for example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;c1 = 'Someone is jumping';
c2 = 'Jump';    /* pay attention to case - upper/lower case */

position = index(upcase((c1), upcase(c2) ); &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If c2 exist in c1 then the result is the position of c2 in c1, otherwise position=0.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2019 18:32:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540537#M149121</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2019-03-05T18:32:51Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540542#M149124</link>
      <description>Hi ,&lt;BR /&gt;&lt;BR /&gt;Basically I am trying to pull substring from column C1 based on matching&lt;BR /&gt;values in column C2&lt;BR /&gt;. Column C2 contains single word having distinct values but have more than&lt;BR /&gt;ten thousand records</description>
      <pubDate>Tue, 05 Mar 2019 18:53:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540542#M149124</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-05T18:53:00Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540561#M149132</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/261497"&gt;@Rohit_1990&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I had a similar problem once, and it turned out that the following was quite efficient, even if it sounds silly to start with multiplying all records in the the large data set . It could be reduced by using a stop word list ("is" in this example), and some sort of fuziness could be added by cleaning up C2, so "walking" and "walks" are collapsed to "walk".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data a; 
	infile datalines truncover;
	input c1 $char80.;
	datalines;
jack is walking.
jack walks.
jack is running.
Jack runs daily.
Jack is jumping
He is running over the bridge.
He doesn't talk while running
;
run;

data b; 
	infile datalines truncover;
	input c2 $20.;
	datalines;
walking
walks
running
runs
jump
;
run;

data w;
	set a;
	length word $20;
	do i = 1 to countw(c1);
		word = scan(c1,i);
		if word not in ('is') then output;
	end;
run;

proc sql;
	create table want as 
		select w.c1, b.c2
		from w, b
		where lowcase(w.word) EQT lowcase(b.c2);
quit;
	&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 05 Mar 2019 20:09:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540561#M149132</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-03-05T20:09:12Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540650#M149178</link>
      <description>&lt;P&gt;There are several methods to solve the issue:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) Use hash iterate method to hold table-2 in memory; I'm not expert with this method;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) Use format method as given here (not tested):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* creating format from table-2 */

data cntl;
    retain fmtname '$ok'  
              label       '1'
        ;
    start = locase(c2);
    end = start;
    output;
run;
proc format lib=work cntlin=cntl noprint; run;

/* check words in table-1 */
data want;
  set table1;
        do i=1 countw(c1);
             word = locase(scan(c2) , i );
             if put(word, $ok1.) = '1' then extracted = scan(c1,i);
             else do;
               if (length(strip(word)) &amp;gt; 5 and index(word,'ing')&amp;gt;2
               then word = substr(word,1,length(word)-3);
               if put(word, $ok1.) = '1' then extracted = scan(c1,i);
               else extracted = ' ';
            end;
run;      &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;you may need checks, not only for 'ing' but also for 'ed' (past time) etc.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Mar 2019 05:22:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540650#M149178</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2019-03-06T05:22:17Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540720#M149214</link>
      <description>Is there not a text mining forum with more extensive methods</description>
      <pubDate>Wed, 06 Mar 2019 10:03:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540720#M149214</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2019-03-06T10:03:55Z</dc:date>
    </item>
    <item>
      <title>Re: Extract substribg</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540747#M149224</link>
      <description>&lt;P&gt;I suggest to split table A into many small tables and do matching .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data a; 
	infile datalines truncover;
	input c1 $char80.;
	datalines;
jack is walking.
jack walks.
jack is running.
Jack runs daily.
Jack is jumping
He is running over the bridge.
He doesn't talk while running
;
run;

data b; 
	infile datalines truncover;
	input c2 $20.;
	datalines;
walking
walks
running
runs
jump
;
run;
proc sql;
create table want as
select *
 from a left join b
  on a.c1 contains strip(c2);
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;P.S. tables is stealing from Jenson.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Mar 2019 12:06:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-substribg/m-p/540747#M149224</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-03-06T12:06:16Z</dc:date>
    </item>
  </channel>
</rss>

