Query text string located within first nth location

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 84
Accepted Solution

Query text string located within first nth location

hi -

 

I'm querying a table field which captures approx 300 characters / 30 words (example: staff enter comments from a customer).

 

I'm trying to query the comments where a specific word (example: 'table') is found, and is located within the first nth words.

 

I'm currently using the 'like' function, which works well. (Example code: data test; set data; where field_name like ‘tabl%’;runSmiley Wink

 

 However, i'd like to further refine my results, to only show when the word I'm searching for is located in the first 8 words (do not return value, if 'table' entered father than the first 8 words). Would this be possible using a variant of the SCAN function, or other?

 

Thanks in advance.

 


Accepted Solutions
Solution
‎05-10-2016 10:35 AM
Esteemed Advisor
Esteemed Advisor
Posts: 7,250

Re: Query text string located within first nth location

i can actually be dropped, that is just the incrementor in the do loop.  Found will be either missing or 1 depending on if the string was found, however without seeing the data I can't tell you what is wrong as:

%let maxwords=4;

data want (drop=i); 
  length column_name $200;
  infile datalines dlm=",";
  input column_name $;
  do i=1 to &maxwords.;  
    if index(scan(column_name,i," "),"tabl")>0 then found=1; 
  end;
datalines;
a long sentance where does not contain table until after the place
a sentance with table within the length permitted
;
run;

Works fine, flagging the second record as that has tabl within the first four words.  Maybe you need upper casing?  Special characters, different delimeters?  Is your variable column_name?  As I mention, can't tell without seeing the data.

View solution in original post


All Replies
Super Contributor
Posts: 305

Re: Query text string located within first nth location

Hello,

 

Assuming the word is in the first 4 words of the sentence:

 

data have;
*from is the 4 th word - the observation is listed;
a='staff enter comments from a customer';
call scan(a,5,pos,len);
b=findw(substr(a,1,pos),'from');
if b gt 0 then output;
*from is the 5 th word - the observation is NOT listed;
a='staff enter comments received from a customer';
call scan(a,5,pos,len);
b=findw(substr(a,1,pos),'from');
if b gt 0 then output;
run;

Esteemed Advisor
Esteemed Advisor
Posts: 7,250

Re: Query text string located within first nth location

I would go something like:

data have;
  text="a word for a long sentance with here found"; output;
  text="a shorter here found part"; output;
run;
%let maxwords=4;
data want;  
  set have;
  do i=1 to &maxwords.;
    if scan(text,i," ")="here" then found=1;
  end;
run;
Frequent Contributor
Posts: 84

Re: Query text string located within first nth location

Thanks you for the suggested code. I ran it, however, all it did was add 2 columns:  "1" with variable 5, and column "found" empty [.], throughout. Pls note this is my 1st time using %let, and I am using sas 9.3.  Here is the code that I pasted:

 

%let maxwords=4;

data want; set have;

do i=1 to &maxwords.;  

if scan(column_name,i,‘ ’)=‘tabl%’ then found=1; end;run;

Esteemed Advisor
Esteemed Advisor
Posts: 7,250

Re: Query text string located within first nth location

Can I suggest the use of the code editor {i} in toolbar above post, and consitent formatting, makes code much easier to read.  You have odd quote marks in that, not sure if that has an impact.  Also, you are using a wildcard in your text, this wasn't indicated before, you just stated to find a word, I have changed it:

%let maxwords=4;

data want; 
  set have;
  do i=1 to &maxwords.;  
    if index(scan(column_name,i," "),"tabl")>0 then found=1; 
  end;
run;
Frequent Contributor
Posts: 84

Re: Query text string located within first nth location

thanks RW9 for the tip... re-tried the code but same results. It added two columns, "i" with value of 5, and "found"  with empty value throughout. thoughts? Thanks again

Solution
‎05-10-2016 10:35 AM
Esteemed Advisor
Esteemed Advisor
Posts: 7,250

Re: Query text string located within first nth location

i can actually be dropped, that is just the incrementor in the do loop.  Found will be either missing or 1 depending on if the string was found, however without seeing the data I can't tell you what is wrong as:

%let maxwords=4;

data want (drop=i); 
  length column_name $200;
  infile datalines dlm=",";
  input column_name $;
  do i=1 to &maxwords.;  
    if index(scan(column_name,i," "),"tabl")>0 then found=1; 
  end;
datalines;
a long sentance where does not contain table until after the place
a sentance with table within the length permitted
;
run;

Works fine, flagging the second record as that has tabl within the first four words.  Maybe you need upper casing?  Special characters, different delimeters?  Is your variable column_name?  As I mention, can't tell without seeing the data.

Frequent Contributor
Posts: 84

Re: Query text string located within first nth location

thanks for the tip; my text needed to be upper case. Question: if i am looking for all variations of the word, instead of [table] let's say I need to look for [request] but I also want to find spelling variation including mispellings. Could we use a wildcard so that the text is spelled  [reque%]? 

Esteemed Advisor
Esteemed Advisor
Posts: 7,250

Re: Query text string located within first nth location

Thats a bit more in depth.  What for instance consitutes a spelling mistake and not another word, etc.  There are lots of permutations to such a thing.  You would be best off looking at Perl Regular Expressions, bit of a learning curve, but once you know it you can do pattern matching on text:
http://www2.sas.com/proceedings/sugi29/265-29.pdf

 

Or if you deal with lots of data like that, then maybe using Text Analytics would be a good idea.

Frequent Contributor
Posts: 84

Re: Query text string located within first nth location

thanks for your reply.. will look into the doc you shared ... I found a simple workaround... I capture the first 26 chars from my column and paste into the new column. Then i use the "like" function to query wildcard text:

 

DATA TEST; 
   SET HAVE;
     NEW=SUBSTR(COLUMN_NAME,1,26);
       RUN;

DATA TEST2; SET TEST;
     WHERE NEW LIKE ‘REQUE%’;
	RUN;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 468 views
  • 0 likes
  • 3 in conversation