Finding a word in Text!

Hi everyone!

 

Could you please help me with this matter!

 

I want to find a word within a text; if that word exists in the text, put 1 in a variable!

For example:

 

Id   ReviewText                    Pillow   Duvet
1    The Pillows was comfortable   1        0
2    The stake was good            0        0
3    Duvet and pillow was cheap    1        1

 

I used this code, but I keep receiving the error "Variable ReviewTextt has been defined as both character and numeric"!

 

 

data Final;
set Final1(rename=(ReviewText=ReviewTextT));
ReviewText = put(ReviewTextT, 9999.);
drop ReviewTextT;
run;
Data Final;
length ReviewText $ 9999;


data final0;
Word = scan (ReviewText,1,'*');
if word in ('pillow' 'pillows') then Pillow=1;
set final;
run;

 

Thanks in advance for helping!

H


Accepted Solutions
Solution
‎01-11-2018 07:45 PM
Super User
Posts: 13,950

Re: Finding a word in Text!

Posted in reply to Hassankalantari

When you get errors, it is best to post the code and the messages from the log in a code box opened with the forum {I} menu icon.

The first reason is so we can see the actual code that was run; the second is that the message windows reformat text. Error messages often show an underscore character or other diagnostics that indicate where the error occurred, and the message windows on this forum move them.

 

ReviewText = put(ReviewTextT, 9999.); makes no sense. If the variable contains text, then you should get messages about attempting to change text to numeric in order to use the format. Also, the w. format is limited to a width of 32.

 

Here is your example data written as a data step, which creates an actual data set to test code with:

data final1;
   infile datalines dlm=',' missover;
   format id 4. reviewtext $30.;
   input id reviewtext;
datalines;
1,The Pillows was comfortable 
2,The stake was good 
3, Duvet and pillow was cheap 
;
run;

data final;
   set final1;
   pillow = index(upcase(ReviewText),'PILLOW')>0;
   duvet = index(upcase(ReviewText),'DUVET')>0;
run;

This works for the example provided. The question you have to answer is what you want done with compound words like Pillowcase.

 

If you don't want Pillowcase to count, then look at the FINDW function and use its options to ignore case.

 

Your attempted "scan" approach would require you to loop over the phrase for each word to do the comparison.
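As a rough illustration of what that loop could look like, here is a minimal sketch. The data set name reviews, the loop counter i, and the delimiter list are assumptions made for the example, not something from the original post.

```sas
/* Hypothetical test data: same rows as the example in the question */
data reviews;
   infile datalines dlm=',' truncover;
   length id 8 ReviewText $ 40;
   input id ReviewText;
datalines;
1,The Pillows was comfortable
2,The stake was good
3,Duvet and pillow was cheap
;
run;

data reviews_flagged;
   set reviews;
   length word $ 40;
   pillow = 0;
   /* Visit every word; blanks and common punctuation act as delimiters */
   do i = 1 to countw(ReviewText, ' ,.;:!?');
      word = lowcase(scan(ReviewText, i, ' ,.;:!?'));
      if word in ('pillow', 'pillows') then pillow = 1;
   end;
   drop i word;
run;
```

With these three rows, pillow should come out as 1 for ids 1 and 3 and 0 for id 2. Every spelling you want to catch has to be listed in the IN list.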

 

INDEX searches the entire variable for the given text and returns its position if found. The UPCASE is there because "pillow" is not the same as "Pillow" or "piLLow", so you need to handle the possible case of the word; here everything is compared in upper case. The >0 returns 1 (true) if the text is found and 0 (false) if it is not.
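To make the Pillowcase question concrete, the sketch below compares the two approaches on a single made-up sentence. The data set name compare, the variable names by_index and by_findw, and the delimiter list passed to FINDW are assumptions for illustration.

```sas
data compare;
   length ReviewText $ 40;
   ReviewText = 'The pillowcase was soft';   /* made-up test sentence */

   /* Substring search: finds PILLOW inside PILLOWCASE, so the flag is 1 */
   by_index = index(upcase(ReviewText), 'PILLOW') > 0;

   /* Whole-word search, ignoring case ('i'): pillowcase is a different
      word, so the flag is 0 */
   by_findw = findw(ReviewText, 'pillow', ' ,.;:!?', 'i') > 0;
run;
```

Note that a whole-word search for pillow will not match Pillows either, so the plural would need its own FINDW call combined with OR.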



All Replies
Super User
Posts: 8,290

Re: Finding a word in Text!

Posted in reply to Hassankalantari

Just look at the last two data steps you ran:

Data Final;
  length ReviewText $ 9999;
run;

data final0;
  Word = scan (ReviewText,1,'*');
  if word in ('pillow' 'pillows') then Pillow=1;
  set final;
run;

The first one will make an empty dataset named FINAL with the single character variable named ReviewText. 

Then in the second data step you start by using two variables in an assignment statement that you have never defined before. So REVIEWTEXT will be made numeric, since there is nothing to tell SAS how else to define it. WORD will be defined as character, since the SCAN() function always returns character values. It will default to length 12, since the default format that SAS uses to convert the numeric variable to character is BEST12. Then, when SAS gets to the SET statement, it has a conflict with the definition of REVIEWTEXT.
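A minimal sketch of the ordering fix, assuming a one-row stand-in for the real FINAL data set (the sample sentence and the blank delimiter are assumptions; the original code used '*'):

```sas
/* Stand-in for the poster's data: one character variable */
data final;
   length ReviewText $ 40;
   ReviewText = 'Duvet and pillow was cheap';
run;

data final0;
   set final;                          /* SET comes first, so ReviewText is already
                                          known to be character when it is used */
   Word = scan(ReviewText, 1, ' ');    /* first blank-delimited word */
run;
```

This only removes the type conflict. The step still looks at just the first word of each review.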

 

You could use the FINDW() function.

data have ;
  input id text $50. ;
cards;
1 The Pillows was comfortable
2 The stake was good
3 Duvet and pillow was cheap
;
data want ;
  set have ;
  pillow=findw(text,'pillow',,'spit') or findw(text,'pillows',,'spit');
run;

 

 

Occasional Contributor
Posts: 11

Re: Finding a word in Text!

Thank you so much for the reply!

I forgot to mention that I have a data set called Final, and in that data set I have many variables; one of them is reviewtext.

 

 

I will try the one that you sent!

Cheers

H

Super User
Posts: 24,028

Re: Finding a word in Text!

Posted in reply to Hassankalantari

FINDW or INDEXW are what you really want.

Occasional Contributor
Posts: 11

Re: Finding a word in Text!

Thanks! I will also try these! Hopefully it works for me!

Occasional Contributor
Posts: 11

Re: Finding a word in Text!

Thank you so much for the reply!

I will try your code!

 

 

Occasional Contributor
Posts: 11

Re: Finding a word in Text!

It works!!!! Thanks a lot!

Occasional Contributor
Posts: 11

Re: Finding a word in Text!

Hi Reeza,

 

Could you please help me with this one too, 

 

What if I want to find two words in a text?

This time I want to find "Soundbar", but some people write "Sound Bar", with a space between Sound and Bar!

 

data Final_reviews1;
set Final_reviews;
SoundBar = index(upcase(ReviewText),'SOUNDBAR')>0;
run;

 

Thanks in advance

Contributor
Posts: 57

Re: Finding a word in Text!

Posted in reply to Hassankalantari

The COMPRESS function removes specified characters. By default it removes blanks.

index(upcase(compress(ReviewText)),'SOUNDBAR')>0



Alternatively, you can use the FIND function, where you can specify a modifier to make the search case-insensitive.

find(compress(ReviewText),'soundbar','i')>0



This would locate SOUNDBAR, SOUND BAR, soundbar, sound bar, SoUnd baR, etc.
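For reference, here is how the expression could sit inside a complete data step. This is only a sketch: the sample rows are made up, and the data set names follow the ones used in the question.

```sas
/* Hypothetical sample rows to test with */
data final_reviews;
   infile datalines truncover;
   input ReviewText $char60.;
datalines;
Great Soundbar for the price
The sound bar was easy to set up
The TV stand was sturdy
;
run;

data final_reviews1;
   set final_reviews;
   /* Remove blanks, then search without regard to case ('i') */
   SoundBar = find(compress(ReviewText), 'soundbar', 'i') > 0;
run;
```

With these rows the flag should be 1 for the first two reviews and 0 for the third. One thing to keep in mind: because all blanks are removed, the search can also match across word boundaries, for example in a phrase like "sound barrier".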


Find:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002267763.htm
Compress:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
