- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone!
Could you please help me with this matter!
I want to find a word within a text, if that word exist in the text, put 1 in a variable!
for example,
Id ReviewText Pillow Duvet
1 The Pillows was comfortable 1 0
2 The stake was good 0 0
3 Duvet and pillow was cheap 1 1
I used this code, but I keep receiving "Variable ReviewTextt has been defined as both character and numeric" error!
data Final;
set Final1(rename=(ReviewText=ReviewTextT));
ReviewText = put(ReviewTextT, 9999.);
drop ReviewTextT;
run;
Data Final;
length ReviewText $ 9999;
data final0;
Word = scan (ReviewText,1,'*');
if word in ('pillow' 'pillows') then Pillow=1;
set final;
run;
Thanks in advance for helping!
H
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Best when you get errors is to post the code and messages from log into a code box opened with the forum {I} menu icon.
First reason is so we can see the actual code that was run and second the message windows reformat text. Error messages often show an underscore character or other diagnostics that indicate where the error occurred that get moved by the message windows on this forum.
ReviewText = put(ReviewTextT, 9999.); makes no sense. If the variable contains text then you should get messages about attempting to change text to numeric to use the format. Also the W. format is limited to 32.
For your example data, note the data step to create an actual data set to test code with:
data final1; infile datalines dlm=',' missover; format id 4. reviewtext $30.; input id reviewtext; datalines; 1,The Pillows was comfortable 2,The stake was good 3, Duvet and pillow was cheap ; run; data final; set final1; pillow = index(upcase(ReviewText),'PILLOW')>0; duvet = index(upcase(ReviewText),'DUVET')>0; run;
works for the example provided. The question you have to answer is what do you want done with composite words like Pillowcase?
if you don't want Pillowcase then you would look to FINDW function and use options to ignore case.
Your attempted "scan" approach would require you to loop over the phrase for each word to do the comparison.
index searches the entire variable for the given text and returns a position if found. The upcase is because "pillow" is not the same as "Pillow" or "piLLow" so you need to address the possible case of the word. In this case comparing all upper case. The >0 will return a 1 for true if the text is found and 0 for false if not found.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just look at the last two data steps you ran:
Data Final;
length ReviewText $ 9999;
run;
data final0;
Word = scan (ReviewText,1,'*');
if word in ('pillow' 'pillows') then Pillow=1;
set final;
run;
The first one will make an empty dataset named FINAL with the single character variable named ReviewText.
Then in the second data step you first start using two variables in an assignment statement that you have never defined before. So REVIEWTEXT will be made a number since there is nothing to tell SAS otherwise how to define it. WORD will be defined as character since the SCAN() function always returns character values. It will default to length 12 since the default length that SAS will use to convert the numeric variable to character is BEST12. Then when SAS gets to the SET statement is has a conflict with the definition for REVIEWTEXT.
You could use the FINDW() function.
data have ;
input id text $50. ;
cards;
1 The Pillows was comfortable
2 The stake was good
3 Duvet and pillow was cheap
;
data want ;
set have ;
pillow=findw(text,'pillow',,'spit') or findw(text,'pillows',,'spit');
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for the reply!
I forgot to mention that I have a data set called Final and in the data set final I have many variable and one of them is reviewtext.
I will try the one that you send!
Cheers
H
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
FINDW or INDEXW are what you really want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Best when you get errors is to post the code and messages from log into a code box opened with the forum {I} menu icon.
First reason is so we can see the actual code that was run and second the message windows reformat text. Error messages often show an underscore character or other diagnostics that indicate where the error occurred that get moved by the message windows on this forum.
ReviewText = put(ReviewTextT, 9999.); makes no sense. If the variable contains text then you should get messages about attempting to change text to numeric to use the format. Also the W. format is limited to 32.
For your example data, note the data step to create an actual data set to test code with:
data final1; infile datalines dlm=',' missover; format id 4. reviewtext $30.; input id reviewtext; datalines; 1,The Pillows was comfortable 2,The stake was good 3, Duvet and pillow was cheap ; run; data final; set final1; pillow = index(upcase(ReviewText),'PILLOW')>0; duvet = index(upcase(ReviewText),'DUVET')>0; run;
works for the example provided. The question you have to answer is what do you want done with composite words like Pillowcase?
if you don't want Pillowcase then you would look to FINDW function and use options to ignore case.
Your attempted "scan" approach would require you to loop over the phrase for each word to do the comparison.
index searches the entire variable for the given text and returns a position if found. The upcase is because "pillow" is not the same as "Pillow" or "piLLow" so you need to address the possible case of the word. In this case comparing all upper case. The >0 will return a 1 for true if the text is found and 0 for false if not found.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for the reply!
I will try your codes!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It works!!!! Thanks alot!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Reeza,
Could you please help me with this one too,
How about if I want to find two words in a text!
This time I want to find "Soundbar"! but some people write "Sound Bar". with an space between Sound and bar!
data Final_reviews1;
set Final_reviews;
SoundBar = index(upcase(ReviewText),'SOUNDBAR')>0;
run;
Thanks in advance
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The compress function removes specified characters. The default is blank
characters.
index(upcase(compress(ReviewText),'SOUNDBAR')>0
Alternatively, you can use the FIND function where you can specify a modifier to search with
case insensitivity.
find(compress(ReviewText),'soundbar','i')>0
would locate SOUNDBAR, SOUND BAR, soundbar, sound bar, SoUnd baR, etc.
Find:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002267763.htm
Compress:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm