BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Hassankalantari
Fluorite | Level 6

Hi everyone!

 

Could you please help me with this matter!

 

I want to find a word within a text, if that word exist in the text, put 1 in a variable!

for example, 

 

Id                ReviewText                      Pillow       Duvet

1         The Pillows was comfortable          1             0               

2          The stake was good                    0              0

3          Duvet and pillow was cheap       1               1

 

I used this code, but I keep receiving  "Variable ReviewTextt has been defined as both character and numeric" error!

 

 

data Final;
set Final1(rename=(ReviewText=ReviewTextT));
ReviewText = put(ReviewTextT, 9999.);
drop ReviewTextT;
run;
Data Final;
length ReviewText $ 9999;


data final0;
Word = scan (ReviewText,1,'*');
if word in ('pillow' 'pillows') then Pillow=1;
set final;
run;

 

Thanks in advance for helping!

H

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Best when you get errors is to post the code and messages from log into a code box opened  with the forum {I} menu icon.

First reason is so we can see the actual code that was run and second the message windows reformat text. Error messages often show an underscore character or other diagnostics that indicate where the error occurred that get moved by the message windows on this forum.

 

ReviewText = put(ReviewTextT, 9999.); makes no sense. If the variable contains text then you should get messages about attempting to change text to numeric to use the format. Also the W. format is limited to 32.

 

For your example data, note the data step to create an actual data set to test code with:

data final1;
   infile datalines dlm=',' missover;
   format id 4. reviewtext $30.;
   input id reviewtext;
datalines;
1,The Pillows was comfortable 
2,The stake was good 
3, Duvet and pillow was cheap 
;
run;

data final;
   set final1;
   pillow = index(upcase(ReviewText),'PILLOW')>0;
   duvet = index(upcase(ReviewText),'DUVET')>0;
run;

works for the example provided. The question you have to answer is what do you want done with composite words like Pillowcase?

 

if you don't want Pillowcase then you would look to FINDW function and use options to ignore case.

 

Your attempted "scan" approach would require you to loop over the phrase for each word to do the comparison.

 

index searches the entire variable for the given text and returns a position if found. The upcase is because "pillow" is not the same as "Pillow" or "piLLow" so you need to address the possible case of the word. In this case comparing all upper case. The >0 will return a 1 for true if the text is found and 0 for false if not found.

View solution in original post

9 REPLIES 9
Tom
Super User Tom
Super User

Just look at the last two data steps you ran:

Data Final;
  length ReviewText $ 9999;
run;

data final0;
  Word = scan (ReviewText,1,'*');
  if word in ('pillow' 'pillows') then Pillow=1;
  set final;
run;

The first one will make an empty dataset named FINAL with the single character variable named ReviewText. 

Then in the second data step you first start using two variables in an assignment statement that you have never defined before.  So REVIEWTEXT will be made a number since there is nothing to tell SAS otherwise how to define it.  WORD will be defined as character since the SCAN() function always returns character values.  It will default to length 12 since the default length that SAS will use to convert the numeric variable to character is BEST12.  Then when SAS gets to the SET statement is has a conflict with the definition for REVIEWTEXT.

 

You could use the FINDW() function.

data have ;
  input id text $50. ;
cards;
1 The Pillows was comfortable
2 The stake was good
3 Duvet and pillow was cheap
;
data want ;
  set have ;
  pillow=findw(text,'pillow',,'spit') or findw(text,'pillows',,'spit');
run;

 

 

Hassankalantari
Fluorite | Level 6

Thank you so much for the reply!

I forgot to mention that I have a data set called Final  and in the data set final I have many variable and one of them is reviewtext. 

 

 

I will try the one that you send!

Cheers

H

Reeza
Super User

FINDW or INDEXW are what you really want.

Hassankalantari
Fluorite | Level 6
Thanks!:) I will also try these! Hopefully it works for me!:)
ballardw
Super User

Best when you get errors is to post the code and messages from log into a code box opened  with the forum {I} menu icon.

First reason is so we can see the actual code that was run and second the message windows reformat text. Error messages often show an underscore character or other diagnostics that indicate where the error occurred that get moved by the message windows on this forum.

 

ReviewText = put(ReviewTextT, 9999.); makes no sense. If the variable contains text then you should get messages about attempting to change text to numeric to use the format. Also the W. format is limited to 32.

 

For your example data, note the data step to create an actual data set to test code with:

data final1;
   infile datalines dlm=',' missover;
   format id 4. reviewtext $30.;
   input id reviewtext;
datalines;
1,The Pillows was comfortable 
2,The stake was good 
3, Duvet and pillow was cheap 
;
run;

data final;
   set final1;
   pillow = index(upcase(ReviewText),'PILLOW')>0;
   duvet = index(upcase(ReviewText),'DUVET')>0;
run;

works for the example provided. The question you have to answer is what do you want done with composite words like Pillowcase?

 

if you don't want Pillowcase then you would look to FINDW function and use options to ignore case.

 

Your attempted "scan" approach would require you to loop over the phrase for each word to do the comparison.

 

index searches the entire variable for the given text and returns a position if found. The upcase is because "pillow" is not the same as "Pillow" or "piLLow" so you need to address the possible case of the word. In this case comparing all upper case. The >0 will return a 1 for true if the text is found and 0 for false if not found.

Hassankalantari
Fluorite | Level 6

Thank you so much for the reply!

I will try your codes!

 

 

Hassankalantari
Fluorite | Level 6

It works!!!! Thanks alot!

Hassankalantari
Fluorite | Level 6

Hi Reeza,

 

Could you please help me with this one too, 

 

How about if I want to find two words in a text!

This time I want to find "Soundbar"! but some people write "Sound Bar". with an space between Sound and bar!

 

data Final_reviews1;
set Final_reviews;
SoundBar = index(upcase(ReviewText),'SOUNDBAR')>0;
run;

 

Thanks in advance

JoshB
Quartz | Level 8

The compress function removes specified characters. The default is blank
characters.

index(upcase(compress(ReviewText),'SOUNDBAR')>0



Alternatively, you can use the FIND function where you can specify a modifier to search with
case insensitivity.

find(compress(ReviewText),'soundbar','i')>0



would locate SOUNDBAR, SOUND BAR, soundbar, sound bar, SoUnd baR, etc.


Find:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002267763.htm
Compress:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 9014 views
  • 1 like
  • 5 in conversation