Hello everyone;
Does anybody know how to find for example only "ture" in the following string"
"screen shots or pictures of errors with ture and ..."
I tried many available functions, but non of them worked as they need location of the word. But since I'm dealing with over 100 milion observation and the location of the word in variable (column) is random, I can't get any useful results. For example, if I want to use index or Find or Contains, then I get pictures and ture, because pictures also contains ture. If I want to use SUBSTR instead then I need to know the location of the word, which I don't because the word can be found anywhere in the column.
Any help would be appreciated very much.
Thanks!
data want;
str="screen shots or pictures of errors with ture and ...";
position_of_the_word=findw(str,'ture',' ','e');
extract_the_word=scan(str,findw(str,'ture',' ','e'));
run;
str | position_of_the_word | extract_the_word |
---|---|---|
screen shots or pictures of errors with ture and ... | 8 | ture |
Instead of FIND, use FINDW.
data want;
str="screen shots or pictures of errors with ture and ...";
position_of_the_word=findw(str,'ture',' ','e');
extract_the_word=scan(str,findw(str,'ture',' ','e'));
run;
str | position_of_the_word | extract_the_word |
---|---|---|
screen shots or pictures of errors with ture and ... | 8 | ture |
data want;
str="screen shots or pictures of errors with ture and ...";
want=prxchange('s/.*(\bture\b).*/$1/i', -1, str);
run;
Hi;
Thanks for prompt answer. How can I tailor those codes for variable name that contains the text instead of single string? The variable name is Text and again I'm looking for "ture".
Hi @Behrouz The following example should help you follow
/*Let's assume HAVE is your source dataset and STR is the variable name*/
data have;
str="screen shots or pictures of errors with ture and ...";
run;
/*Read the source dataset HAVE and create the new variable*/
data want;
set have;
position_of_the_word=findw(str,'ture',' ','e');/*position of the word in the string*/
extract_the_word=scan(str,findw(str,'ture',' ','e'));/*extract the word in the string*/
run;
Hi,
As I mentioned, the variable (column) name is "Text". Where do you put the variable name (Text) in your codes? and the column has over 100 million records and "screen shots or pictures of errors with ture and ..." is just one single example out of 100 million observations. I just gave this example to show you what am I trying to do. Thanks!
Hello Mam/Sir @ShelleySessoms / @ChrisHemedinger Something I clicked accidentally by mistake made my reply overwrite OP's prev message. I don't know how to undo. My sincere apologies for the bother. Kindly help.
@novinosrin Restored. Careful with those Super User powers!
Thank you Sir. I will be -first and last mistake. My apologies again!
@Behrouz The HAVE is the sample dataset I created to test.
WANT is the dataset that is our required output.
Extract_the_word is the new variable that we create in the WANT dataset by reading the HAVE(your input)
The STR variable is the TEXT variable that has the string variables, so please replace the STR with whatever name your TEXT variable contains.
Hello @Behrouz I get the feeling, you are confused with the know-how of how a datastep read/write process happens. I have edited my previous code replacing str as text.. See if you can copy paste to your real stuff properly
/*Let's assume HAVE is your source dataset and STR is the variable name*/
data have;
text="screen shots or pictures of errors with ture and ...";
run;
/*Read the source dataset HAVE and create the new variable*/
data want;
set have;
position_of_the_word=findw(text,'ture',' ','e');/*position of the word in the string*/
extract_the_word=scan(str,findw(text,'ture',' ','e'));/*extract the word in the string*/
run;
Hi;
I'm quite familiar with data steps. I guess my problem was with the first part of your codes:
data have;
text="screen shots or pictures of errors with ture and ...";
run;
If you run it, you get a dataset with only one variable and one observation. So, I think this part wasn't needed. I also became confused with str as variable name. Actually, I thought it is a function. Anyway, I got what I wanted and I like to thank you with helping me out with this.
Thank you @Behrouz We're all glad. Sorry, I ran into lot of confusion as I failed to explain comprehensively. More so than techniques/concepts I am learning communication skill on the forum.
Also, the HAVE part is basically to create a sample for us to test. The original poster (OP) of the question like you may or may not post a sample and the expected output clearly. This leads us to create one to test.
Okay, going forward, here is an approach to frame a good question-
1. Post a sample of your input dataset (HAVE)
2. Post a sample of your expected output dataset (WANT)
3. Briefly explain the business/transformation logic in simple sentences and we are good to go.
Makes sense? 🙂
No worries. We are all learning everyday from each other. I've been using SAS for a long time, but I didn't even know there is a findw function! I felt kind of naive...But, I guess that's the beauty of life, being surprised 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.