I'm trying to figure out the best, most succinct way to extract words from a text variable, and place them in a new variable. I have a text variable:
TXT
Some text
More text
List text
Read text
And I want to extract certain words from TXT and create a new variable:
TXT              Extract
text Some        Some
More text        More
text word List   List
Read text        Read
So far this is the only code I came up with:
data new;
  set old;
  if find(TXT, "Some") > 0 then Extract = "Some";
  else if find(TXT, "More") > 0 then Extract = "More";
  else if find(TXT, "List") > 0 then Extract = "List";
  else if find(TXT, "Read") > 0 then Extract = "Read";
run;
It works, but is pretty clunky. Does anyone know a better way?
Accepted Solutions
Hi @Caetreviop543, for hard-coded character constants you could consider a fast approach that is merely a tweak of your original. See if this helps:
data have;
input TXT & :$10. ;
cards;
Some text
More text
List text
Read text
text red
;
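data want;
  set have;
  /* Candidate words, checked in this order */
  array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red');
  length extract $32;
  /* Stop at the first candidate found anywhere in TXT */
  do _n_ = 1 to dim(t) until (not missing(extract));
    if index(txt, strip(t(_n_))) then extract = t(_n_);
  end;
run;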
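One thing to watch in the loop above: index() does a substring search, so a candidate such as 'red' would also be found inside a longer word like 'bored'. If whole-word matching is what you need, a minimal sketch using FINDW instead could look like the following. The sample rows, the 'bored text' row, and the loop variable i are my own assumptions for illustration, not part of the original post.

```sas
data have;
  length TXT $20;
  input TXT &;          /* & lets the value contain single embedded blanks */
  cards;
Some text
More text
text word List
Read text
bored text
;

data want;
  set have;
  /* Same candidate list as in the post above */
  array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red');
  length extract $32;
  do i = 1 to dim(t) until (not missing(extract));
    /* FINDW only matches the candidate as a whole blank-delimited word */
    if findw(txt, strip(t(i)), ' ') > 0 then extract = t(i);
  end;
  drop i;
run;
```

With whole-word matching, the 'bored text' row should be left with a blank extract, whereas the index() version would pick up 'red'. If case should not matter, FINDW accepts an 'i' modifier as a fourth argument.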
Extract=scan(txt,1,' ');
I made the example too simple; the word I want to extract isn't always the first.
Oops, sorry, I overlooked that completely. My apologies.
That's ok! I should have made the example better.
Hi @Caetreviop543, would this come close?
Basically, it eliminates the word 'text' from the string and keeps whatever is left:
data have;
input TXT & :$10. ;
cards;
Some text
More text
List text
Read text
text red
;
data want;
set have;
temp=strip(tranwrd(txt,'text',' '));
run;
Thanks for your response. Is there a way to extract just the word if the text isn't all the same, i.e., the actual word text doesn't appear throughout? I just used text to symbolize a string of words.
"Is there a way to extract just the word if the text isn't all the same"
By the above, do you mean that you only want to extract a set of known words, as though you have a list of words to pull from the string? Please clarify.
You can write this in a more compact manner, but it is less legible.
EXTRACT = prxchange('s/.*(Some|More|List|Read).*/\1/',-1,TXT);
This means: match anything, then one of the listed strings, then anything else, and replace the whole match with the string that was found.
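One caveat with the substitution approach: prxchange returns the source string unchanged when the pattern does not match, so any row without one of the keywords keeps its full text. A minimal sketch that sidesteps this by testing for a match first and then reading the capture buffer is shown below. The sample rows, the word-boundary anchors (\b), and the variable name re are my own assumptions, not something from the post above.

```sas
data have;
  length TXT $20;
  input TXT &;          /* & lets the value contain single embedded blanks */
  cards;
Some text
text word List
no keyword here
;

data want;
  set have;
  length extract $32;
  retain re;
  /* Compile the pattern once, on the first observation */
  if _n_ = 1 then re = prxparse('/\b(Some|More|List|Read)\b/');
  /* Only assign when a keyword is found; otherwise extract stays blank */
  if prxmatch(re, txt) > 0 then extract = prxposn(re, 1, txt);
  drop re;
run;
```

Here prxmatch reports whether the pattern was found and prxposn returns the text of capture group 1, so rows such as 'no keyword here' should end up with a blank extract rather than the original text.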
It told me the prxchange function doesn't have enough arguments. How does it know which variable to change or extract words from?
Try again; I corrected it within seconds. I clicked too fast, and you are too fast. 🙂
1 is probably fine rather than -1 actually. Sorry I can't test.
Thanks, but that didn't work. It returned a much larger list of lots of text.
Mmm that's odd. I probably forgot something obvious. I can't test sadly.