I'm trying to figure out the best, most succinct way to extract words from a text variable, and place them in a new variable. I have a text variable:
TXT
Some text
More text
List text
Read text
And I want to extract certain words from TXT and create a new variable:
TXT               Extract
text Some         Some
More text         More
text word List    List
Read text         Read
So far this is the only code I came up with:
data new;
  set old;
  if find(TXT, "Some") > 0 then Extract = "Some";
  else if find(TXT, "More") > 0 then Extract = "More";
  else if find(TXT, "List") > 0 then Extract = "List";
  else if find(TXT, "Read") > 0 then Extract = "Read";
run;
It works, but is pretty clunky. Does anyone know a better way?
Hi @Caetreviop543, for hard-coded character constants, you could consider a fast approach that is merely a tweak of your original. See if this helps:
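data have;
  input TXT & :$10. ;
  cards;
Some text
More text
List text
Read text
text red
;
data want;
  set have;
  /* Lookup list of the words to search for */
  array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red') ;
  length extract $32;
  /* Loop over the list and stop as soon as one word is found */
  do _n_=1 to dim(t) until(not missing(extract));
    /* index() is a case-sensitive substring search; 0 means not found */
    if index(txt,strip(t(_n_))) then extract=t(_n_);
  end;
run;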
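A possible variation on the step above, offered as an untested sketch: index() matches substrings, so 'red' would also be found inside a longer word such as 'hundred'. Swapping in findw() with a blank delimiter and the 'i' modifier restricts the match to whole words and ignores case. The output data set name want_words is a placeholder, and the step assumes the have data set created above.

```sas
data want_words;
  set have;
  /* Same lookup list as above, held in a temporary array */
  array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red');
  length extract $32;
  /* Stop at the first list entry found as a whole word in TXT */
  do _n_ = 1 to dim(t) until (not missing(extract));
    /* findw returns the position of the word, or 0 if it is absent;
       ' ' is the word delimiter and 'i' ignores case */
    if findw(txt, strip(t(_n_)), ' ', 'i') then extract = t(_n_);
  end;
run;
```

Note that EXTRACT receives the spelling from the list, not the casing found in TXT, and the order of the list still decides which word wins when several are present.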
Extract=scan(txt,1,' ');
I made the example too simple; the word I want to extract isn't always the first.
Oops, sorry, I completely overlooked that. My apologies.
That's ok! I should have made the example better.
Hi @Caetreviop543, would this come close?
Basically, eliminating the word 'text' in the string
data have;
input TXT & :$10. ;
cards;
Some text
More text
List text
Read text
text red
;
data want;
set have;
temp=strip(tranwrd(txt,'text',' '));
run;
Thanks for your response. Is there a way to extract just the word if the text isn't all the same, i.e., the actual word text doesn't appear throughout? I just used text to symbolize a string of words.
Is there a way to extract just the word if the text isn't all the same,
By the above, do you mean you only want to extract a set of known words, as though you have a list of words to pull out of the string? Please clarify.
@Caetreviop543 wrote:
Thanks for your response. Is there a way to extract just the word if the text isn't all the same, i.e., the actual word text doesn't appear throughout? I just used text to symbolize a string of words.
You can write this in a more compact manner, but it is less legible.
EXTRACT = prxchange('s/.*(Some|More|List|Read).*/\1/',-1,TXT);
This means: match anything, then one of the listed strings, then anything else, and replace the whole match with the string that was found.
It told me the prxchange function doesn't have enough arguments. How does it know which variable to change/extract words from?
Try again, I corrected it within seconds. I clicked too fast and you are too fast. 🙂
1 is probably fine rather than -1 actually. Sorry I can't test.
Thanks, but that didn't work. It returned a much larger list of lots of text.
Mmm that's odd. I probably forgot something obvious. I can't test sadly.
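One likely reason for the extra text, stated here as an assumption since nobody in the thread could test it: prxchange returns the source string unchanged when the pattern does not match, so rows containing none of the four words keep their full text. Below is a minimal sketch that sidesteps this with prxmatch and prxposn, leaving EXTRACT blank when nothing matches. The data set names have and want_rx and the variable rx are placeholders, and the \b word boundaries are an addition not present in the original pattern.

```sas
data want_rx;
  set have;
  length extract $32;
  /* Compile the pattern once and keep its id across rows */
  retain rx;
  if _n_ = 1 then rx = prxparse('/\b(Some|More|List|Read)\b/');
  /* prxmatch returns the match position, or 0 when nothing matches */
  if prxmatch(rx, txt) then
    /* prxposn returns the text captured by group 1 of the last match */
    extract = prxposn(rx, 1, txt);
  drop rx;
run;
```

This version returns the leftmost listed word in TXT, which can differ from the if/else chain when a row contains more than one of the words.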