This topic is solved and locked.
Caetreviop543
Obsidian | Level 7

I'm trying to figure out the best, most succinct way to extract words from a text variable, and place them in a new variable. I have a text variable:

TXT
Some text
More text
List text
Read text

And I want to extract certain words from TXT and create a new variable:

TXT               Extract
text Some         Some
More text         More
text word List    List
Read text         Read

So far this is the only code I came up with:

data new;
  set old;
  /* Assign the first listed word found anywhere in TXT */
  if find(TXT, "Some") > 0 then Extract = "Some";
  else if find(TXT, "More") > 0 then Extract = "More";
  else if find(TXT, "List") > 0 then Extract = "List";
  else if find(TXT, "Read") > 0 then Extract = "Read";
run;

It works, but it's pretty clunky. Does anyone know a better way?

Accepted Solution
novinosrin
Tourmaline | Level 20

Hi @Caetreviop543, for hard-coded character constants, you could consider a fast approach that is merely a tweak of your original. See if this helps:

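data have;
  input TXT & :$10.;
  cards;
Some text
More text
List text
Read text
text red
;
run;

data want;
  set have;
  /* Hard-coded list of words to look for, checked in this order */
  array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red');
  length extract $32;
  /* Stop at the first word found anywhere in TXT */
  do _n_ = 1 to dim(t) until (not missing(extract));
    /* INDEX returns the position of the substring, or 0 if it is absent */
    if index(txt, strip(t(_n_))) then extract = t(_n_);
  end;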
run;
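One caveat on the loop above: INDEX matches substrings, so 'More' would also be found inside a word such as 'Moreover'. A minimal sketch of the same loop using FINDW, which only reports a match when the search term appears as a separate word, is below. The extra sample row 'Moreover text' is hypothetical and only there to show the difference; FINDW is called with its default delimiters.

```sas
data have;
  input TXT & :$20.;
  datalines;
Some text
More text
Moreover text
text word List
text red
;
run;

data want;
  set have;
  /* Hard-coded list of words to look for, checked in this order */
  array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red');
  length extract $32;
  /* Stop at the first listed word that appears as a whole word in TXT */
  do _n_ = 1 to dim(t) until (not missing(extract));
    /* FINDW returns 0 unless t(_n_) appears as a separate word */
    if findw(txt, strip(t(_n_))) then extract = t(_n_);
  end;
run;
```

With INDEX, the 'Moreover text' row would get EXTRACT = 'More'; with FINDW it is left blank.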


novinosrin
Tourmaline | Level 20
Extract=scan(txt,1,' ');
Caetreviop543
Obsidian | Level 7

I made the example too simple; the word I want to extract isn't always the first. 
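Building on the SCAN idea, a sketch that checks every word rather than only the first: COUNTW gives the number of blank-delimited words, and SCAN pulls each one out in turn. The word list inside the IN operator is just the four example words from the question; WORD and I are helper variables introduced here for illustration.

```sas
data have;
  input TXT & :$20.;
  datalines;
text Some
More text
text word List
Read text
;
run;

data want;
  set have;
  length word extract $32;
  /* Walk through TXT one blank-delimited word at a time */
  do i = 1 to countw(txt, ' ');
    word = scan(txt, i, ' ');
    /* Keep the first word that is in the known list, then stop looking */
    if word in ('Some' 'More' 'List' 'Read') then do;
      extract = word;
      leave;
    end;
  end;
  drop i word;
run;
```

Because the comparison is on whole words, 'Moreover' would not be treated as 'More'.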

novinosrin
Tourmaline | Level 20

Oops, sorry, I completely overlooked that. My apologies.

Caetreviop543
Obsidian | Level 7

That's ok! I should have made the example better.

novinosrin
Tourmaline | Level 20

Hi @Caetreviop543, would this come close?

Basically, this removes the word 'text' from the string:

data have;
  input TXT & :$10.;
  cards;
Some text
More text
List text
Read text
text red
;
run;

data want;
  set have;
  /* Replace every occurrence of 'text' with a blank, then trim leading/trailing blanks */
  temp = strip(tranwrd(txt, 'text', ' '));
run;
Caetreviop543
Obsidian | Level 7

Thanks for your response. Is there a way to extract just the word if the text isn't all the same, i.e., the actual word "text" doesn't appear throughout? I just used "text" to stand for a string of words.

novinosrin
Tourmaline | Level 20

Regarding "Is there a way to extract just the word if the text isn't all the same": do you mean you only want to extract a set of known words, as though you have a list of words to extract from the string? Please clarify.
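If the known words are kept in a dataset rather than hard-coded, one possible sketch is a PROC SQL left join on FINDW. The dataset WORDS, its variable WORD, and the sample row 'text only' are hypothetical. A row of HAVE that contains more than one listed word produces one output row per match, and SAS may write a note about a Cartesian product join, which is expected for a join condition like this one.

```sas
data have;
  input TXT & :$20.;
  datalines;
text Some
More text
text word List
Read text
text only
;
run;

/* Hypothetical lookup table holding the words to extract */
data words;
  input word :$32.;
  datalines;
Some
More
List
Read
;
run;

proc sql;
  create table want as
  select h.txt,
         w.word as extract
  from have as h
       left join words as w
       /* Match only when WORD appears as a whole word in TXT */
       on findw(h.txt, strip(w.word)) > 0;
quit;
```

The left join keeps rows with no match (such as 'text only') with EXTRACT missing.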

ChrisNZ
Tourmaline | Level 20

You can write this in a more compact manner, but it is less legible.

EXTRACT = prxchange('s/.*(Some|More|List|Read).*/\1/',-1,TXT);

This means: match anything, then one of the listed words (captured as group 1), then anything else, and replace the whole match with the captured word.

 

Caetreviop543
Obsidian | Level 7

It told me the PRXCHANGE call doesn't have enough arguments. How does it know which variable to change/extract words from?

ChrisNZ
Tourmaline | Level 20

Try again; I corrected it within seconds. I clicked too fast, and you were too fast. 🙂

ChrisNZ
Tourmaline | Level 20

Actually, 1 is probably fine rather than -1. Sorry, I can't test.

Caetreviop543
Obsidian | Level 7

Thanks, but that didn't work. It returned a much larger list of lots of text.

ChrisNZ
Tourmaline | Level 20

Mmm that's odd. I probably forgot something obvious. I can't test sadly. 
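One likely explanation for the "much larger list of lots of text": when the pattern does not match, PRXCHANGE returns the source string unchanged, so any row containing none of the four words keeps its full TXT value. A hedged sketch that only assigns EXTRACT when there is a match, using PRXPARSE, PRXMATCH, and PRXPOSN to read capture group 1, is below. The \b word boundaries and the sample row 'Moreover text' are additions for illustration. Note that PRXMATCH picks the leftmost listed word, whereas the leading greedy .* in the pattern above would pick the last one.

```sas
data have;
  input TXT & :$20.;
  datalines;
text Some
More text
text word List
Read text
Moreover text
;
run;

data want;
  set have;
  length extract $32;
  /* Compile the pattern once and keep its id across iterations */
  retain re;
  if _n_ = 1 then re = prxparse('/\b(Some|More|List|Read)\b/');
  /* Assign EXTRACT only when the pattern matches; otherwise it stays blank */
  if prxmatch(re, txt) then extract = prxposn(re, 1, txt);
  drop re;
run;
```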
