Caetreviop543
Obsidian | Level 7

I'm trying to figure out the best, most succinct way to extract words from a text variable, and place them in a new variable. I have a text variable:

 

TXT

Some text

More text

List text

Read text

 

And I want to extract certain words from TXT and create a new variable:

 

TXT              Extract
text Some        Some
More text        More
text word List   List
Read text        Read

 

So far this is the only code I came up with:

 

 

data new;
set old;

if find(TXT, "Some")>0 then Extract = "Some";
else if find(TXT, "More")>0 then Extract = "More";
else if find(TXT, "List")>0 then Extract = "List";
else if find(TXT, "Read")>0 then Extract = "Read";

run;

 

It works, but it is pretty clunky. Does anyone know a better way?

ACCEPTED SOLUTION
novinosrin
Tourmaline | Level 20

Hi @Caetreviop543, for hard-coded character constants, you could consider a fast approach that is merely a tweak of your original. See if this helps.

 


data have;
input TXT & :$10. ;
cards;
Some text
More text
List text
Read text
text red
;

data want;
 set have;
 /* Words to look for, in priority order */
 array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red') ;
 length extract $32;
 /* Stop looping at the first word found in TXT */
 do _n_=1 to dim(t) until(not missing(extract));
  if index(txt,strip(t(_n_))) then extract=t(_n_);
 end;
run;
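
One caveat with the loop above: INDEX matches substrings, so a short entry such as 'red' would also hit inside a longer word like 'Fred'. If whole-word matching is wanted, a possible variant (a sketch only, assuming words in TXT are separated by blanks) swaps INDEX for FINDW:

```sas
data want;
 set have;
 array t(5) $32 _temporary_ ('Some' 'More' 'List' 'Read' 'red') ;
 length extract $32;
 do _n_=1 to dim(t) until(not missing(extract));
  /* FINDW returns the position of the whole word, or 0 if it is absent; */
  /* the third argument (a blank) is the assumed word delimiter          */
  if findw(txt,strip(t(_n_)),' ') then extract=t(_n_);
 end;
run;
```

Both versions are case-sensitive, and both keep the first array entry that matches, so the order of the list decides ties.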


15 REPLIES
novinosrin
Tourmaline | Level 20
Extract=scan(txt,1,' ');
Caetreviop543
Obsidian | Level 7

I made the example too simple; the word I want to extract isn't always the first. 

novinosrin
Tourmaline | Level 20

Oops, sorry, I overlooked that completely. My apologies.

Caetreviop543
Obsidian | Level 7

That's ok! I should have made the example better.

novinosrin
Tourmaline | Level 20

Hi @Caetreviop543, would this come close?

 

Basically, it eliminates the word 'text' from the string.

 

data have;
input TXT & :$10. ;
cards;
Some text
More text
List text
Read text
text red
;

data want;
 set have;
 temp=strip(tranwrd(txt,'text',' '));
run;
Caetreviop543
Obsidian | Level 7

Thanks for your response. Is there a way to extract just the word if the text isn't all the same, i.e., the actual word text doesn't appear throughout? I just used text to symbolize a string of words. 

novinosrin
Tourmaline | Level 20

 

Is there a way to extract just the word if the text isn't all the same,

 

By the above, do you mean you only want to extract a set of known words, as though you have a list of words to extract from the string? Please clarify.

 

 




 

ChrisNZ
Tourmaline | Level 20

You can write this in a more compact manner, but it is less legible.

EXTRACT = prxchange('s/.*(Some|More|List|Read).*/\1/',-1,TXT);

This means: find anything, then one of the strings, then anything else, and replace the whole match with the string found.
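
A point worth keeping in mind with the substitution approach: when the pattern does not match, PRXCHANGE returns the source value unchanged, so rows without any of the listed words would keep their full text. An alternative sketch, assuming the `have` data set and `txt` variable used elsewhere in this thread (the variable name `re` is only illustrative), captures the word with PRXMATCH and PRXPOSN and leaves EXTRACT blank when nothing matches:

```sas
data want;
 set have;
 length extract $32;
 /* Compile the pattern once and keep its id across rows */
 retain re;
 if _n_=1 then re=prxparse('/(Some|More|List|Read)/');
 /* PRXMATCH returns 0 when no word is found, so EXTRACT stays missing */
 if prxmatch(re,txt)>0 then extract=prxposn(re,1,txt);
run;
```

Here the first listed word that appears, reading TXT from left to right, is the one captured, whereas the greedy `.*` in the substitution pattern would favor the last one.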

 

Caetreviop543
Obsidian | Level 7

It told me the prxchange function doesn't have enough arguments. How does it know which variable to change/extract words from?

ChrisNZ
Tourmaline | Level 20

Try again, I corrected it within seconds. I clicked too fast, and you are too fast. 🙂

ChrisNZ
Tourmaline | Level 20

1 is probably fine rather than -1 actually. Sorry I can't test.

Caetreviop543
Obsidian | Level 7

Thanks, but that didn't work. It returned a much larger list of lots of text.

ChrisNZ
Tourmaline | Level 20

Mmm that's odd. I probably forgot something obvious. I can't test sadly. 

