Obsidian | Level 7

## How to extract a substring and generate a new variable with values of that substring

Hi SAS Pro,

I have a question: I want to extract "14 days", "10 days", or any other "<digits> days" text from sentences. The tricky part is that the position of the substring varies from sentence to sentence.

``````
data have;
  input X $80.;
  datalines;
I spent 14 days in Boston. I love it.
Exercise twice a day for 10 days until May.
I left there. It was about 7 days ago.
;
run;
``````

What I want is to have

1. a new character variable Y containing "14 days", "10 days", and "7 days".

2. a numeric variable Z with the values of 14, 10, and 7.

Any help would be highly appreciated!!

Best regards,

C

PROC Star

## Re: How to extract a substring and generate a new variable with values of that substring

What result would you like for this input?

``Exercise twice per day for 14 days, then reduce to once per day for 5 days.``
Rhodochrosite | Level 12

## Re: How to extract a substring and generate a new variable with values of that substring

Updated solution

My previous attempt worked with your data; this one is more robust:

``````
data have;
  input X $80.;
  datalines;
I spent 14 days in Boston. I love it.
Exercise twice a day for 10 days until May.
I left there. It was about 7 days ago.
I had to wait 7 days
8 days is more than I care to wait
I have waited 12 days 3 times
;
run;

data want;
  set have;
  length Y $8;
  /* Lazy (.*?) stops at the first "<digits> days"; a greedy (.*) would
     swallow the leading digits and turn "14 days" into "4 days". */
  Y = prxchange('s/(.*?)(\d+\s)(days\s)(.*)/$2$3/', -1, X);
  /* The first blank-delimited word of Y is the number */
  Z = input(scan(Y, 1, ' '), 8.);
run;
``````
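
If you only need the first "<digits> days" in each sentence, another way to get there is to compile the pattern once and read the capture buffers directly. This is only a sketch: the data set name `want_first`, the variable `re`, and the `\b(\d+)\s+days\b` pattern are my own choices, and I am assuming "days" is always spelled in lowercase.

``````sas
data have;
  input X $80.;
  datalines;
I spent 14 days in Boston. I love it.
Exercise twice per day for 14 days, then reduce to once per day for 5 days.
I left there. It was about 7 days ago.
;
run;

data want_first;
  set have;
  length Y $12;
  retain re;
  /* Compile once: capture buffer 1 holds the digits in front of "days" */
  if _n_ = 1 then re = prxparse('/\b(\d+)\s+days\b/');
  /* PRXMATCH returns the position of the first match, or 0 if none */
  if prxmatch(re, X) then do;
    Y = prxposn(re, 0, X);            /* buffer 0 = the whole match  */
    Z = input(prxposn(re, 1, X), 8.); /* buffer 1 = the digits only  */
  end;
  drop re;
run;
``````

Rows without a match keep a blank Y and a missing Z, which is easier to spot than a substitution that hands back the original text.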
Obsidian | Level 7

## Re: How to extract a substring and generate a new variable with values of that substring

Thank you so much for the code. Just out of curiosity, what do those statements mean within the PRX function?
Obsidian | Level 7

## Re: How to extract a substring and generate a new variable with values of that substring

Thank you for asking this. What would the syntax be if (a) I only want the first match, and (b) I want every piece of text that matches the target pattern?

Thanks a lot!
Rhodochrosite | Level 12

## Re: How to extract a substring and generate a new variable with values of that substring

SAS PRX functions are great for this kind of work.

``````
data have;
  input X $80.;
  datalines;
I spent 14 days in Boston. I love it.
Exercise twice a day for 10 days until May.
I left there. It was about 7 days ago.
;
run;

data want;
  set have;
  length Y $8;
  Y = prxchange('s/(.*)(\s\d+\s\D+\s) (.*)/$2/', -1, X);
  Z = input(scan(Y, 1, ' '), 8.);
run;
``````

Diamond | Level 26

## Re: How to extract a substring and generate a new variable with values of that substring

You want to find the word "days" somewhere in this text string, and then extract the previous "word" which is probably going to be the number you want.

``````
data want;
  set have;
  do i = 1 to countw(x, ' ');
    if scan(x, i, ' ') = 'days' then do;
      number_of_days = input(scan(x, i-1, ' '), 4.); /* Input turns this number of days into a numeric value */
      output;
    end;
  end;
  drop i;
run;
``````
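
The same one-row-per-match idea can also be written with a regular expression, which copes with punctuation such as "days," or "days." that a plain word comparison would miss. A rough sketch, assuming lowercase "days"; the names `want_all`, `re`, `start`, `stop`, `pos`, and `len` are just placeholders:

``````sas
data have;
  input X $80.;
  datalines;
I spent 14 days in Boston. I love it.
Exercise twice per day for 14 days, then reduce to once per day for 5 days.
I left there. It was about 7 days ago.
;
run;

data want_all;
  set have;
  length Y $12;
  retain re;
  if _n_ = 1 then re = prxparse('/\b(\d+)\s+days\b/');
  start = 1;
  stop  = length(X);
  /* CALL PRXNEXT finds the next match and moves START past it */
  call prxnext(re, start, stop, X, pos, len);
  do while (pos > 0);
    Y = substr(X, pos, len);           /* e.g. "14 days"            */
    Z = input(scan(Y, 1, ' '), 8.);    /* leading number as numeric */
    output;                            /* one row per match         */
    call prxnext(re, start, stop, X, pos, len);
  end;
  drop re start stop pos len;
run;
``````

A sentence with two matches produces two rows, and a sentence with no match produces none, so add an explicit `output` for the no-match case if you need to keep those rows.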

--
Paige Miller