Hi SAS Pro,
I have a question: I want to extract "14 days", "10 days", or any other "<digits> days" text from sentences. The tricky part is that the position of the substring varies from sentence to sentence.
data have;
input X $80.;
datalines;
I spent 14 days in Boston. I love it.
Exercise twice a day for 10 days until May.
I left there. It was about 7 days ago.
;
run;
What I want is to have
1. a new character variable Y including 14 days, 10 days, and 7 days.
2. a numeric variable Z with the values of 14, 10, and 7.
Any help would be highly appreciated!!
Best regards,
C
What result would you like for this input?
Exercise twice per day for 14 days, then reduce to once per day for 5 days.
Hi @CynthiaWei
Updated solution
My previous attempt worked with your data; this one is more robust:
data have;
input X $80.;
datalines;
I spent 14 days in Boston. I love it.
Exercise twice a day for 10 days until May.
I left there. It was about 7 days ago.
I had to wait 7 days
8 days is more than I care to wait
I have waited 12 days 3 times
;
run;
data want;
set have;
length Y $8;
/* Lazy .*? stops at the first match, so a greedy prefix cannot swallow leading digits (14 -> 4) */
y = prxchange('s/.*?\b(\d+\s+days)\b.*/$1/',-1,X);
Z = input(scan(Y,1,' '),8.);
run;
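One caveat with PRXCHANGE: when a sentence contains no match, the substitution leaves X unchanged, so Y ends up holding the first 8 characters of the sentence. Here is a minimal sketch of an alternative that leaves Y blank and Z missing in that case, using PRXPARSE, PRXMATCH and PRXPOSN. The dataset name want_posn and the variable rx are just names picked for illustration, and the wider length for Y is an assumption to allow for longer numbers.

```sas
data want_posn;
  set have;
  length Y $12;
  retain rx;
  /* Compile the pattern once; group 1 captures the digits only */
  if _n_ = 1 then rx = prxparse('/\b(\d+)\s+days\b/');
  if prxmatch(rx, X) then do;
    Y = prxposn(rx, 0, X);             /* buffer 0 = the whole match, e.g. "14 days" */
    Z = input(prxposn(rx, 1, X), 8.);  /* buffer 1 = the digits, converted to numeric */
  end;
  drop rx;
run;
```

Only the first "<digits> days" in each sentence is picked up this way.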
Hi @CynthiaWei
SAS PRX functions are great for this kind of work.
data have;
input X $80.;
datalines;
I spent 14 days in Boston. I love it.
Exercise twice a day for 10 days until May.
I left there. It was about 7 days ago.
;
run;
data want;
set have;
length Y $8;
y = prxchange('s/(.*)(\s\d+\s\D+\s) (.*)/$2/',-1,X);
Z = input(scan(Y,1,' '),8.);
run;
You want to find the word "days" somewhere in this text string, and then extract the previous "word" which is probably going to be the number you want.
data want;
set have;
do i=1 to countw(x,' ');
if scan(x,i,' ')='days' then do;
number_of_days=input(scan(x,i-1,' '),4.); /* Input turns this number of days into a numeric value */
output;
end;
end;
drop i;
run;
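Note that the SCAN comparison only matches the exact word 'days', so "days," or "days." followed directly by punctuation would be missed. For sentences with more than one match, such as the "14 days, then ... 5 days" example asked about earlier in the thread, a regular expression with CALL PRXNEXT is one possible approach. This is only a sketch, assuming one output row per match is the desired result (the original poster did not say); the names want_all, rx, start, stop, pos and len are illustrative.

```sas
data want_all;
  set have;
  length Y $12;
  retain rx;
  /* Compile the pattern once for the whole step */
  if _n_ = 1 then rx = prxparse('/\b(\d+)\s+days\b/');
  start = 1;
  stop = length(X);
  /* Find the first match; pos = 0 means no match */
  call prxnext(rx, start, stop, X, pos, len);
  do while (pos > 0);
    Y = substr(X, pos, len);          /* e.g. "14 days" */
    Z = input(scan(Y, 1, ' '), 8.);   /* numeric part */
    output;                           /* one row per match */
    /* start was advanced by PRXNEXT; look for the next match */
    call prxnext(rx, start, stop, X, pos, len);
  end;
  drop rx start stop pos len;
run;
```

Sentences without any match produce no rows in this version.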