Dear all.
I have a character (string) variable in a dataset that contains manually entered text. I am trying to split the text into chunks. The delimiter consists of more than one character, and the text can contain one or more occurrences of it. Example:
"0 - Nothing entered or found. 1 - One or more expression- or else - entered. 99 - Missing"
Should become:
| Field1 | Field2 |
|---|---|
| 0 | Nothing entered or found. |
| 1 | One or more expression- or else - entered. |
| 99 | Missing |
I match the "0 - " part with the regular expression (\d+ ?)-
Now I'm looking for a way to find not only the first occurrence of this expression but all of them. Any help is appreciated.
Best Regards
Hi,
You could use scan:
data have;
a="0 - Nothing entered or found. 1 - One or more expression- or else - entered. 99 - Missing";
run;
data want;
set have;
/* Output one row per "."-separated sentence until SCAN returns a blank */
i=1;
do while (scan(a,i,".") ne "");
word=scan(a,i,".");
output;
i=i+1;
end;
run;
Hi RW9.
Sorry, but that doesn't work for me: the dot at the end of each sentence is not guaranteed. That's why I'm considering the PRX functions. I need to use the expression ([0-9]+ - ) as the delimiter.
Thanks
I found one way:
Using PRXCHANGE('s/(\d+ ?)-/$1=/', -1, mytextstring); I turn every "0 - " into "0 = " (the -1 argument replaces all occurrences, not just the first). Now I can use the SCAN function.
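For illustration, a minimal sketch of a variant of this approach. Besides keeping the number via $1, the substitution also puts a marker character "~" in front of each number, so SCAN can first cut the string into one chunk per item and then split each chunk at "=". This assumes that neither "~" nor "=" ever occurs in the free text; the variable names marked, chunk, field1 and field2 are made up for this example.

```sas
data have;
  a="0 - Nothing entered or found. 1 - One or more expression- or else - entered. 99 - Missing";
run;

data want;
  set have;
  length marked $ 300 chunk $ 200 field1 $ 10 field2 $ 200;
  /* Replace every "<number> - " with "~<number>=" (-1 = all occurrences).  */
  /* Assumption: "~" and "=" never appear in the free text itself.          */
  marked = prxchange('s/(\d+) ?- /~$1=/', -1, a);
  /* Each "~"-separated word is now one item, e.g. "0=Nothing entered ..." */
  do i = 1 to countw(marked, '~');
    chunk  = scan(marked, i, '~');
    field1 = scan(chunk, 1, '=');          /* the number                   */
    field2 = strip(scan(chunk, 2, '='));   /* the text, blanks trimmed     */
    output;
  end;
  keep field1 field2;
run;
```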
Assuming the text parts contain no digits (as is the case in your example), you can use the following pattern:
(\d+) - ([^\d]+)
| | Matched text | Start (0-based) | Length |
|---|---|---|---|
| Match 1: | 0 - Nothing entered or found. | 0 | 30 |
| Group 1: | 0 | 0 | 1 |
| Group 2: | Nothing entered or found. | 4 | 26 |
| Match 2: | 1 - One or more expression- or else - entered. | 30 | 47 |
| Group 1: | 1 | 30 | 1 |
| Group 2: | One or more expression- or else - entered. | 34 | 43 |
| Match 3: | 99 - Missing | 77 | 12 |
| Group 1: | 99 | 77 | 2 |
| Group 2: | Missing | 82 | 7 |
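A minimal sketch of how these matches could be collected in a DATA step with the same pattern, assuming the text parts contain no digits. CALL PRXNEXT returns one match per call and moves START past it, so looping until POSITION is 0 visits every occurrence; PRXPOSN then extracts the two capture groups. Note that SAS reports 1-based positions, unlike the 0-based offsets in the table. The names re, field1 and field2 are illustrative.

```sas
data have;
  a="0 - Nothing entered or found. 1 - One or more expression- or else - entered. 99 - Missing";
run;

data want;
  set have;
  length field1 $ 10 field2 $ 200;
  retain re;
  /* Compile the pattern once: group 1 = number, group 2 = following text */
  if _n_ = 1 then re = prxparse('/(\d+) - ([^\d]+)/');
  start = 1;
  stop  = length(a);
  /* Find the first match; POSITION is 0 when nothing (more) is found */
  call prxnext(re, start, stop, a, position, len);
  do while (position > 0);
    field1 = prxposn(re, 1, a);
    field2 = strip(prxposn(re, 2, a));   /* drop the trailing blank */
    output;
    /* Continue searching after the current match */
    call prxnext(re, start, stop, a, position, len);
  end;
  keep field1 field2;
run;
```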
You could also use SCAN(), as below, although I would prefer Perl regular expressions here. Check out CALL PRXNEXT().
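data have;
a="0 - Nothing entered or found. 1 - One or more expression- or else - entered. 99 - Missing";
run;
data want;
set have;
length var $ 100;
/* 'd' modifier: digits are the only delimiters, so COUNTW counts the text runs between the numbers */
do i=1 to countw(a,,'d');
/* 'kd': every non-digit is a delimiter, so SCAN returns the i-th number;  */
/* 'd': SCAN returns the i-th text run; CATX joins both with one blank     */
var=catx(' ',scan(a,i,,'kd'),scan(a,i,,'d'));
output;
end;
run;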
Xia Keshan
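If you need the two separate fields from the question instead of one concatenated value, a small variation of the SCAN approach could look like the sketch below. It makes the same assumption as the regex answer (the text parts contain no digits) and strips the leading "- " from each text run; field1 and field2 are illustrative names.

```sas
data have;
  a="0 - Nothing entered or found. 1 - One or more expression- or else - entered. 99 - Missing";
run;

data want;
  set have;
  length field1 $ 10 field2 $ 200;
  /* One item per text run between the numbers (digits are the delimiters) */
  do i = 1 to countw(a, , 'd');
    /* i-th run of digits, e.g. "99" */
    field1 = scan(a, i, , 'kd');
    /* i-th text run, e.g. " - Missing": trim, drop the "-", trim again */
    field2 = strip(substr(strip(scan(a, i, , 'd')), 2));
    output;
  end;
  keep field1 field2;
run;
```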