Hi,
The PRXCHANGE function returns the error "The routine PRXCHANGE was called using a regular expression that contains no replacement text" and I can't find the reason. What I want to do is extract the substring matched by the regular expression.
As a test, I get the same error on this example dataset:
data table1;
input name & $32.;
datalines;
44fds
3.3.fdfsd
22.22fdfs
12dsd12
;
data table2;
set table1;
name2 = prxchange('~^[0-9\./]+~', -1, name);
run; /*This gives the error*/
The regular expression itself is valid: if you run this
data table2;
set table1;
test = PRXMATCH('~^[0-9\./]+~',name);
run;
it runs correctly and the variable test is set to 1, so the regex matched.
Thanks in advance,
Luke
What are you trying to achieve?
I'm trying to extract the substring identified by the regular expression: numbers and dots starting from the beginning.
Variable name2 should contain:
44
3.3.
22.22
12
Do you have to use a regular expression? Are there other patterns that you need? `COMPRESS` can take care of this easily.
data table1;
input name & $32.;
datalines;
44fds
3.3.fdfsd
22.22fdfs
12dsd12
;
data table2;
set table1;
name2 = compress(name,,'kdp');
run;
Again, this only works for your example data. Other patterns may cause issues.
Edit: I also noticed that you're not substituting anything in your `PRXCHANGE` call. You have to prefix your regex with `s` (as in `s/pattern/replacement/`) and provide a replacement, if I am remembering correctly.
Edit 2:
data table2;
set table1;
name2 = compress(name,,'kdp');
name3 = prxchange('s/[A-Za-z]+//', -1, name);
run;
I'm not great with regexes and only use them when I have to.
I need numbers and dots starting from the beginning of the string, not all numbers and dots. So in line 4 it should be only 12, not 1212. Thanks.
For the data posted, this works, too:
data table1;
input name & $32.;
datalines;
44fds
3.3.fdfsd
22.22fdfs
12dsd12
;
data want;
set table1;
length numb $ 10;
numb = substr(name, 1, anyalpha(name) -1);
run;
data table1;
input name & $32.;
datalines;
44fds
3.3.fdfsd
22.22fdfs
12dsd12
;
data table2;
set table1;
test = prxchange('s/^([\d\.]+).*/\1/',1,name);
run;
It doesn't work if the string doesn't start with a number. It wasn't clear from the example, but the string is not guaranteed to start with a number, and it can also contain special characters, so I'm not sure the anyalpha-1 solution would work. In the meantime I found this way:
data table2(DROP = pattern start length);
set table1;
pattern = PRXPARSE('~^[0-9\./]+~');
call prxsubstr(pattern, name, start,length);
IF length>0 THEN name2 = SUBSTR(name, start , length);
run;
@Luke3 wrote:
It doesn't work if the string doesn't start with a number. It wasn't clear from the example, but the string is not guaranteed to start with a number, and it can also contain special characters, so I'm not sure the anyalpha-1 solution would work. In the meantime I found this way:
Then, please, post data that contains all possible combinations of digits and letters that could exist and the expected result.
@andreas_lds wrote:
@Luke3 wrote:
It doesn't work if the string doesn't start with a number. It wasn't clear from the example, but the string is not guaranteed to start with a number, and it can also contain special characters, so I'm not sure the anyalpha-1 solution would work. In the meantime I found this way:
Then, please, post data that contains all possible combinations of digits and letters that could exist and the expected result.
More heterogeneous data:
-----
----string
helloworld
323.43astring
23hello(world*23.34.12)
1223/34anotherstring12.34
1234
12-43
13.34/34
The regular expression to extract is ^[0-9\./]+ (numbers, dots and slashes at the beginning). Expected result:
empty or skip
empty or skip
empty or skip
323.43
23
1223/34
1234
12
13.34/34
The solution I came up with is:
data table2(DROP = pattern start length);
set table1;
pattern = PRXPARSE('~^[0-9\./]+~');
call prxsubstr(pattern, name, start,length);
IF length>0 THEN name2 = SUBSTR(name, start , length);
run;
The solution proposed by @Ksharp should work if we add a check on the first character of the string:
data table2;
set table1;
IF ANYDIGIT(name) = 1 THEN name2 = prxchange('s/^([\d\.]+).*/\1/',1,name); /* ANYDIGIT = 1 means the first character is a digit */
run;
I don't know whether it's possible to do this with a single call to prxchange.
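For what it's worth, a single PRXCHANGE call does seem feasible. The sketch below is not from the thread's posters: it assumes that switching the quantifier from + to * is acceptable, so the capture group may be empty and rows without a leading match end up blank. The slash inside the character class is escaped because / is used as the delimiter, and the length of name2 is set explicitly as an assumption about the desired width.

```sas
data table1;
input name & $32.;
datalines;
-----
----string
helloworld
323.43astring
23hello(world*23.34.12)
1223/34anotherstring12.34
1234
12-43
13.34/34
;

data table2;
set table1;
length name2 $32;
/* Group 1 captures the leading run of digits, dots and slashes   */
/* (possibly empty); .* then consumes the rest of the value, so   */
/* the whole string is replaced by the captured prefix.           */
name2 = prxchange('s/^([0-9.\/]*).*/$1/', 1, name);
run;
```

With the heterogeneous data above, name2 should be blank for the first three rows and hold 323.43, 23, 1223/34, 1234, 12 and 13.34/34 for the others.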
There is another proposal.
data table1;
input name & $32.;
datalines;
44fds
3.3.fdfsd
22.22fdfs
12dsd12
;
data table2;
set table1;
name2 = prxchange('s/((^[0-9]+\.[0-9]+)|(^[0-9]+)).*/$1/', -1, name);
put name2=;
run;
The problem seems to be that you call the PRXCHANGE function with a PRX expression that is syntactically valid but is not a change expression.
The syntax of a change expression is
s<delimiter><expression to look for><delimiter><expression to replace with><delimiter><options>
In your example the delimiter is "~" and the expression to look for is "^[0-9\./]+", but there is no leading "s" and nothing comes after the expression to look for.
If, for instance, you wanted to replace the found expression with an "X", your PRX expression should look like this:
's~^[0-9\./]+~X~'
What is the string you are searching for, and what do you want to replace it with?
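To make the change-expression form concrete, here is a minimal sketch of a substitution that replaces the leading run of digits, dots and slashes with an X. It assumes the Perl-style s/pattern/replacement/ form with / as the delimiter (so the slash inside the character class is escaped); the variable names before and after are purely illustrative.

```sas
data _null_;
length before after $32;
before = '323.43astring';
/* The leading s marks a substitution: pattern, then replacement. */
after = prxchange('s/^[0-9.\/]+/X/', 1, before);
put before= after=;
run;
```

The log should show after=Xastring, which confirms that only the matched prefix was replaced.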
@s_lassen wrote:
If, for instance, you wanted to replace the found expression with an "X", your PRX expression should look like this:
's~^[0-9\./]+~X~'
What is the string you are searching for, and what do you want to replace it with?
I want to extract the part that matches ^[0-9\./]+ from the string. In other words, I want to replace the whole string with that match.