I have a SAS table with a column named data. The column contains values like this: XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS
I want to remove the standalone numeric terms that appear in the middle or at the end of each comma-separated item, but keep a number that starts an item. The desired output looks like this:
XXXX YYY , SSSSA WWW QQQ, EEE WWS, 123 XASS WYSS
I started with this code (to remove the numbers irrespective of their position), but it doesn't give me the answer I need.
data want;
  set have;
  array word[100] $20 _temporary_;
  length result $200;
  result=' ';
  do i=1 to countw(data, ' ');
    word[i]=scan(data,i,' ');
    if notdigit(word[i]) then do;
      result=catx(' ' , result, word[i]);
    end;
  end;
run;
Can anyone help me to fix the issue and get the desired results?
I'm not sure whether you want to keep the digit strings or remove them. This removes them.
data example;
input text $50.;
pattern = prxparse("s/\d+//");
new_text = prxchange(pattern, -1, text);
drop pattern;
datalines;
Here are some numbers: 123, 456, and 789.
This string has no numbers.
A single number: 42.
More numbers: 333, 7777.
;
run;
proc print data=example;
run;
I think this is it.
data example;
  infile cards truncover;
  input text $100.;
  pattern_all_digits = prxparse("s/\d+//");
  pattern_leading_digits = prxparse("/^\d+/");
  /* Initialize variables */
  new_text = text;
  /* Check for leading digits and preserve them if found */
  if prxmatch(pattern_leading_digits, text) then do;
    length leading_digits $20.;
    call prxsubstr(pattern_leading_digits, text, position, length);
    leading_digits = substr(text, position, length);
    new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
    new_text = catx(' ', leading_digits, new_text);
  end;
  else do;
    new_text = prxchange(pattern_all_digits, -1, text);
  end;
  drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
;
run;
proc print data=example;
run;
I really appreciate your help. Would it be possible to adjust the code so that it does not remove the digits in scenarios like these?
Example: XXX1222 YYYY, SSS1222 TYYYY 1222
The output for these scenarios would be:
XXX1222 YYYY, SSS1222 TYYYY
This may be close.
data example;
  infile cards truncover;
  input text $100.;
  if _n_ eq 1 then do;
    /* Pattern to remove digit strings that are not part of a word */
    pattern_all_digits = prxparse("s/\b\d+\b//");
    /* Pattern to detect digit strings at the beginning of the sentence */
    pattern_leading_digits = prxparse("/^\d+/");
    retain pattern_:;
  end;
  /* Initialize variables */
  new_text = text;
  /* Check for leading digits and preserve them if found */
  if prxmatch(pattern_leading_digits, text) then do;
    length leading_digits $20.;
    call prxsubstr(pattern_leading_digits, text, position, length);
    leading_digits = substr(text, position, length);
    new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
    new_text = catx(' ', leading_digits, new_text);
  end;
  else do;
    new_text = prxchange(pattern_all_digits, -1, text);
  end;
  drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
abc123def Keep digits in words like abc123def.
123abc Also keep digits in words like 123abc.
;
run;
proc print data=example;
run;
@sam88r wrote:
I really appreciate your help. Would it be possible to adjust the code so that it does not remove the digits in scenarios like these?
Example: XXX1222 YYYY, SSS1222 TYYYY 1222
The output for these scenarios would be:
XXX1222 YYYY, SSS1222 TYYYY
data example;
infile cards dsd;
input text :$50. @@;
datalines;
XXX1222 YYYY, SSS1222 TYYYY 1222
XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS
;
run;
data want;
set example;
want=prxchange('s/\s\d+\b//',-1,text);
run;
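If the comma-separated items need to stay together in a single observation, as in the original column, a variation on the s/\s\d+\b// pattern might work. This is only a sketch: it assumes the SAS PRX engine accepts the look-behind (?<=[^\s,]), and the names have, want, text, and result are placeholders rather than anything from your real table. The look-behind requires the blanks in front of the number to follow a character that is neither a blank nor a comma, so a number that starts an item is left alone.

```sas
/* Sketch only: HAVE, WANT, TEXT and RESULT are placeholder names. */
data have;
  infile datalines truncover;
  input text $100.;
datalines;
XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS
XXX1222 YYYY, SSS1222 TYYYY 1222
;
run;

data want;
  set have;
  length result $100;
  /* (?<=[^\s,]) : the blanks must follow a character that is not a blank or a comma */
  /* \s+\d+\b    : one or more blanks, then a whole digit-only word                  */
  result = prxchange('s/(?<=[^\s,])\s+\d+\b//', -1, text);
run;

proc print data=want;
run;
```

For the two sample rows, result should come out as XXXX YYY , SSSSA WWW QQQ, EEE WWS, 123 XASS WYSS and XXX1222 YYYY, SSS1222 TYYYY. Digits embedded in a word, such as XXX1222, are untouched because no blank precedes them.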