sam88r
Fluorite | Level 6

I have a SAS table with a column named data. The column contains values like this: XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS

I want to remove the numeric terms in the middle. So the desired output looks like this:

XXXX YYY , SSSSA WWW QQQ, EEE WWS, 123 XASS WYSS

 

I started with this code (to remove the numbers irrespective of their position), but it doesn't give me the answer I need.

 

data want;
  set have;
  array word[100] $20 _temporary_;
  length result $200;
  result=' ';
  do i=1 to countw(data, ' ');
    word[i]=scan(data,i,' ');
    if notdigit(word[i]) then do;
      result=catx(' ', result, word[i]);
    end;
  end;
run;

Can anyone help me to fix the issue and get the desired results?

 

5 REPLIES
data_null__
Jade | Level 19

Not sure if you want the digit strings or to remove them.  This removes them.

 

data example;
    input text $50.;
    pattern = prxparse("s/\d+//");
    new_text = prxchange(pattern, -1, text);
    drop pattern;
datalines;
Here are some numbers: 123, 456, and 789.
This string has no numbers.
A single number: 42.
More numbers: 333, 7777.
;
run;

proc print data=example;
run;


 

data_null__
Jade | Level 19

I think this is it.

 

data example;
    infile cards truncover;
    input text $100.;
    pattern_all_digits = prxparse("s/\d+//");
    pattern_leading_digits = prxparse("/^\d+/");

    /* Initialize variables */
    new_text = text;

    /* Check for leading digits and preserve them if found */
    if prxmatch(pattern_leading_digits, text) then do;
        length leading_digits $20.;
        call prxsubstr(pattern_leading_digits, text, position, length);
        leading_digits = substr(text, position, length);
        new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
        new_text = catx(' ', leading_digits, new_text);
    end;
    else do;
        new_text = prxchange(pattern_all_digits, -1, text);
    end;

    drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
;
run;

proc print data=example;
run;


 

sam88r
Fluorite | Level 6

I really appreciate your help. Would it be possible to adjust the code so that it does not remove the digits in scenarios like these?

 

Ex: XXX1222 YYYY, SSS1222 TYYYY 1222

 

The output for these scenarios would be:

XXX1222 YYYY, SSS1222 TYYYY

 

data_null__
Jade | Level 19

This may be close.

data example;
    infile cards truncover;
    input text $100.;
    if _n_ eq 1 then do;
       /* Pattern to remove digit strings that are not part of a word */
       pattern_all_digits = prxparse("s/\b\d+\b//");
       /* Pattern to detect digit strings at the beginning of the sentence */
       pattern_leading_digits = prxparse("/^\d+/");
       retain pattern_:;
       end;    
    /* Initialize variables */
    new_text = text;

    /* Check for leading digits and preserve them if found */
    if prxmatch(pattern_leading_digits, text) then do;
        length leading_digits $20.;
        call prxsubstr(pattern_leading_digits, text, position, length);
        leading_digits = substr(text, position, length);
        new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
        new_text = catx(' ', leading_digits, new_text);
    end;
    else do;
        new_text = prxchange(pattern_all_digits, -1, text);
    end;

    drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
abc123def Keep digits in words like abc123def.
123abc Also keep digits in words like 123abc.
;
run;

proc print data=example;
run;
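
One side effect worth noting: the substitution deletes only the digits, so the blanks that surrounded them stay behind as doubled spaces. A minimal sketch of one way to tidy that up, assuming single spacing is wanted, is to wrap the result in COMPBL. The sample string here is made up for illustration.

```sas
data _null_;
    length text new_text $100;
    text = 'More numbers 333 and 7777 here';
    /* PRXCHANGE removes the standalone digit strings;       */
    /* COMPBL then collapses each run of blanks to one blank */
    new_text = compbl(prxchange('s/\b\d+\b//', -1, text));
    put new_text=;
run;
```

The log should show new_text=More numbers and here.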


 



Ksharp
Super User
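data example;
    /* DSD makes the comma the delimiter; the trailing @@ holds the line,
       so each comma-separated segment becomes its own observation */
    infile cards dsd;
    input text :$50. @@;
datalines;
XXX1222 YYYY, SSS1222 TYYYY 1222
XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS
;
run;

data want;
    set example;
    /* Remove whitespace plus a standalone digit string; digits at the start
       of a value (no preceding whitespace) or attached to letters are kept */
    want=prxchange('s/\s\d+\b//',-1,text);
run;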
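
For comparison, the word-by-word loop from the original question can probably be repaired along the following lines. This is only a sketch, and it assumes each row holds a single segment (as after the comma split above) with no punctuation attached to the numbers. The likely problem in the original is that NOTDIGIT sees the trailing padding blanks of the $20 array element as non-digits, so it never returns 0 and no word is dropped. The have data set below is a hypothetical stand-in built from the values in the question.

```sas
/* Hypothetical sample data: one segment per row */
data have;
    infile datalines truncover;
    input data $100.;
datalines;
XXXX 1123 YYY
SSSSA 1 WWW 3 QQQ
EEE WWS 122
123 XASS WYSS
;
run;

data want;
    set have;
    length result $200 word $50;
    result = ' ';
    do i = 1 to countw(data, ' ');
        word = scan(data, i, ' ');
        /* NOTDIGIT returns 0 only when every character is a digit; */
        /* TRIM stops the padding blanks from counting as non-digits */
        /* Keep the first word always, and any word with a non-digit */
        if i = 1 or notdigit(trim(word)) > 0 then
            result = catx(' ', result, word);
    end;
    drop i word;
run;
```

With these rows, result should come out as XXXX YYY, SSSSA WWW QQQ, EEE WWS, and 123 XASS WYSS. A value such as 122, (digits with a comma attached) would be kept by this loop, which is one reason the regular expression above is the more compact choice.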


 

