sam88r
Fluorite | Level 6

I have a SAS table with a column named data. The column contains values like these: XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS

I want to remove the numeric terms that appear in the middle or at the end of a value, while keeping a number at the start. So the desired output looks like this:

XXXX YYY , SSSSA WWW QQQ, EEE WWS, 123 XASS WYSS

 

I started with this code (to remove the numbers irrespective of position), but it doesn't give me the answer I need.

 

data want;
    set have;
    array word[100] $20 _temporary_;
    length result $200;
    result=' ';
    do i=1 to countw(data, ' ');
        word[i]=scan(data,i,' ');
        if notdigit(word[i]) then do;
            result=catx(' ', result, word[i]);
        end;
    end;
run;

Can anyone help me to fix the issue and get the desired results?
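
For reference, here is a minimal sketch of how the loop above could be adjusted. In the original, each array element is a fixed-length `$20` value padded with trailing blanks, and NOTDIGIT counts a blank as a non-digit, so the test is true for every word shorter than 20 characters and nothing is dropped. Trimming the word before the test avoids that, and keeping the first word unconditionally preserves a leading number. The `have` data set below is hypothetical, built from the values quoted in the question, and assumes each value sits in its own row.

```sas
/* Hypothetical sample data: one quoted value per row */
data have;
    infile datalines truncover;
    input data $50.;
datalines;
XXXX 1123 YYY
SSSSA 1 WWW 3 QQQ
EEE WWS 122
123 XASS WYSS
;
run;

data want;
    set have;
    length result word $200;
    result = ' ';
    do i = 1 to countw(data, ' ');
        word = scan(data, i, ' ');
        /* TRIM removes the trailing blanks that pad a fixed-length value; */
        /* without it NOTDIGIT finds a blank and never returns 0.          */
        /* The first word is always kept, so a leading number survives.    */
        if i = 1 or notdigit(trim(word)) > 0 then
            result = catx(' ', result, word);
    end;
    drop i word;
run;
```

With these four rows, `result` should come out as `XXXX YYY`, `SSSSA WWW QQQ`, `EEE WWS`, and `123 XASS WYSS`. Words that mix letters and digits, such as `XXX1222`, are kept because NOTDIGIT finds a letter in them.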

 

5 REPLIES
data_null__
Jade | Level 19

Not sure whether you want to keep the digit strings or remove them. This removes them.

 

data example;
    input text $50.;
    pattern = prxparse("s/\d+//");
    new_text = prxchange(pattern, -1, text);
    drop pattern;
datalines;
Here are some numbers: 123, 456, and 789.
This string has no numbers.
A single number: 42.
More numbers: 333, 7777.
;
run;

proc print data=example;
run;


 

data_null__
Jade | Level 19

I think this is it.

 

data example;
    infile cards truncover;
    input text $100.;
    pattern_all_digits = prxparse("s/\d+//");
    pattern_leading_digits = prxparse("/^\d+/");

    /* Initialize variables */
    new_text = text;

    /* Check for leading digits and preserve them if found */
    if prxmatch(pattern_leading_digits, text) then do;
        length leading_digits $20.;
        call prxsubstr(pattern_leading_digits, text, position, length);
        leading_digits = substr(text, position, length);
        new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
        new_text = catx(' ', leading_digits, new_text);
    end;
    else do;
        new_text = prxchange(pattern_all_digits, -1, text);
    end;

    drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
;
run;

proc print data=example;
run;


 

sam88r
Fluorite | Level 6

I really appreciate your help. Would it be possible to adjust the code so that it does not remove the digits in scenarios like these?

 

Ex: XXX1222 YYYY, SSS1222 TYYYY 1222

 

The output for these scenarios would be:

XXX1222 YYYY, SSS1222 TYYYY

 

data_null__
Jade | Level 19

This may be close.

data example;
    infile cards truncover;
    input text $100.;
    if _n_ eq 1 then do;
       /* Pattern to remove digit strings that are not part of a word */
       pattern_all_digits = prxparse("s/\b\d+\b//");
       /* Pattern to detect digit strings at the beginning of the sentence */
       pattern_leading_digits = prxparse("/^\d+/");
       retain pattern_:;
       end;    
    /* Initialize variables */
    new_text = text;

    /* Check for leading digits and preserve them if found */
    if prxmatch(pattern_leading_digits, text) then do;
        length leading_digits $20.;
        call prxsubstr(pattern_leading_digits, text, position, length);
        leading_digits = substr(text, position, length);
        new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
        new_text = catx(' ', leading_digits, new_text);
    end;
    else do;
        new_text = prxchange(pattern_all_digits, -1, text);
    end;

    drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
abc123def Keep digits in words like abc123def.
123abc Also keep digits in words like 123abc.
;
run;

proc print data=example;
run;
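
One side effect of the `\b\d+\b` replacement is that the blanks on either side of a removed number stay behind, so the result can contain doubled blanks. A minimal sketch of one way to tidy that up, assuming the standard COMPBL function is acceptable here; the variable names `removed` and `tidy` are only for illustration:

```sas
data _null_;
    text = 'SSSSA 1 WWW 3 QQQ';
    /* Same substitution as above: drop stand-alone digit strings */
    removed = prxchange('s/\b\d+\b//', -1, text);
    /* COMPBL collapses each run of blanks to a single blank */
    tidy = compbl(removed);
    put removed= / tidy=;
run;
```

Here `removed` keeps two blanks where each number used to be, while `tidy` reads `SSSSA WWW QQQ`.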


 



 


 

Ksharp
Super User
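/* DSD makes the comma the delimiter, so each comma-separated value becomes its own observation */
data example;
    infile cards dsd;
    input text :$50. @@;
datalines;
XXX1222 YYYY, SSS1222 TYYYY 1222
XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS
;
run;

/* Remove a digit string only when it follows whitespace and ends at a word boundary */
data want;
    set example;
    want=prxchange('s/\s\d+\b//',-1,text);
run;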
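
A quick way to see why `\s\d+\b` covers all the cases in this thread is to run it against one value of each kind. This is only a sketch using literals taken from the posts above. A leading number has no whitespace in front of it, so `\s` cannot match. Digits glued to letters, as in `XXX1222`, are not preceded by whitespace either. And `\b` stops the match from ending in the middle of a word.

```sas
data _null_;
    length text want $50;
    /* One value of each kind: leading number, embedded digits, */
    /* trailing number, and numbers in the middle               */
    do text = '123 XASS WYSS', 'XXX1222 YYYY', 'EEE WWS 122', 'SSSSA 1 WWW 3 QQQ';
        want = prxchange('s/\s\d+\b//', -1, text);
        put text= want=;
    end;
run;
```

The first two values should come back unchanged, the third as `EEE WWS`, and the fourth as `SSSSA WWW QQQ`. Because the pattern consumes the whitespace in front of each number, no doubled blanks are left behind.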


 
