sam88r
Fluorite | Level 6

I have a SAS table with a column named data. The data column contains values like this: XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS

I want to remove the standalone numeric terms in the middle or at the end of a value, but keep a number that starts the value. So the desired output looks like this:

XXXX YYY , SSSSA WWW QQQ, EEE WWS, 123 XASS WYSS

I started with this code (to remove the numbers irrespective of position), but it doesn't give me the answer I need.

data want;
    set have;
    array word[100] $20 _temporary_;
    length result $200;
    result=' ';
    do i=1 to countw(data, ' ');
        word[i]=scan(data,i,' ');
        if notdigit(word[i]) then do;
            result=catx(' ', result, word[i]);
        end;
    end;
run;

Can anyone help me to fix the issue and get the desired results?
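
For reference, one likely reason the attempt above keeps every word is that each array element is a $20 value padded with trailing blanks, and NOTDIGIT treats a blank as a non-digit, so the test is true for any word shorter than 20 characters. The following is only a minimal sketch of a word-by-word fix, assuming each row of HAVE holds one value in DATA and that a number in the first position should be kept; the scalar variable word and its $50 length are illustrative choices, not part of the original code.

```sas
/* Sketch: word-by-word version of the attempt above.              */
/* Assumes a data set HAVE with a character variable DATA.         */
data want;
    set have;
    length result $200 word $50;   /* WORD and its length are assumptions */
    result = ' ';
    do i = 1 to countw(data, ' ');
        word = scan(data, i, ' ');
        /* NOTDIGIT returns 0 only when every character is a digit.  */
        /* STRIP drops the trailing blanks that pad WORD; a blank    */
        /* counts as a non-digit, so without STRIP the test is true  */
        /* for every word.                                           */
        /* The first word is always kept, even if it is all digits.  */
        if i = 1 or notdigit(strip(word)) > 0 then
            result = catx(' ', result, word);
    end;
    drop i word;
run;
```

Because the delimiter passed to COUNTW and SCAN is a blank only, a token such as XXX1222 stays intact and is kept, since it contains non-digit characters.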

 

5 REPLIES
data_null__
Jade | Level 19

Not sure if you want the digit strings or to remove them.  This removes them.

 

data example;
    input text $50.;
    pattern = prxparse("s/\d+//");
    new_text = prxchange(pattern, -1, text);
    drop pattern;
datalines;
Here are some numbers: 123, 456, and 789.
This string has no numbers.
A single number: 42.
More numbers: 333, 7777.
;
run;

proc print data=example;
run;


data_null__
Jade | Level 19

I think this is it.

 

data example;
    infile cards truncover;
    input text $100.;
    pattern_all_digits = prxparse("s/\d+//");
    pattern_leading_digits = prxparse("/^\d+/");

    /* Initialize variables */
    new_text = text;

    /* Check for leading digits and preserve them if found */
    if prxmatch(pattern_leading_digits, text) then do;
        length leading_digits $20.;
        call prxsubstr(pattern_leading_digits, text, position, length);
        leading_digits = substr(text, position, length);
        new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
        new_text = catx(' ', leading_digits, new_text);
    end;
    else do;
        new_text = prxchange(pattern_all_digits, -1, text);
    end;

    drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
;
run;

proc print data=example;
run;


sam88r
Fluorite | Level 6

I really appreciate your help. Would it be possible to adjust the code so that it does not remove the digits in scenarios like these?

 

Ex: XXX1222 YYYY, SSS1222 TYYYY 1222

 

The output for scenarios like these would be:

XXX1222 YYYY, SSS1222 TYYYY

 

data_null__
Jade | Level 19

This may be close.

data example;
    infile cards truncover;
    input text $100.;
    if _n_ eq 1 then do;
       /* Pattern to remove digit strings that are not part of a word */
       pattern_all_digits = prxparse("s/\b\d+\b//");
       /* Pattern to detect digit strings at the beginning of the sentence */
       pattern_leading_digits = prxparse("/^\d+/");
       retain pattern_:;
       end;    
    /* Initialize variables */
    new_text = text;

    /* Check for leading digits and preserve them if found */
    if prxmatch(pattern_leading_digits, text) then do;
        length leading_digits $20.;
        call prxsubstr(pattern_leading_digits, text, position, length);
        leading_digits = substr(text, position, length);
        new_text = prxchange(pattern_all_digits, -1, substr(text, length+1));
        new_text = catx(' ', leading_digits, new_text);
    end;
    else do;
        new_text = prxchange(pattern_all_digits, -1, text);
    end;

    drop pattern_all_digits pattern_leading_digits position length leading_digits;
datalines;
123 Here are some numbers: 123, 456, and 789.
This string has no numbers.
42 A single number: 42.
More numbers: 333, 7777.
99 Another example 100 200.
abc123def Keep digits in words like abc123def.
123abc Also keep digits in words like 123abc.
;
run;

proc print data=example;
run;


Ksharp
Super User
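data example;
    /* DSD makes the comma the delimiter, and @@ holds the line, */
    /* so each comma-separated value becomes its own row.        */
    infile cards dsd;
    input text :$50. @@;
datalines;
XXX1222 YYYY, SSS1222 TYYYY 1222
XXXX 1123 YYY , SSSSA 1 WWW 3 QQQ, EEE WWS 122, 123 XASS WYSS
;
run;

data want;
    set example;
    /* Remove a whitespace character followed by digits that end at a word boundary. */
    want=prxchange('s/\s\d+\b//',-1,text);
run;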
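
The pattern above works because a leading number has no whitespace in front of it and digits attached to letters (XXX1222) are not preceded by whitespace either. As a hedged variant, the sketch below replaces the word boundary with a lookahead, so that a digit run is removed only when it is a complete blank-delimited token; with \b, a fragment such as the " 12" in " 12.5" would also be removed. The data set name example2 and the one-value-per-row layout are assumptions made for illustration.

```sas
/* Sketch: remove only whole blank-delimited digit tokens that are  */
/* not at the start of the value.                                   */
data example2;   /* hypothetical data set name */
    infile cards truncover;
    input text $50.;
    /* \s+      one or more whitespace characters before the number */
    /* \d+      the digit run itself                                */
    /* (?=\s|$) lookahead: next is whitespace or end of string; it  */
    /*          is not consumed, so consecutive numbers still match */
    want = prxchange('s/\s+\d+(?=\s|$)//', -1, text);
datalines;
XXXX 1123 YYY
SSSSA 1 WWW 3 QQQ
EEE WWS 122
123 XASS WYSS
XXX1222 YYYY
SSS1222 TYYYY 1222
;
run;

proc print data=example2;
run;
```

Since the whitespace in front of each number is removed together with the digits, no doubled blanks are left behind.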
