Help using Base SAS procedures

To extract Alpha/numerical from a sentence

Accepted Solution Solved
Reply
Regular Contributor
Posts: 240
Accepted Solution

To extract Alpha/numerical from a sentence

HI,

i need to extract data that will be 5 characters example TG0001 which is machine #  . I'm currently dealing with over 5k machines which have a unique number . I tried compress an it just provided me the # not the letters.

the column something like

column

blah blah TG0001

blah DF0001

blah blah FG0001 blah blah

blah GH 0001

In some cases there might be a space between number an alpha . Not pretty the data is manually input

thank you for your


Accepted Solutions
Solution
‎06-18-2014 10:28 AM
Respected Advisor
Posts: 3,156

Re: To extract Alpha/numerical from a sentence

Sorry, my bad.

NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,CUST_NAME);

double check the variable name (not the label). If it does have blanks embedded like you said, then you will need literal reference:

  NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,'CUST NAME'n); 

View solution in original post


All Replies
Super Contributor
Posts: 307

Re: To extract Alpha/numerical from a sentence

Hard to envision how to possibly tackle this without more concrete examples of text lines you are dealing with. Any chance you can provide a sample, with actual text values (of course, changing anything that might be security sensitive)?

Respected Advisor
Posts: 3,156

Re: To extract Alpha/numerical from a sentence

If you don't want keep any non-digits at all, and there is no other digits except this one location, then COMPRESS() will be my first option:

DATA HAVE;

     INPUT VAR $ 40.;

     NUM=COMPRESS(VAR,,'KD');

     CARDS;

blah blah TG0001

blah DF0001

blah blah FG0001 blah blah

blah GH 0001

;


Regards,

Haikuo

Trusted Advisor
Posts: 1,228

Re: To extract Alpha/numerical from a sentence

Based on the specified pattern this may produce the desired output.

data have;

input col $ 30.;

datalines;

blah blah TG2001

blah DF0001

blah blah FG0001 blah blah

blah GH 0004

;

data want;

length machine_number $8.;

set have;

machine_number=catt(substr(col,find(col,compress(col,,'kd'))-3,3),compress(col,,'kd'));

run;

Regular Contributor
Posts: 240

Re: To extract Alpha/numerical from a sentence

HI stat@sas  your snippet of code work pretty good gave me about 80 % which is acceptable ..the issue that I have run across is that the output is coming in with spaces . I did a compress but it removed the 2 character Id and just gave me the numical. Any thoughts how to remove blanks an keep the output intact? Thanks again

Respected Advisor
Posts: 3,156

Re: To extract Alpha/numerical from a sentence

Still the PRX functions will offer a succinct solution, if you want to get rid of the blanks:

DATA HAVE;

INPUT VAR $ 40.;

/*     NUM=COMPRESS(VAR,,'KD');*/

      NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,VAR);

CARDS;

blah blah TG0001

blah DF0001

blah blah FG0001 blah blah

blah GH 0001

;

You will have to show some sample data that is not working. People here can only provide solutions based on your sample data, we don't know your data, you do.

Good luck,

Haikuo

Regular Contributor
Posts: 240

Re: To extract Alpha/numerical from a sentence

Hai Kuo

i will be refer a table name info where do I make refer to it in that syntax ? The column name is cust name where do I make that refer ..thanks

Respected Advisor
Posts: 3,156

Re: To extract Alpha/numerical from a sentence

DATA want;

set YOURDATA;

  NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,VAR);

RUN;

Regular Contributor
Posts: 240

Re: To extract Alpha/numerical from a sentence

HI,

your data is name of table ...where do I make refer to column the name of column is "CUST NAME"

thanks

Solution
‎06-18-2014 10:28 AM
Respected Advisor
Posts: 3,156

Re: To extract Alpha/numerical from a sentence

Sorry, my bad.

NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,CUST_NAME);

double check the variable name (not the label). If it does have blanks embedded like you said, then you will need literal reference:

  NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,'CUST NAME'n); 

Regular Contributor
Posts: 240

Re: To extract Alpha/numerical from a sentence

HEre is the error I'm getting

DATA Info;

822  set info

823  /*     NUM=COMPRESS(VAR,,'KD');*/

824     NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,'CUST_NAME'n);

              -

              22

              76

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, -, :, ;,

              END, INDSNAME, KEY, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.

ERROR 76-322: Syntax error, statement will be ignored.

825

826  ;

827  run;

Respected Advisor
Posts: 3,156

Re: To extract Alpha/numerical from a sentence

you are missing a ";' after set info.

Regular Contributor
Posts: 240

Re: To extract Alpha/numerical from a sentence

Never mind I fig I was missing a ; it works thanks

Trusted Advisor
Posts: 1,228

Re: To extract Alpha/numerical from a sentence

Hi,

This is based on what you provided. If you can provide some extract from 20% where this is not working will help to understand your problem in a better way.

Thanks,

Super Contributor
Posts: 297

Re: To extract Alpha/numerical from a sentence

Hi BETO,

It is difficult to produce a solution that works without knowing the exact structure of the data in which it is contained.

Here is an approach using PERL regular expressions.

DATA HAVE;

INPUT COL $ 30.;

  IF _N_ = 1 THEN

  DO;

       RETAIN PATTERN1;

       PATTERN1 = PRXPARSE("/(\w{2}\d{4})|(\w{2}\s\d{4})/");

  END;

  CALL PRXSUBSTR(PATTERN1,COL,START,LENGTH);

  COL1 = COMPRESS(SUBSTR(COL,START,LENGTH));

DATALINES;

BLAH BLAH TG2001

BLAH DF0001

BLAH BLAH FG0001 BLAH BLAH

BLAH GH 0004

;

RUN;

The ("/(\w{2}\d{4})|(\w{2}\s\d{4})/") portion of the code defines the pattern you wish to match.  In this case 2 word elements followed by 4 digits OR  2 word elements followed by a space and 4 digits. CALL PRXSUBSTR finds a string that matches the conditions in PATTERN1 and produces 2 variables which we have named START and LENGTH.  We then use the values in these 2 variables to extract a string from the original COL variable using the SUBSTR function;

Regards,

Scott

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 502 views
  • 8 likes
  • 6 in conversation