BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BETO
Fluorite | Level 6

HI,

i need to extract data that will be 5 characters example TG0001 which is machine #  . I'm currently dealing with over 5k machines which have a unique number . I tried compress an it just provided me the # not the letters.

the column something like

column

blah blah TG0001

blah DF0001

blah blah FG0001 blah blah

blah GH 0001

In some cases there might be a space between number an alpha . Not pretty the data is manually input

thank you for your

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Sorry, my bad.

NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,CUST_NAME);

double check the variable name (not the label). If it does have blanks embedded like you said, then you will need literal reference:

  NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,'CUST NAME'n); 

View solution in original post

15 REPLIES 15
Fugue
Quartz | Level 8

Hard to envision how to possibly tackle this without more concrete examples of text lines you are dealing with. Any chance you can provide a sample, with actual text values (of course, changing anything that might be security sensitive)?

Haikuo
Onyx | Level 15

If you don't want keep any non-digits at all, and there is no other digits except this one location, then COMPRESS() will be my first option:

DATA HAVE;

     INPUT VAR $ 40.;

     NUM=COMPRESS(VAR,,'KD');

     CARDS;

blah blah TG0001

blah DF0001

blah blah FG0001 blah blah

blah GH 0001

;


Regards,

Haikuo

stat_sas
Ammonite | Level 13

Based on the specified pattern this may produce the desired output.

data have;

input col $ 30.;

datalines;

blah blah TG2001

blah DF0001

blah blah FG0001 blah blah

blah GH 0004

;

data want;

length machine_number $8.;

set have;

machine_number=catt(substr(col,find(col,compress(col,,'kd'))-3,3),compress(col,,'kd'));

run;

BETO
Fluorite | Level 6

HI stat@sas  your snippet of code work pretty good gave me about 80 % which is acceptable ..the issue that I have run across is that the output is coming in with spaces . I did a compress but it removed the 2 character Id and just gave me the numical. Any thoughts how to remove blanks an keep the output intact? Thanks again

Haikuo
Onyx | Level 15

Still the PRX functions will offer a succinct solution, if you want to get rid of the blanks:

DATA HAVE;

INPUT VAR $ 40.;

/*     NUM=COMPRESS(VAR,,'KD');*/

      NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,VAR);

CARDS;

blah blah TG0001

blah DF0001

blah blah FG0001 blah blah

blah GH 0001

;

You will have to show some sample data that is not working. People here can only provide solutions based on your sample data, we don't know your data, you do.

Good luck,

Haikuo

BETO
Fluorite | Level 6

Hai Kuo

i will be refer a table name info where do I make refer to it in that syntax ? The column name is cust name where do I make that refer ..thanks

Haikuo
Onyx | Level 15

DATA want;

set YOURDATA;

  NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,VAR);

RUN;

BETO
Fluorite | Level 6

HI,

your data is name of table ...where do I make refer to column the name of column is "CUST NAME"

thanks

Haikuo
Onyx | Level 15

Sorry, my bad.

NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,CUST_NAME);

double check the variable name (not the label). If it does have blanks embedded like you said, then you will need literal reference:

  NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,'CUST NAME'n); 

BETO
Fluorite | Level 6

HEre is the error I'm getting

DATA Info;

822  set info

823  /*     NUM=COMPRESS(VAR,,'KD');*/

824     NEWVAR=PRXCHANGE('s/.+([A-Z]{2}) *(\d+).*/$1$2/o',-1,'CUST_NAME'n);

              -

              22

              76

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, -, :, ;,

              END, INDSNAME, KEY, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.

ERROR 76-322: Syntax error, statement will be ignored.

825

826  ;

827  run;

Haikuo
Onyx | Level 15

you are missing a ";' after set info.

BETO
Fluorite | Level 6

Never mind I fig I was missing a ; it works thanks

stat_sas
Ammonite | Level 13

Hi,

This is based on what you provided. If you can provide some extract from 20% where this is not working will help to understand your problem in a better way.

Thanks,

Scott_Mitchell
Quartz | Level 8

Hi BETO,

It is difficult to produce a solution that works without knowing the exact structure of the data in which it is contained.

Here is an approach using PERL regular expressions.

DATA HAVE;

INPUT COL $ 30.;

  IF _N_ = 1 THEN

  DO;

       RETAIN PATTERN1;

       PATTERN1 = PRXPARSE("/(\w{2}\d{4})|(\w{2}\s\d{4})/");

  END;

  CALL PRXSUBSTR(PATTERN1,COL,START,LENGTH);

  COL1 = COMPRESS(SUBSTR(COL,START,LENGTH));

DATALINES;

BLAH BLAH TG2001

BLAH DF0001

BLAH BLAH FG0001 BLAH BLAH

BLAH GH 0004

;

RUN;

The ("/(\w{2}\d{4})|(\w{2}\s\d{4})/") portion of the code defines the pattern you wish to match.  In this case 2 word elements followed by 4 digits OR  2 word elements followed by a space and 4 digits. CALL PRXSUBSTR finds a string that matches the conditions in PATTERN1 and produces 2 variables which we have named START and LENGTH.  We then use the values in these 2 variables to extract a string from the original COL variable using the SUBSTR function;

Regards,

Scott

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1423 views
  • 8 likes
  • 6 in conversation