Take a letter from a string of numbers, convert it to a number and rep...

Buzzy_Bee · Posted 09-04-2020 10:37 PM

I've code some product codes with a letter in them that needs to be converted to a number and shifted to the front. So for example, 10D should read as 4-10 (to make it consistent with the product code below it). Sometimes the product code will have the letter at the front instead, like A570, but it needs to be converted to 1-570.

I've got a working solution below, but my code is lengthy and ugly. My solution uses scan to return the first segment before the space, then finds A and converts this to its ASCII value, then removes 64 (so for example, the ASCII value of B is 66, so rank('B')-64 returns 2). I then remove the letter off the segment using prxchange, then add the newly converted number to the front using concatenation.

I feel like I could improve this code a lot using PRX functions, but I find them confusing so I wrote this step and didn't know where to go from there:

check = prxparse('/(A|B|C|D)/'); if prxmatch(check,segment1) > 0 then do something...

Thanks for any ideas. It doesn't have to PRX functions, but they seem to be a really concise way of writing code once people get the hang of it, so I'm trying to learn.

data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;

data WANT;
	SET HAVE;
	segment1=scan(ProductID,1,'');
	if find(segment1,'A') then number=rank('A')-64;
	else if find(segment1,'B') then number=rank('B')-64;
	else if find(segment1,'C') then number=rank('C')-64;
	else if find(segment1,'D') then number=rank('D')-64;
	ProductID_2 = prxchange('s/(A|B|C|D)/$2/', 1, ProductID);	
	if number NE '.' then ProductID_2=cats(number,'-',ProductID_2);
drop number;	
run;

PGStats · Posted 09-04-2020 11:00 PM

Try this to see if it covers all possible cases:

data want;
set have;
length code $12;
code = scan(ProductID, 1);
if prxmatch("/^[A-D]\d+|^\d+[A-D]/o",code) then do;
    code = catx("-", rank(compress(code,"ABCD","K"))-rank("A")+1, compress(code,,"KD"));
    productID = catx(" ", code, scan(ProductID, 2));
    end;
drop code;
run;

PG

Buzzy_Bee · Posted 09-04-2020 11:57 PM

Thank you, that works perfectly on the test cases, including adding in E and F into the test cases and code.

Can I please clarify, this part refers to the specified letters that precede numbers:

/^[A-F]\d+

And this part is for the specified letters when they occur after the numbers:

^\d+[A-F]

But I couldn't find anything that explains what the /o means at the end of the prxmatch?

Thank you.

PGStats · Posted 09-05-2020 02:19 PM

The "o" suffix simply tells SAS that the match pattern is a constant that doesn't need to be recompiled.

PG

mkeintz · Posted 09-04-2020 11:53 PM

I've often resisted using prx functions because I have to relearn the special characters every time, so here's another approach.

Find the location of an alpha character inside the first blank-separated word in product id.
If one is found look up the position of that character in a string of letters from A to Z - that position provides the 1 for A, ... 26 for Z. Append a dash to it to build a prefix.
Then drop the original character and concatenate the prefix with the rest of the product id.

data want (drop=_:) ;
  set have;

  retain _alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;

  _alphaloc=anyalpha(scan(productid,1,' '));
  if _alphaloc then do;
    _prefix=cats(findc(_alphabet,char(productid,_alphaloc)),'-');
	if _alphaloc=1 then productid2= trim(_prefix) || substr(productid,2);
	else productid2=trim(_prefix) || substr(productid,1,_alphaloc-1)||substr(productid,_alphaloc+1);
  end;
  else productid2=productid;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Buzzy_Bee · Posted 09-05-2020 12:08 AM

Thank you. I've seen ANYDIGIT before and NOTALPHA, but never ANYALPHA. That's a great solution.

RichardDeVen · Posted 09-05-2020 05:41 AM

You can use a Perl regular expression to locate the letter-digits or digits-letter token and replace it to with the equivalent normalized 'letter_as_A_based_index dash digits' token.

Example:

data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;

data want;
  set have;

  rxid = prxparse ('/\b([a-z])(\d+)\b|\b(\d+)([a-z])\b/i');  * a regex pattern with | alternation;

  if prxmatch(rxid, productid) then do;
    cbnum = prxparen(rxid);                                  * determine which alternation;

    letter = prxposn(rxid,ifn(cbnum=2,1,4),productid);       * extract parts;
    digits = prxposn(rxid,ifn(cbnum=2,2,3),productid);
    letterdig = rank(letter) - rank('A') + 1;                * convert letter part;

    if cbnum = 2 then 
      productid = tranwrd(productid,cats(letter,digits),cats(letterdig,'-',digits));  * replace original token with normalized token;
    else
      productid = tranwrd(productid,cats(digits,letter),cats(letterdig,'-',digits));
  end;

  drop rxid cbnum letter digits letterdig;
run;

Take a letter from a string of numbers, convert it to a number and reposition it

Re: Take a letter from a string of numbers, convert it to a number and reposition it

Re: Take a letter from a string of numbers, convert it to a number and reposition it

Re: Take a letter from a string of numbers, convert it to a number and reposition it

Re: Take a letter from a string of numbers, convert it to a number and reposition it

Re: Take a letter from a string of numbers, convert it to a number and reposition it

Re: Take a letter from a string of numbers, convert it to a number and reposition it

SAS Innovate 2025: Call for Content

Classroom Training Available!