BookmarkSubscribeRSS Feed
Buzzy_Bee
Quartz | Level 8

I've code some product codes with a letter in them that needs to be converted to a number and shifted to the front. So for example, 10D should read as 4-10 (to make it consistent with the product code below it). Sometimes the product code will have the letter at the front instead, like A570, but it needs to be converted to 1-570.

I've got a working solution below, but my code is lengthy and ugly. My solution uses scan to return the first segment before the space, then finds A and converts this to its ASCII value, then removes 64 (so for example, the ASCII value of B is 66, so rank('B')-64 returns 2). I then remove the letter off the segment using prxchange, then add the newly converted number to the front using concatenation.

I feel like I could improve this code a lot using PRX functions, but I find them confusing so I wrote this step and didn't know where to go from there:

check = prxparse('/(A|B|C|D)/'); if prxmatch(check,segment1) > 0 then do something...

Thanks for any ideas. It doesn't have to PRX functions, but they seem to be a really concise way of writing code once people get the hang of it, so I'm trying to learn.

 

data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;

data WANT;
	SET HAVE;
	segment1=scan(ProductID,1,'');
	if find(segment1,'A') then number=rank('A')-64;
	else if find(segment1,'B') then number=rank('B')-64;
	else if find(segment1,'C') then number=rank('C')-64;
	else if find(segment1,'D') then number=rank('D')-64;
	ProductID_2 = prxchange('s/(A|B|C|D)/$2/', 1, ProductID);	
	if number NE '.' then ProductID_2=cats(number,'-',ProductID_2);
drop number;	
run;
6 REPLIES 6
PGStats
Opal | Level 21

Try this to see if it covers all possible cases:

 

data want;
set have;
length code $12;
code = scan(ProductID, 1);
if prxmatch("/^[A-D]\d+|^\d+[A-D]/o",code) then do;
    code = catx("-", rank(compress(code,"ABCD","K"))-rank("A")+1, compress(code,,"KD"));
    productID = catx(" ", code, scan(ProductID, 2));
    end;
drop code;
run;
PG
Buzzy_Bee
Quartz | Level 8

Thank you, that works perfectly on the test cases, including adding in E and F into the test cases and code.

 

Can I please clarify, this part refers to the specified letters that precede numbers:

/^[A-F]\d+

And this part is for the specified letters when they occur after the numbers:

^\d+[A-F]

But I couldn't find anything that explains what the /o means at the end of the prxmatch?

Thank you.

 

PGStats
Opal | Level 21

The "o" suffix simply tells SAS that the match pattern is a constant that doesn't need to be recompiled.

PG
mkeintz
PROC Star

I've often resisted using prx functions because I have to relearn the special characters every time, so here's another approach.

 

  1. Find the location of an alpha character inside the first blank-separated word in product id.
  2. If one is found look up the position of that character in a string of letters from A to Z - that position provides the 1 for A, ... 26 for Z.  Append a dash to it to build a prefix.
  3. Then drop the original character and concatenate the prefix with the rest of the product id. 
data want (drop=_:) ;
  set have;

  retain _alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;

  _alphaloc=anyalpha(scan(productid,1,' '));
  if _alphaloc then do;
    _prefix=cats(findc(_alphabet,char(productid,_alphaloc)),'-');
	if _alphaloc=1 then productid2= trim(_prefix) || substr(productid,2);
	else productid2=trim(_prefix) || substr(productid,1,_alphaloc-1)||substr(productid,_alphaloc+1);
  end;
  else productid2=productid;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Buzzy_Bee
Quartz | Level 8

Thank you. I've seen ANYDIGIT before and NOTALPHA, but never ANYALPHA. That's a great solution.

RichardDeVen
Barite | Level 11

You can use a Perl regular expression to locate the letter-digits or digits-letter token and replace it to with the equivalent normalized 'letter_as_A_based_index dash digits' token.

 

Example:

data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;

data want;
  set have;

  rxid = prxparse ('/\b([a-z])(\d+)\b|\b(\d+)([a-z])\b/i');  * a regex pattern with | alternation;

  if prxmatch(rxid, productid) then do;
    cbnum = prxparen(rxid);                                  * determine which alternation;

    letter = prxposn(rxid,ifn(cbnum=2,1,4),productid);       * extract parts;
    digits = prxposn(rxid,ifn(cbnum=2,2,3),productid);
    letterdig = rank(letter) - rank('A') + 1;                * convert letter part;

    if cbnum = 2 then 
      productid = tranwrd(productid,cats(letter,digits),cats(letterdig,'-',digits));  * replace original token with normalized token;
    else
      productid = tranwrd(productid,cats(digits,letter),cats(letterdig,'-',digits));
  end;

  drop rxid cbnum letter digits letterdig;
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1379 views
  • 4 likes
  • 4 in conversation