I've code some product codes with a letter in them that needs to be converted to a number and shifted to the front. So for example, 10D should read as 4-10 (to make it consistent with the product code below it). Sometimes the product code will have the letter at the front instead, like A570, but it needs to be converted to 1-570.
I've got a working solution below, but my code is lengthy and ugly. My solution uses scan to return the first segment before the space, then finds A and converts this to its ASCII value, then removes 64 (so for example, the ASCII value of B is 66, so rank('B')-64 returns 2). I then remove the letter off the segment using prxchange, then add the newly converted number to the front using concatenation.
I feel like I could improve this code a lot using PRX functions, but I find them confusing so I wrote this step and didn't know where to go from there:
check = prxparse('/(A|B|C|D)/'); if prxmatch(check,segment1) > 0 then do something...
Thanks for any ideas. It doesn't have to PRX functions, but they seem to be a really concise way of writing code once people get the hang of it, so I'm trying to learn.
data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;
data WANT;
	SET HAVE;
	segment1=scan(ProductID,1,'');
	if find(segment1,'A') then number=rank('A')-64;
	else if find(segment1,'B') then number=rank('B')-64;
	else if find(segment1,'C') then number=rank('C')-64;
	else if find(segment1,'D') then number=rank('D')-64;
	ProductID_2 = prxchange('s/(A|B|C|D)/$2/', 1, ProductID);	
	if number NE '.' then ProductID_2=cats(number,'-',ProductID_2);
drop number;	
run;
					
				
			
			
				
			
			
			
			
			
			
			
		Try this to see if it covers all possible cases:
data want;
set have;
length code $12;
code = scan(ProductID, 1);
if prxmatch("/^[A-D]\d+|^\d+[A-D]/o",code) then do;
    code = catx("-", rank(compress(code,"ABCD","K"))-rank("A")+1, compress(code,,"KD"));
    productID = catx(" ", code, scan(ProductID, 2));
    end;
drop code;
run;Thank you, that works perfectly on the test cases, including adding in E and F into the test cases and code.
Can I please clarify, this part refers to the specified letters that precede numbers:
/^[A-F]\d+
And this part is for the specified letters when they occur after the numbers:
^\d+[A-F]
But I couldn't find anything that explains what the /o means at the end of the prxmatch?
Thank you.
The "o" suffix simply tells SAS that the match pattern is a constant that doesn't need to be recompiled.
I've often resisted using prx functions because I have to relearn the special characters every time, so here's another approach.
data want (drop=_:) ;
  set have;
  retain _alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;
  _alphaloc=anyalpha(scan(productid,1,' '));
  if _alphaloc then do;
    _prefix=cats(findc(_alphabet,char(productid,_alphaloc)),'-');
	if _alphaloc=1 then productid2= trim(_prefix) || substr(productid,2);
	else productid2=trim(_prefix) || substr(productid,1,_alphaloc-1)||substr(productid,_alphaloc+1);
  end;
  else productid2=productid;
run;
Thank you. I've seen ANYDIGIT before and NOTALPHA, but never ANYALPHA. That's a great solution.
You can use a Perl regular expression to locate the letter-digits or digits-letter token and replace it to with the equivalent normalized 'letter_as_A_based_index dash digits' token.
Example:
data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;
data want;
  set have;
  rxid = prxparse ('/\b([a-z])(\d+)\b|\b(\d+)([a-z])\b/i');  * a regex pattern with | alternation;
  if prxmatch(rxid, productid) then do;
    cbnum = prxparen(rxid);                                  * determine which alternation;
    letter = prxposn(rxid,ifn(cbnum=2,1,4),productid);       * extract parts;
    digits = prxposn(rxid,ifn(cbnum=2,2,3),productid);
    letterdig = rank(letter) - rank('A') + 1;                * convert letter part;
    if cbnum = 2 then 
      productid = tranwrd(productid,cats(letter,digits),cats(letterdig,'-',digits));  * replace original token with normalized token;
    else
      productid = tranwrd(productid,cats(digits,letter),cats(letterdig,'-',digits));
  end;
  drop rxid cbnum letter digits letterdig;
run;
					
				
			
			
				
			
			
			
			
			
			
			
		It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
