I've code some product codes with a letter in them that needs to be converted to a number and shifted to the front. So for example, 10D should read as 4-10 (to make it consistent with the product code below it). Sometimes the product code will have the letter at the front instead, like A570, but it needs to be converted to 1-570.
I've got a working solution below, but my code is lengthy and ugly. My solution uses scan to return the first segment before the space, then finds A and converts this to its ASCII value, then removes 64 (so for example, the ASCII value of B is 66, so rank('B')-64 returns 2). I then remove the letter off the segment using prxchange, then add the newly converted number to the front using concatenation.
I feel like I could improve this code a lot using PRX functions, but I find them confusing so I wrote this step and didn't know where to go from there:
check = prxparse('/(A|B|C|D)/'); if prxmatch(check,segment1) > 0 then do something...
Thanks for any ideas. It doesn't have to PRX functions, but they seem to be a really concise way of writing code once people get the hang of it, so I'm trying to learn.
data have; infile datalines dlm=',' dsd truncover; input ProductID $ ProductName :$20. ; datalines; 2B 7890, TOYS 2-2 7890, TOYS 10D XUM, FABRICS 4-10 XUM, FABRICS A570 PP, SUGAR 1-570 PP, SUGAR ; data WANT; SET HAVE; segment1=scan(ProductID,1,''); if find(segment1,'A') then number=rank('A')-64; else if find(segment1,'B') then number=rank('B')-64; else if find(segment1,'C') then number=rank('C')-64; else if find(segment1,'D') then number=rank('D')-64; ProductID_2 = prxchange('s/(A|B|C|D)/$2/', 1, ProductID); if number NE '.' then ProductID_2=cats(number,'-',ProductID_2); drop number; run;
Try this to see if it covers all possible cases:
data want;
set have;
length code $12;
code = scan(ProductID, 1);
if prxmatch("/^[A-D]\d+|^\d+[A-D]/o",code) then do;
code = catx("-", rank(compress(code,"ABCD","K"))-rank("A")+1, compress(code,,"KD"));
productID = catx(" ", code, scan(ProductID, 2));
end;
drop code;
run;
Thank you, that works perfectly on the test cases, including adding in E and F into the test cases and code.
Can I please clarify, this part refers to the specified letters that precede numbers:
/^[A-F]\d+
And this part is for the specified letters when they occur after the numbers:
^\d+[A-F]
But I couldn't find anything that explains what the /o means at the end of the prxmatch?
Thank you.
The "o" suffix simply tells SAS that the match pattern is a constant that doesn't need to be recompiled.
I've often resisted using prx functions because I have to relearn the special characters every time, so here's another approach.
data want (drop=_:) ;
set have;
retain _alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;
_alphaloc=anyalpha(scan(productid,1,' '));
if _alphaloc then do;
_prefix=cats(findc(_alphabet,char(productid,_alphaloc)),'-');
if _alphaloc=1 then productid2= trim(_prefix) || substr(productid,2);
else productid2=trim(_prefix) || substr(productid,1,_alphaloc-1)||substr(productid,_alphaloc+1);
end;
else productid2=productid;
run;
Thank you. I've seen ANYDIGIT before and NOTALPHA, but never ANYALPHA. That's a great solution.
You can use a Perl regular expression to locate the letter-digits or digits-letter token and replace it to with the equivalent normalized 'letter_as_A_based_index dash digits' token.
Example:
data have; infile datalines dlm=',' dsd truncover; input ProductID $ ProductName :$20. ; datalines; 2B 7890, TOYS 2-2 7890, TOYS 10D XUM, FABRICS 4-10 XUM, FABRICS A570 PP, SUGAR 1-570 PP, SUGAR ; data want; set have; rxid = prxparse ('/\b([a-z])(\d+)\b|\b(\d+)([a-z])\b/i'); * a regex pattern with | alternation; if prxmatch(rxid, productid) then do; cbnum = prxparen(rxid); * determine which alternation; letter = prxposn(rxid,ifn(cbnum=2,1,4),productid); * extract parts; digits = prxposn(rxid,ifn(cbnum=2,2,3),productid); letterdig = rank(letter) - rank('A') + 1; * convert letter part; if cbnum = 2 then productid = tranwrd(productid,cats(letter,digits),cats(letterdig,'-',digits)); * replace original token with normalized token; else productid = tranwrd(productid,cats(digits,letter),cats(letterdig,'-',digits)); end; drop rxid cbnum letter digits letterdig; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.