I've code some product codes with a letter in them that needs to be converted to a number and shifted to the front. So for example, 10D should read as 4-10 (to make it consistent with the product code below it). Sometimes the product code will have the letter at the front instead, like A570, but it needs to be converted to 1-570.
I've got a working solution below, but my code is lengthy and ugly. My solution uses scan to return the first segment before the space, then finds A and converts this to its ASCII value, then removes 64 (so for example, the ASCII value of B is 66, so rank('B')-64 returns 2). I then remove the letter off the segment using prxchange, then add the newly converted number to the front using concatenation.
I feel like I could improve this code a lot using PRX functions, but I find them confusing so I wrote this step and didn't know where to go from there:
check = prxparse('/(A|B|C|D)/'); if prxmatch(check,segment1) > 0 then do something...
Thanks for any ideas. It doesn't have to PRX functions, but they seem to be a really concise way of writing code once people get the hang of it, so I'm trying to learn.
data have; infile datalines dlm=',' dsd truncover; input ProductID $ ProductName :$20. ; datalines; 2B 7890, TOYS 2-2 7890, TOYS 10D XUM, FABRICS 4-10 XUM, FABRICS A570 PP, SUGAR 1-570 PP, SUGAR ; data WANT; SET HAVE; segment1=scan(ProductID,1,''); if find(segment1,'A') then number=rank('A')-64; else if find(segment1,'B') then number=rank('B')-64; else if find(segment1,'C') then number=rank('C')-64; else if find(segment1,'D') then number=rank('D')-64; ProductID_2 = prxchange('s/(A|B|C|D)/$2/', 1, ProductID); if number NE '.' then ProductID_2=cats(number,'-',ProductID_2); drop number; run;
Try this to see if it covers all possible cases:
data want;
set have;
length code $12;
code = scan(ProductID, 1);
if prxmatch("/^[A-D]\d+|^\d+[A-D]/o",code) then do;
code = catx("-", rank(compress(code,"ABCD","K"))-rank("A")+1, compress(code,,"KD"));
productID = catx(" ", code, scan(ProductID, 2));
end;
drop code;
run;
Thank you, that works perfectly on the test cases, including adding in E and F into the test cases and code.
Can I please clarify, this part refers to the specified letters that precede numbers:
/^[A-F]\d+
And this part is for the specified letters when they occur after the numbers:
^\d+[A-F]
But I couldn't find anything that explains what the /o means at the end of the prxmatch?
Thank you.
The "o" suffix simply tells SAS that the match pattern is a constant that doesn't need to be recompiled.
I've often resisted using prx functions because I have to relearn the special characters every time, so here's another approach.
data want (drop=_:) ;
set have;
retain _alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;
_alphaloc=anyalpha(scan(productid,1,' '));
if _alphaloc then do;
_prefix=cats(findc(_alphabet,char(productid,_alphaloc)),'-');
if _alphaloc=1 then productid2= trim(_prefix) || substr(productid,2);
else productid2=trim(_prefix) || substr(productid,1,_alphaloc-1)||substr(productid,_alphaloc+1);
end;
else productid2=productid;
run;
Thank you. I've seen ANYDIGIT before and NOTALPHA, but never ANYALPHA. That's a great solution.
You can use a Perl regular expression to locate the letter-digits or digits-letter token and replace it to with the equivalent normalized 'letter_as_A_based_index dash digits' token.
Example:
data have; infile datalines dlm=',' dsd truncover; input ProductID $ ProductName :$20. ; datalines; 2B 7890, TOYS 2-2 7890, TOYS 10D XUM, FABRICS 4-10 XUM, FABRICS A570 PP, SUGAR 1-570 PP, SUGAR ; data want; set have; rxid = prxparse ('/\b([a-z])(\d+)\b|\b(\d+)([a-z])\b/i'); * a regex pattern with | alternation; if prxmatch(rxid, productid) then do; cbnum = prxparen(rxid); * determine which alternation; letter = prxposn(rxid,ifn(cbnum=2,1,4),productid); * extract parts; digits = prxposn(rxid,ifn(cbnum=2,2,3),productid); letterdig = rank(letter) - rank('A') + 1; * convert letter part; if cbnum = 2 then productid = tranwrd(productid,cats(letter,digits),cats(letterdig,'-',digits)); * replace original token with normalized token; else productid = tranwrd(productid,cats(digits,letter),cats(letterdig,'-',digits)); end; drop rxid cbnum letter digits letterdig; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.