BookmarkSubscribeRSS Feed
Buzzy_Bee
Quartz | Level 8

I've code some product codes with a letter in them that needs to be converted to a number and shifted to the front. So for example, 10D should read as 4-10 (to make it consistent with the product code below it). Sometimes the product code will have the letter at the front instead, like A570, but it needs to be converted to 1-570.

I've got a working solution below, but my code is lengthy and ugly. My solution uses scan to return the first segment before the space, then finds A and converts this to its ASCII value, then removes 64 (so for example, the ASCII value of B is 66, so rank('B')-64 returns 2). I then remove the letter off the segment using prxchange, then add the newly converted number to the front using concatenation.

I feel like I could improve this code a lot using PRX functions, but I find them confusing so I wrote this step and didn't know where to go from there:

check = prxparse('/(A|B|C|D)/'); if prxmatch(check,segment1) > 0 then do something...

Thanks for any ideas. It doesn't have to PRX functions, but they seem to be a really concise way of writing code once people get the hang of it, so I'm trying to learn.

 

data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;

data WANT;
	SET HAVE;
	segment1=scan(ProductID,1,'');
	if find(segment1,'A') then number=rank('A')-64;
	else if find(segment1,'B') then number=rank('B')-64;
	else if find(segment1,'C') then number=rank('C')-64;
	else if find(segment1,'D') then number=rank('D')-64;
	ProductID_2 = prxchange('s/(A|B|C|D)/$2/', 1, ProductID);	
	if number NE '.' then ProductID_2=cats(number,'-',ProductID_2);
drop number;	
run;
6 REPLIES 6
PGStats
Opal | Level 21

Try this to see if it covers all possible cases:

 

data want;
set have;
length code $12;
code = scan(ProductID, 1);
if prxmatch("/^[A-D]\d+|^\d+[A-D]/o",code) then do;
    code = catx("-", rank(compress(code,"ABCD","K"))-rank("A")+1, compress(code,,"KD"));
    productID = catx(" ", code, scan(ProductID, 2));
    end;
drop code;
run;
PG
Buzzy_Bee
Quartz | Level 8

Thank you, that works perfectly on the test cases, including adding in E and F into the test cases and code.

 

Can I please clarify, this part refers to the specified letters that precede numbers:

/^[A-F]\d+

And this part is for the specified letters when they occur after the numbers:

^\d+[A-F]

But I couldn't find anything that explains what the /o means at the end of the prxmatch?

Thank you.

 

PGStats
Opal | Level 21

The "o" suffix simply tells SAS that the match pattern is a constant that doesn't need to be recompiled.

PG
mkeintz
PROC Star

I've often resisted using prx functions because I have to relearn the special characters every time, so here's another approach.

 

  1. Find the location of an alpha character inside the first blank-separated word in product id.
  2. If one is found look up the position of that character in a string of letters from A to Z - that position provides the 1 for A, ... 26 for Z.  Append a dash to it to build a prefix.
  3. Then drop the original character and concatenate the prefix with the rest of the product id. 
data want (drop=_:) ;
  set have;

  retain _alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;

  _alphaloc=anyalpha(scan(productid,1,' '));
  if _alphaloc then do;
    _prefix=cats(findc(_alphabet,char(productid,_alphaloc)),'-');
	if _alphaloc=1 then productid2= trim(_prefix) || substr(productid,2);
	else productid2=trim(_prefix) || substr(productid,1,_alphaloc-1)||substr(productid,_alphaloc+1);
  end;
  else productid2=productid;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Buzzy_Bee
Quartz | Level 8

Thank you. I've seen ANYDIGIT before and NOTALPHA, but never ANYALPHA. That's a great solution.

RichardDeVen
Barite | Level 11

You can use a Perl regular expression to locate the letter-digits or digits-letter token and replace it to with the equivalent normalized 'letter_as_A_based_index dash digits' token.

 

Example:

data have;
	infile datalines dlm=',' dsd truncover;
	input ProductID $ ProductName :$20. ;
	datalines;
2B 7890, TOYS
2-2 7890, TOYS
10D XUM, FABRICS
4-10 XUM, FABRICS
A570 PP, SUGAR
1-570 PP, SUGAR 
;

data want;
  set have;

  rxid = prxparse ('/\b([a-z])(\d+)\b|\b(\d+)([a-z])\b/i');  * a regex pattern with | alternation;

  if prxmatch(rxid, productid) then do;
    cbnum = prxparen(rxid);                                  * determine which alternation;

    letter = prxposn(rxid,ifn(cbnum=2,1,4),productid);       * extract parts;
    digits = prxposn(rxid,ifn(cbnum=2,2,3),productid);
    letterdig = rank(letter) - rank('A') + 1;                * convert letter part;

    if cbnum = 2 then 
      productid = tranwrd(productid,cats(letter,digits),cats(letterdig,'-',digits));  * replace original token with normalized token;
    else
      productid = tranwrd(productid,cats(digits,letter),cats(letterdig,'-',digits));
  end;

  drop rxid cbnum letter digits letterdig;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1174 views
  • 4 likes
  • 4 in conversation