Hi all,
I am quite confused about how to use the lengthn, cat, catx and substr functions all together for string manipulation. I am trying to add a variable that inserts a dash between the 2nd to last character and the last character of Item and wraps the value in parentheses. For example, Item 144L becomes (144-L). Use the LENGTHN, SUBSTR, CAT and CATX functions. The solution should be generalizable to any value of Item containing 2 or more characters. So far, I have been building the program using examples I have seen but I keep getting confused. Please can you help me fix my program and include explanations if u can. I just need to understand how all the functions work together to solve this problem. Thanks.
data CustomerDB ;
input Customer & $18.
Item : $11. ;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
data newTemp;
set CustomerDB;
if lengthn(Item) ge 3 then
secondtoLastRemoved = catx("-",(substr(Item,1,(lengthn(Item)-2))),(substr(Item,lengthn(Item), 1))
); /*Dropping Second to last characters from string?*/
else
secondtoLastRemoved = Item;
if findc(Item,'mart','i') gt 0 then
bool = 1; /*setting boolean for words that have the string "mart" in them */
else
bool = 0;
lowcaseCustomer = lowcase(compress(Customer,,'p')); /* Remove Punctuation and make letters lowercase */
run;
proc print data = newTemp;
title "CustomerDB string Manipulations";
run;
Before any explanation. Does this give you the desired result? 🙂
data CustomerDB ;
input Customer & $18.
Item : $11. ;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
data want;
set CustomerDB;
length NewItem $50;
NewItem=cats("(", substr(Item, 1, length(Item)-1), "-", char(Item, length(Item)), ")");
run;
proc print data=want;
run;
Before any explanation. Does this give you the desired result? 🙂
data CustomerDB ;
input Customer & $18.
Item : $11. ;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
data want;
set CustomerDB;
length NewItem $50;
NewItem=cats("(", substr(Item, 1, length(Item)-1), "-", char(Item, length(Item)), ")");
run;
proc print data=want;
run;
Yes it does but please could you explain the program and the functions. That is what I was confused about.
When working with char-variables it is important to use the length-statement to define the variable with suitable length.
I think that
if findc(Item,'mart','i') gt 0 then
has at least to issues: The variable "Item" seems to be wrongly used, it should be "Customer", right? The function findc is used to verify the presence of letters in a string, not to find words. Maybe findw is what you need here.
To debug such a step use additional variables for lengthn(Item) and the substrings you take to see what sas does.
Please post the expected result, i am not sure that i fully understand the logic you want applied:
data newTemp;
set CustomerDB;
length secondtoLastRemoved $ 14;
if lengthn(Item) ge 3 then do;
secondtoLastRemoved = cats('(', substr(Item, 1, lengthn(Item)-1), '-', substr(Item, lengthn(Item)-1, 1), ')');
end;
else do;
secondtoLastRemoved=Item;
end;
/*setting boolean for words that have the string "mart" in them */
bool = index(lowcase(Customer), 'mart') > 0;
*bool = findw(Customer, 'mart', , 'spit') > 0;
/* Remove Punctuation and make letters lowercase */
lowcaseCustomer = lowcase(compress(Customer, , 'p'));
run;
And again @PeterClemmensen was faster 😉
However, @andreas_lds was way more thorough in his explanation 🙂
Hi Andreas,
Thanks for the explanation and also pointing out the mistake I made with the findc function. Yes, I was trying to find the name of the customer that had the string 'mart'. Here is a snippet of the code and output.
data test ;
length Customer $18 Item $11 Want $14 ;
input Customer & Item;
want = cats('(',substr(item,1,length(item)-1),'-',char(item,length(item)),')');
bool = index(lowcase(Customer), 'mart') > 0;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
proc print;
run;
You did not post the output yet .
data CustomerDB ;
input Customer & $18.
Item : $11. ;
want=prxchange('s/^(\d+)([a-z]+)$/($1-$2)/i',1,strip(item));
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
You didn't show what you want the result to look like. Here is what your words said to do.
data test ;
length Customer $18 Item $11 Want $14 ;
input Customer & Item;
want = cats('(',substr(item,1,length(item)-1),'-',char(item,length(item)),')');
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
proc print;
run;
Obs Customer Item Want 1 Bartco Corporation 143L (143-L) 2 Cost Cutter's 142 (14-2) 3 Minimart Inc. 188SW (188S-W) 4 Bosco, LLC 908X321 (908X32-1) 5 K-MART. A122L (A122-L) 6 Food Unlimited 1898XVWX (1898XVW-X) 7 Shop and Drop 24L (24-L) 8 Shop Marty's 222X (222-X) 9 Pet's are Us W100 (W10-0) 10 Roger's Spirits 1522L (1522-L) 11 Artful Spirits 407XxxXxx (407XxxXx-x) 12 Up/Down mart Inc. ABCdefghxxx (ABCdefghxx-x)
If you want to avoid problems with short Item strings, tweak @Tom 's idea a bit:
data test ;
length Customer $18 Item $11 Want $14 ;
input Customer & Item;
want = cats('(',catx("-", substrn(item,1,length(item)-1),char(item,length(item))),')');
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
Tiny test a
Nasty test .
;
proc print;
run;
Obs Customer Item Want 1 Bartco Corporation 143L (143-L) 2 Cost Cutter's 142 (14-2) 3 Minimart Inc. 188SW (188S-W) 4 Bosco, LLC 908X321 (908X32-1) 5 K-MART. A122L (A122-L) 6 Food Unlimited 1898XVWX (1898XVW-X) 7 Shop and Drop 24L (24-L) 8 Shop Marty's 222X (222-X) 9 Pet's are Us W100 (W10-0) 10 Roger's Spirits 1522L (1522-L) 11 Artful Spirits 407XxxXxx (407XxxXx-x) 12 Up/Down mart Inc. ABCdefghxxx (ABCdefghxx-x) 13 Tiny test a (a) 14 Nasty test ()
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.