Hi all,
I am quite confused about how to use the lengthn, cat, catx and substr functions all together for string manipulation. I am trying to add a variable that inserts a dash between the 2nd to last character and the last character of Item and wraps the value in parentheses. For example, Item 144L becomes (144-L). Use the LENGTHN, SUBSTR, CAT and CATX functions. The solution should be generalizable to any value of Item containing 2 or more characters. So far, I have been building the program using examples I have seen but I keep getting confused. Please can you help me fix my program and include explanations if u can. I just need to understand how all the functions work together to solve this problem. Thanks.
data CustomerDB ;
input Customer & $18.
Item : $11. ;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
data newTemp;
set CustomerDB;
if lengthn(Item) ge 3 then
secondtoLastRemoved = catx("-",(substr(Item,1,(lengthn(Item)-2))),(substr(Item,lengthn(Item), 1))
); /*Dropping Second to last characters from string?*/
else
secondtoLastRemoved = Item;
if findc(Item,'mart','i') gt 0 then
bool = 1; /*setting boolean for words that have the string "mart" in them */
else
bool = 0;
lowcaseCustomer = lowcase(compress(Customer,,'p')); /* Remove Punctuation and make letters lowercase */
run;
proc print data = newTemp;
title "CustomerDB string Manipulations";
run;
Before any explanation. Does this give you the desired result? 🙂
data CustomerDB ;
input Customer & $18.
Item : $11. ;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
data want;
set CustomerDB;
length NewItem $50;
NewItem=cats("(", substr(Item, 1, length(Item)-1), "-", char(Item, length(Item)), ")");
run;
proc print data=want;
run;
Before any explanation. Does this give you the desired result? 🙂
data CustomerDB ;
input Customer & $18.
Item : $11. ;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
data want;
set CustomerDB;
length NewItem $50;
NewItem=cats("(", substr(Item, 1, length(Item)-1), "-", char(Item, length(Item)), ")");
run;
proc print data=want;
run;
Yes it does but please could you explain the program and the functions. That is what I was confused about.
When working with char-variables it is important to use the length-statement to define the variable with suitable length.
I think that
if findc(Item,'mart','i') gt 0 then
has at least to issues: The variable "Item" seems to be wrongly used, it should be "Customer", right? The function findc is used to verify the presence of letters in a string, not to find words. Maybe findw is what you need here.
To debug such a step use additional variables for lengthn(Item) and the substrings you take to see what sas does.
Please post the expected result, i am not sure that i fully understand the logic you want applied:
data newTemp;
set CustomerDB;
length secondtoLastRemoved $ 14;
if lengthn(Item) ge 3 then do;
secondtoLastRemoved = cats('(', substr(Item, 1, lengthn(Item)-1), '-', substr(Item, lengthn(Item)-1, 1), ')');
end;
else do;
secondtoLastRemoved=Item;
end;
/*setting boolean for words that have the string "mart" in them */
bool = index(lowcase(Customer), 'mart') > 0;
*bool = findw(Customer, 'mart', , 'spit') > 0;
/* Remove Punctuation and make letters lowercase */
lowcaseCustomer = lowcase(compress(Customer, , 'p'));
run;
And again @PeterClemmensen was faster 😉
However, @andreas_lds was way more thorough in his explanation 🙂
Hi Andreas,
Thanks for the explanation and also pointing out the mistake I made with the findc function. Yes, I was trying to find the name of the customer that had the string 'mart'. Here is a snippet of the code and output.
data test ;
length Customer $18 Item $11 Want $14 ;
input Customer & Item;
want = cats('(',substr(item,1,length(item)-1),'-',char(item,length(item)),')');
bool = index(lowcase(Customer), 'mart') > 0;
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
proc print;
run;
You did not post the output yet .
data CustomerDB ;
input Customer & $18.
Item : $11. ;
want=prxchange('s/^(\d+)([a-z]+)$/($1-$2)/i',1,strip(item));
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
You didn't show what you want the result to look like. Here is what your words said to do.
data test ;
length Customer $18 Item $11 Want $14 ;
input Customer & Item;
want = cats('(',substr(item,1,length(item)-1),'-',char(item,length(item)),')');
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
;
proc print;
run;
Obs Customer Item Want 1 Bartco Corporation 143L (143-L) 2 Cost Cutter's 142 (14-2) 3 Minimart Inc. 188SW (188S-W) 4 Bosco, LLC 908X321 (908X32-1) 5 K-MART. A122L (A122-L) 6 Food Unlimited 1898XVWX (1898XVW-X) 7 Shop and Drop 24L (24-L) 8 Shop Marty's 222X (222-X) 9 Pet's are Us W100 (W10-0) 10 Roger's Spirits 1522L (1522-L) 11 Artful Spirits 407XxxXxx (407XxxXx-x) 12 Up/Down mart Inc. ABCdefghxxx (ABCdefghxx-x)
If you want to avoid problems with short Item strings, tweak @Tom 's idea a bit:
data test ;
length Customer $18 Item $11 Want $14 ;
input Customer & Item;
want = cats('(',catx("-", substrn(item,1,length(item)-1),char(item,length(item))),')');
datalines;
Bartco Corporation 143L
Cost Cutter's 142
Minimart Inc. 188SW
Bosco, LLC 908X321
K-MART. A122L
Food Unlimited 1898XVWX
Shop and Drop 24L
Shop Marty's 222X
Pet's are Us W100
Roger's Spirits 1522L
Artful Spirits 407XxxXxx
Up/Down mart Inc. ABCdefghxxx
Tiny test a
Nasty test .
;
proc print;
run;
Obs Customer Item Want
1 Bartco Corporation 143L (143-L)
2 Cost Cutter's 142 (14-2)
3 Minimart Inc. 188SW (188S-W)
4 Bosco, LLC 908X321 (908X32-1)
5 K-MART. A122L (A122-L)
6 Food Unlimited 1898XVWX (1898XVW-X)
7 Shop and Drop 24L (24-L)
8 Shop Marty's 222X (222-X)
9 Pet's are Us W100 (W10-0)
10 Roger's Spirits 1522L (1522-L)
11 Artful Spirits 407XxxXxx (407XxxXx-x)
12 Up/Down mart Inc. ABCdefghxxx (ABCdefghxx-x)
13 Tiny test a (a)
14 Nasty test ()
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.