I'm trying to remove bothersome and unnecessary characters from my dataset. A portion of my code:
DATA regions;
SET regionsc;
SLOE_1_Reg = TRANWRD (SLOE_1_Reg,'Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)','Midwest');
RUN;
...This returns the identical column with which I started. There's no error message in the log. What am I doing wrong?
Thanks-
A simple example, assuming you want to remove all the states using TRANWRD
I'm assuming you have the value I set in haveStr, and want to convert it to the value I have in wantStr.
I suggest you provide a sample of the data you are attempting to transform, what you want as a result, what code you currently have and the log from that code
1456 data _null_ ;
1457 haveStr="Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)" ;
1458 wantStr="Midwest" ;
1459
1460 convert=tranwrd(haveStr,"Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD,
1460! WI)","MidWest") ;
1461
1462 put haveStr= ;
1463 put wantStr= ;
1464 put convert= ;
1465 run ;
haveStr=Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)
wantStr=Midwest
convert=MidWest
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Admin Note:
Marking this as the Accepted Solution, as it has a link to the TRANWRD documentation
Also, want to call out replies by:
@PaigeMiller which notes the importance of knowing your data:
The spaces and commas are critical if you are going to make TRANWRD work
@FreelanceReinh which offers a method to convert values that are "messy" inconsistent:
SLOE_1_Reg = prxchange('s/(\w+)\s*\(.+?\)/$1/',-1,SLOE_1_Reg);
Basically, this would replace substrings of the form "word (unwanted details)" with "word".
We need for you to provide a portion of your data in order to figure this out. Please provide the data as working SAS data step code, which you can type in yourself, or follow these instructions. Please do not provide data as Excel files or as screen captures or any other format.
Ok, this is an imported Excel file but I've put a few datalines together to give you an idea:
DATALINES;
gender Med_Type SLOE_1_Reg
M Private Midwest(IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)
F Public Northeast(CT, District of Columbia, DE, MA, MD, ME, NH, PA, NY, NJ, RI, VT)
F Private Southeast(AL, AR, FL, CA, KY, LA, PR, MS, NC, SC, TN, VA, WV)
M Public West(AK, AZ, CA, CO, ID, HI, MT, NM, NV, OK, OR, TX, UT, WA, WY)
;
You are correct, I'm just trying to get those overly-descriptive cells down to the basic single word which describes the region.
Thank you all for your time and efforts!
@newtriks wrote:
Ok, this is an imported Excel file but I've put a few datalines together to give you an idea:
DATALINES;
gender Med_Type SLOE_1_Reg
M Private Midwest(IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)
F Public Northeast(CT, District of Columbia, DE, MA, MD, ME, NH, PA, NY, NJ, RI, VT)
F Private Southeast(AL, AR, FL, CA, KY, LA, PR, MS, NC, SC, TN, VA, WV)
M Public West(AK, AZ, CA, CO, ID, HI, MT, NM, NV, OK, OR, TX, UT, WA, WY)
;
You are correct, I'm just trying to get those overly-descriptive cells down to the basic single word which describes the region.
Thank you all for your time and efforts
Those examples look like what you really want is the first word, which the SCAN function will get quite easily.
length region $ 10; region = scan(SLOE_1_Reg ,1,'(');
@newtriks wrote:
Ok, this is an imported Excel file but I've put a few datalines together to give you an idea:
DATALINES;
gender Med_Type SLOE_1_Reg
M Private Midwest(IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)
F Public Northeast(CT, District of Columbia, DE, MA, MD, ME, NH, PA, NY, NJ, RI, VT)
F Private Southeast(AL, AR, FL, CA, KY, LA, PR, MS, NC, SC, TN, VA, WV)
M Public West(AK, AZ, CA, CO, ID, HI, MT, NM, NV, OK, OR, TX, UT, WA, WY)
;
You are correct, I'm just trying to get those overly-descriptive cells down to the basic single word which describes the region.
Thank you all for your time and efforts!
You have typed different text on the Midwest line than what you showed in the original message. The spaces and commas are critical if you are going to make TRANWRD work, and so if we're going to help, you absolutely must provide data in the form I requested, as working SAS data step code, preferably using the instructions at the link I gave. That's absolutely MUST provide data in the form I requested. We are trying to help you, but you need to help us.
A simple example, assuming you want to remove all the states using TRANWRD
I'm assuming you have the value I set in haveStr, and want to convert it to the value I have in wantStr.
I suggest you provide a sample of the data you are attempting to transform, what you want as a result, what code you currently have and the log from that code
1456 data _null_ ;
1457 haveStr="Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)" ;
1458 wantStr="Midwest" ;
1459
1460 convert=tranwrd(haveStr,"Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD,
1460! WI)","MidWest") ;
1461
1462 put haveStr= ;
1463 put wantStr= ;
1464 put convert= ;
1465 run ;
haveStr=Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)
wantStr=Midwest
convert=MidWest
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Admin Note:
Marking this as the Accepted Solution, as it has a link to the TRANWRD documentation
Also, want to call out replies by:
@PaigeMiller which notes the importance of knowing your data:
The spaces and commas are critical if you are going to make TRANWRD work
@FreelanceReinh which offers a method to convert values that are "messy" inconsistent:
SLOE_1_Reg = prxchange('s/(\w+)\s*\(.+?\)/$1/',-1,SLOE_1_Reg);
Basically, this would replace substrings of the form "word (unwanted details)" with "word".
Thanks for your time and effort!
The version below of what you put together seems to work. The only issue now is that it gives me just one observation. How can I modify it so that it gives me all 500+?
data sloeconvert7 ;
SLOE_1_Reg ="Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)" ;
Region="Midwest" ;
Region=tranwrd(SLOE_1_Reg,"Midwest (IA, IL, IN, OH, KS, MI, MN, MO, ND, SD, WI)","Midwest");
PUT SLOE_1_Reg= ;
PUT Region= ;
RUN;
Please provide some sample data so that I can figure out the solutions and the cause.
-Vijay
Hello @newtriks,
If TRANWRD does not replace the substring specified in the second argument, the corresponding substring in variable SLOE_1_Reg must be different. Possible differences include missing blanks, double blanks, other white-space characters (such as 'A0'x or tab characters) instead of blanks, upper/lower case, etc. So, either you need to ensure that the search string matches exactly what you want to replace (use the $HEX. format to determine the exact content of variable SLOE_1_Reg) or use more flexible pattern matching, e.g. with PRXCHANGE.
Example:
SLOE_1_Reg = prxchange('s/(\w+)\s*\(.+?\)/$1/',-1,SLOE_1_Reg);
Basically, this would replace substrings of the form "word (unwanted details)" with "word".
Edit: Corrected typo: The intention was to include the case of no space between "word" and the opening parenthesis by using \s*, not \s+, in the regex.
Thank you all for your interest and help. I ultimately found it easier to fix the columns in Excel and then re-import and merge them.
I appreciate your time.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.