I have a question about how to remove only the apostrophes from proper names on a row-by-row basis. There may be more than one apostrophe to remove in a single value.
For example, I have a "place of employment" field whose entries are the names of businesses. The same business names are not entered consistently, so I see values such as:
FREDS PHARMACY
FRED'S PHARMACY
OREGON CITY FRED MEYER
OREGON CITY'S FRED MEYER
OREGON CITY FRED MEYER'S
JIMS GLADSTONE SUBARU CARS AND TRUCKS
JIM'S GLADSTONE SUBARU CAR'S AND TRUCK'S
and so on.
What I want is the apostrophes removed, so that the business names above look like this:
FREDS PHARMACY
FREDS PHARMACY
OREGON CITY FRED MEYER
OREGON CITYS FRED MEYER
OREGON CITY FRED MEYERS
JIMS GLADSTONE SUBARU CARS AND TRUCKS
JIMS GLADSTONE SUBARU CARS AND TRUCKS
I don't care about duplicates since each row is a unique contact.
I have slightly over 30,000 non-duplicated rows to check and then remove any apostrophes.
Thank you for your help.
Accepted Solutions
You put the cart before the horse.
data SASCDC_2.Arias_NAICS_Classify_H;
set SASCDC_2.Arias_NAICS_Classify_H;
P_O_E = COMPRESS(Place_of_employment, "'");
run;
You can use COMPRESS() to remove all apostrophes, but removing them only from proper names isn't an easy task.
The COMPRESS function removes the characters you specify from a string.
data example;
   x = "JIM'S GLADSTONE SUBARU CAR'S AND TRUCK'S";
   y = compress(x, "'");
run;
I put double quotes around the apostrophe in the COMPRESS call for legibility, and I create a new variable so you can compare the results.
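To check the behavior on several rows at once, here is a minimal sketch that runs the same COMPRESS call over some of the sample names from the question. The table names work.poe_sample and work.poe_clean and the 60-character width are assumptions made for illustration; they are not from the original data.

```sas
/* Hypothetical test table built from sample names in the question */
data work.poe_sample;
   infile datalines truncover;
   input Place_of_employment $char60.;
   datalines;
FRED'S PHARMACY
OREGON CITY'S FRED MEYER
OREGON CITY FRED MEYER'S
JIM'S GLADSTONE SUBARU CAR'S AND TRUCK'S
;
run;

data work.poe_clean;
   set work.poe_sample;
   /* The second argument lists the characters to drop: here, one apostrophe. */
   /* Every apostrophe in the value is removed, not just the first.           */
   P_O_E = compress(Place_of_employment, "'");
run;

/* Show the original and cleaned values side by side */
proc print data=work.poe_clean noobs;
run;
```

Keeping the cleaned value in a separate variable such as P_O_E makes it easy to compare before and after; once the result looks right, the assignment could instead write back to Place_of_employment.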
I applied the compress code.
I received the following result in the log
7121  Data SASCDC_2.Arias_NAICS_Classify_H;
7122  *Retain Contact_Person_ID Place_of_Employment NAICS_Sector Sector_Type Type_Firm
7122! Other_Categories;
7123  P_O_E = COMPRESS(Place_of_employment, "'");
7124  Set SASCDC_2.Arias_NAICS_Classify_H;
ERROR: Variable Place_of_employment has been defined as both character and numeric.
7125  run;
NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      7123:21
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set SASCDC_2.ARIAS_NAICS_CLASSIFY_H may be incomplete. When this step was stopped there were 0 observations and 8 variables.
WARNING: Data set SASCDC_2.ARIAS_NAICS_CLASSIFY_H was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds
Place_of_employment is a character variable. But does the compress function convert it to numeric?
You put the cart before the horse: the SET statement has to come before the assignment that uses Place_of_employment.
data SASCDC_2.Arias_NAICS_Classify_H;
set SASCDC_2.Arias_NAICS_Classify_H;
P_O_E = COMPRESS(Place_of_employment, "'");
run;
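The error in the log appears to come from statement order rather than from COMPRESS itself. When the assignment is compiled before SET, Place_of_employment has not been defined yet, so SAS treats it as numeric (hence the conversion note at 7123:21), and the later SET then tries to bring it in as character. A minimal sketch of the working order, using hypothetical tables work.have and work.want rather than the original library:

```sas
/* Hypothetical one-row input with a character variable */
data work.have;
   length Place_of_employment $ 60;
   Place_of_employment = "FRED'S PHARMACY";
run;

data work.want;
   /* SET first: the compiler picks up Place_of_employment as character */
   /* from the input table before any statement references it.          */
   set work.have;
   /* Now COMPRESS receives a character argument and returns a character value. */
   P_O_E = compress(Place_of_employment, "'");
run;
```

Writing to a separate output table, as sketched here, is a design choice that leaves the source table untouched while testing; the accepted code overwrites the table in place, which also works once the statements are in this order.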
Thank you for the help.
I tend to put the cart before the horse quite often!