I have a question about how to remove only the apostrophes from proper names on a row-by-row basis. There may be more than one apostrophe to remove in a single name.
For example, I have a "place of employment" field whose entries are business names. The same business is not always entered consistently, so I see values such as:
FREDS PHARMACY
FRED'S PHARMACY
OREGON CITY FRED MEYER
OREGON CITY'S FRED MEYER
OREGON CITY FRED MEYER'S
JIMS GLADSTONE SUBARU CARS AND TRUCKS
JIM'S GLADSTONE SUBARU CAR'S AND TRUCK'S
and so on.
What I want is the apostrophes removed, so that the business names above look like:
FREDS PHARMACY
FREDS PHARMACY
OREGON CITY FRED MEYER
OREGON CITYS FRED MEYER
OREGON CITY FRED MEYERS
JIMS GLADSTONE SUBARU CARS AND TRUCKS
JIMS GLADSTONE SUBARU CARS AND TRUCKS
I don't care about duplicates since each row is a unique contact.
I have slightly over 30,000 non-duplicated rows to check and then remove any apostrophes.
Thank you for your help.
The compress function will remove characters.
data example;
   x="JIM'S GLADSTONE SUBARU CAR'S AND TRUCK'S";
   y=compress(x,"'");
run;
I put double quotes around the apostrophe in the COMPRESS call for legibility, and I create a new variable so you can compare the result with the original value.
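As a quick check, the same call can be run over the sample names from the question. This is only a sketch: the data set name poe_check and the 60-character width are assumptions made for illustration, not part of the original data.

```sas
/* Sketch: sample names copied from the question.            */
/* poe_check and the $char60. width are illustrative choices. */
data poe_check;
   infile datalines truncover;
   input place_of_employment $char60.;
   /* The second argument lists the characters to delete: here only the apostrophe */
   p_o_e = compress(place_of_employment, "'");
datalines;
FREDS PHARMACY
FRED'S PHARMACY
OREGON CITY FRED MEYER
OREGON CITY'S FRED MEYER
OREGON CITY FRED MEYER'S
JIMS GLADSTONE SUBARU CARS AND TRUCKS
JIM'S GLADSTONE SUBARU CAR'S AND TRUCK'S
;
run;

/* Print both columns side by side to compare before and after */
proc print data=poe_check noobs;
run;
```

Because COMPRESS deletes every occurrence of the listed character, a name with several apostrophes is cleaned in a single call.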
I applied the COMPRESS code.
I received the following result in the log:
7121  Data SASCDC_2.Arias_NAICS_Classify_H;
7122  *Retain Contact_Person_ID Place_of_Employment NAICS_Sector Sector_Type Type_Firm
7122! Other_Categories;
7123  P_O_E = COMPRESS(Place_of_employment, "'");
7124  Set SASCDC_2.Arias_NAICS_Classify_H;
ERROR: Variable Place_of_employment has been defined as both character and numeric.
7125  run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      7123:21
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set SASCDC_2.ARIAS_NAICS_CLASSIFY_H may be incomplete. When this step was stopped there were 0 observations and 8 variables.
WARNING: Data set SASCDC_2.ARIAS_NAICS_CLASSIFY_H was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds
Place_of_employment is a character variable. But does the compress function convert it to numeric?
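You put the cart before the horse. Your assignment statement comes before the SET statement, so when SAS compiles the COMPRESS call it has not yet seen Place_of_employment and creates it as a numeric variable; the SET statement then tries to define it as character, hence the error. COMPRESS itself does not convert anything. Put the SET statement first: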
data SASCDC_2.Arias_NAICS_Classify_H;
set SASCDC_2.Arias_NAICS_Classify_H;
P_O_E = COMPRESS(Place_of_employment, "'");
run;
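If you would rather not overwrite the source table while testing, a variation along these lines may help. It is a sketch under assumptions: work.poe_fixed and the flag variable changed are hypothetical names introduced here, not anything from your library.

```sas
/* Sketch only: work.poe_fixed and changed are hypothetical names */
data work.poe_fixed;
   /* Read the row first so Place_of_employment is known as character */
   set SASCDC_2.Arias_NAICS_Classify_H;
   /* Then remove every apostrophe from the value */
   P_O_E = compress(Place_of_employment, "'");
   /* 1 when at least one apostrophe was removed, 0 otherwise */
   changed = (P_O_E ne Place_of_employment);
run;

/* Count how many rows were actually altered */
proc freq data=work.poe_fixed;
   tables changed;
run;
```

Writing to a WORK data set leaves the permanent table untouched until you are satisfied with the result.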
Thank you for the help.
I tend to put the cart before the horse quite often!