I have a dataset that contains items such as "Oklahoma" "0klahoma" "Okla#oma" "555-444-3333" and I only want to keep entries that contain purely letters - dropping all entries that contain anything other than a letter. How do you do it?
NOTALPHA() function.
if notalpha(your_variable_name) then delete;
@rob_roche wrote:
I have a dataset that contains items such as "Oklahoma" "0klahoma" "Okla#oma" "555-444-3333" and I only want to keep entries that contain purely letters - dropping all entries that contain anything other than a letter. How do you do it?
data have;
length string $20;
input string $;
datalines;
Oklahoma
0klahoma
Okla#oma
555-444-3333
;
data want;
set have;
if anypunct(string)=0 & anydigit(string)=0;
run;
This seems to have been the most effective - thanks!
I agree with @Astounding. Also, I think @Reezas solution is the way to go 🙂
It would be safer if you would specify some of the details. Since the objective is to delete observations, it's a good idea to spell out the rules before deletion takes place.
Is a comma always invalid, or could this appear: Washington, DC
Is a hyphen always invalid, or could this appear: Raleigh-Durham
Would a zip code be valid?
Do you have a small list of valid values?
Does capitalization matter?
sashelp and sasmaps libraries have several of these datasets readily available for you.
@rob_roche wrote:
These are great questions - what I am trying to do is create a list of US city names, this will become my reference list. I'm trying to bump international names out of a master list of 55 million records. I'm trying to create a table that has a list of pairs of city, county, zip. FORTUNTELY, I found a .txt file online containing this information. However, I was frustrated I couldn't figure out how to clean my data myself - hence the question.
That's also great to know - this is week 3 of using SAS 🙂
Thank you for the guidance.
@rob_roche wrote:
These are great questions - what I am trying to do is create a list of US city names, this will become my reference list. I'm trying to bump international names out of a master list of 55 million records. I'm trying to create a table that has a list of pairs of city, county, zip. FORTUNTELY, I found a .txt file online containing this information. However, I was frustrated I couldn't figure out how to clean my data myself - hence the question.
You may want to look and see if you have a data set SASHELP.Zipcode. It should be part of your install. It has US city names (and some alternate names) state codes, zip codes, county name and FIPS number area codes and other information.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.