Help using Base SAS procedures

Cleaning data to removed non-letter entries.

Reply
New Contributor
Posts: 4

Cleaning data to removed non-letter entries.

I have a dataset that contains items such as "Oklahoma" "0klahoma" "Okla#oma" "555-444-3333" and I only want to keep entries that contain purely letters - dropping all entries that contain anything other than a letter. How do you do it?

Super User
Posts: 23,771

Re: Cleaning data to removed non-letter entries.

Posted in reply to rob_roche

NOTALPHA() function.

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p15hz07s8zvsywn1q59607pduunb.htm&...

 

if notalpha(your_variable_name) then delete;

@rob_roche wrote:

I have a dataset that contains items such as "Oklahoma" "0klahoma" "Okla#oma" "555-444-3333" and I only want to keep entries that contain purely letters - dropping all entries that contain anything other than a letter. How do you do it?


 

 

PROC Star
Posts: 1,283

Re: Cleaning data to removed non-letter entries.

Posted in reply to rob_roche
data have;
length string $20;
input string $;
datalines;
Oklahoma
0klahoma
Okla#oma
555-444-3333
;

data want;
   set have;
   if anypunct(string)=0 & anydigit(string)=0;
run;
New Contributor
Posts: 4

Re: Cleaning data to removed non-letter entries.

This seems to have been the most effective - thanks!

PROC Star
Posts: 1,283

Re: Cleaning data to removed non-letter entries.

Posted in reply to rob_roche

I agree with @Astounding. Also, I think @Reezas solution is the way to go Smiley Happy

Super User
Posts: 6,785

Re: Cleaning data to removed non-letter entries.

Posted in reply to rob_roche

It would be safer if you would specify some of the details.  Since the objective is to delete observations, it's a good idea to spell out the rules before deletion takes place.

 

Is a comma always invalid, or could this appear:  Washington, DC

 

Is a hyphen always invalid, or could this appear:  Raleigh-Durham

 

Would a zip code be valid?

 

Do you have a small list of valid values?

 

Does capitalization matter?

 

 

New Contributor
Posts: 4

Re: Cleaning data to removed non-letter entries.

Posted in reply to Astounding
These are great questions - what I am trying to do is create a list of US city names, this will become my reference list. I'm trying to bump international names out of a master list of 55 million records. I'm trying to create a table that has a list of pairs of city, county, zip. FORTUNTELY, I found a .txt file online containing this information. However, I was frustrated I couldn't figure out how to clean my data myself - hence the question.
Super User
Posts: 23,771

Re: Cleaning data to removed non-letter entries.

Posted in reply to rob_roche

sashelp and sasmaps libraries have several of these datasets readily available for you. 

 


@rob_roche wrote:
These are great questions - what I am trying to do is create a list of US city names, this will become my reference list. I'm trying to bump international names out of a master list of 55 million records. I'm trying to create a table that has a list of pairs of city, county, zip. FORTUNTELY, I found a .txt file online containing this information. However, I was frustrated I couldn't figure out how to clean my data myself - hence the question.

 

New Contributor
Posts: 4

Re: Cleaning data to removed non-letter entries.

That's also great to know - this is week 3 of using SAS Smiley Happy

 

Thank you for the guidance. 

 

Super User
Posts: 13,583

Re: Cleaning data to removed non-letter entries.

Posted in reply to rob_roche

@rob_roche wrote:
These are great questions - what I am trying to do is create a list of US city names, this will become my reference list. I'm trying to bump international names out of a master list of 55 million records. I'm trying to create a table that has a list of pairs of city, county, zip. FORTUNTELY, I found a .txt file online containing this information. However, I was frustrated I couldn't figure out how to clean my data myself - hence the question.

You may want to look and see if you have a data set SASHELP.Zipcode. It should be part of your install. It has US city names (and some alternate names) state codes, zip codes, county name and FIPS number area codes and other information.

Ask a Question
Discussion stats
  • 9 replies
  • 153 views
  • 3 likes
  • 5 in conversation