BookmarkSubscribeRSS Feed
adventuresinsas
Fluorite | Level 6

Hello SAS community!

 

I'm kind of stuck here so would appreciate any and all advice. I've got a dataset with a list of building locations including ID, address, city, state, and zip. Is there a way to pull out only rows where it's the same address but different city, or rows with the same zip but different state? These are data quality errors that I'd like to identify. 

 

Thanks!

 

3 REPLIES 3
SASJedi
SAS Super FREQ

You COULD try a cartesian join in SQL with an appropriate WHERE clause.

data have;
   infile datalines dsd;
   ID+1;
   informat address $25.  city $10. state $2. zip;
   input address  city  state  zip;
datalines;
550 Jack Horner, Rhymes, NJ, 06430
123 My Road, Omaha,  NE, 23143
123 My Road, Auburn, NE, 23143
123 My Road, Omaha,  NE, 23144
123 My Road, Auburn, NE, 23144
123 My Road, Auburn, ME, 04210
311 Main St, Big Sur, CA, 90201
;

proc sql;
title "Complete table";
select * 
   from have
;
title "Problem Rows";
select * 
   from have as a
      , have as b
   where    a.zip=b.zip 
           and a.address=b.address 
           and   a.State=b.State 
           and   a.City <> b.City
;
quit;
title;
Check out my Jedi SAS Tricks for SAS Users
ballardw
Super User

If the idea is to check for data quality you might consider some of the ZIP related functions.

Such as ZIPSTATE, ZIPNAME or ZIPNAMEL (depending on the way you have state in the data).

If you have 2 letter state codes you could do something like:

Data zipstateprob;
   set have;
   if zipname(zipcode) ne Statecode then output;
end;

to get a list where the zip appears to disagree with the state.

There is also a ZIPCITY that returns the name of a city and the 2 letter state code.

These do find "mismatches" for some areas because of local spelliing, like NY, NY or ABQ, NM or CdA, ID

but may be helpful.

 

I'm not sure I would start with "same address, different city" as there are just too many streets with common numbers. Quick, how far are you from a street with a name like Washington, Jefferson, Franklin, Lincoln, or other president or state names? Not to mention "Main St" or "Frontage Rd".

Ksharp
Super User
Code Not Tested:

proc sql;
create table want as
select * from have
group by address
having count(distinct city) > 1;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 593 views
  • 1 like
  • 4 in conversation