SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
adventuresinsas
Fluorite | Level 6

Hello SAS community!

 

I'm kind of stuck here so would appreciate any and all advice. I've got a dataset with a list of building locations including ID, address, city, state, and zip. Is there a way to pull out only rows where it's the same address but different city, or rows with the same zip but different state? These are data quality errors that I'd like to identify. 

 

Thanks!

 

3 REPLIES 3
SASJedi
Ammonite | Level 13

You COULD try a cartesian join in SQL with an appropriate WHERE clause.

data have;
   infile datalines dsd;
   ID+1;
   informat address $25.  city $10. state $2. zip;
   input address  city  state  zip;
datalines;
550 Jack Horner, Rhymes, NJ, 06430
123 My Road, Omaha,  NE, 23143
123 My Road, Auburn, NE, 23143
123 My Road, Omaha,  NE, 23144
123 My Road, Auburn, NE, 23144
123 My Road, Auburn, ME, 04210
311 Main St, Big Sur, CA, 90201
;

proc sql;
title "Complete table";
select * 
   from have
;
title "Problem Rows";
select * 
   from have as a
      , have as b
   where    a.zip=b.zip 
           and a.address=b.address 
           and   a.State=b.State 
           and   a.City <> b.City
;
quit;
title;
Check out my Jedi SAS Tricks for SAS Users
ballardw
Super User

If the idea is to check for data quality you might consider some of the ZIP related functions.

Such as ZIPSTATE, ZIPNAME or ZIPNAMEL (depending on the way you have state in the data).

If you have 2 letter state codes you could do something like:

Data zipstateprob;
   set have;
   if zipname(zipcode) ne Statecode then output;
end;

to get a list where the zip appears to disagree with the state.

There is also a ZIPCITY that returns the name of a city and the 2 letter state code.

These do find "mismatches" for some areas because of local spelliing, like NY, NY or ABQ, NM or CdA, ID

but may be helpful.

 

I'm not sure I would start with "same address, different city" as there are just too many streets with common numbers. Quick, how far are you from a street with a name like Washington, Jefferson, Franklin, Lincoln, or other president or state names? Not to mention "Main St" or "Frontage Rd".

Ksharp
Super User
Code Not Tested:

proc sql;
create table want as
select * from have
group by address
having count(distinct city) > 1;
quit;

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1445 views
  • 1 like
  • 4 in conversation