BookmarkSubscribeRSS Feed
JithinJoe
Fluorite | Level 6

Hello Everyone, 

 

I have a dataset which contain 2 variable , Locality and Customer address, Due to some reasons, Some values for 'locality' in the dataset doesn't match with customer address.

 

For example in my dataset I have an observation where Locality mentioned as Borough Park ( NY) But the actual Customer location is Kensington (NY).

 

I need to read the data in Customer address and change the value for locality if it doesn't match with Locality.

 

 How can I do it in sas programming? 

 

 

6 REPLIES 6
Reeza
Super User
Well, you'd likely have to show some more examples of what you're trying to do before we can help you. For example, how would you know, looking at a record that Borough Park (NY) is wrong?
JithinJoe
Fluorite | Level 6

@Reeza In my project, I need to find the Locality wise sales of a Cloud kitchen, The dataset is directly from Cloud kitchen's Point of sales. People across 6 km radius can order food online from this restaurant. But for some unknown reason we are getting wrong locality in the variable. But Customer address is correct. This is provided by the customers themself.  So if I want to find area wise sales, few observation need to be cleaned according to the correct Location.  

 

Reeza
Super User
Ok, but that doesn't answer the question, how do you know a locality is wrong? You looking at it can know, but how else is is known that it's incorrect? Is there a list of valid values? Is it distance based on KM? What happens if a person moves in the middle of making an order, ie I sometimes order delivery on my bus ride home.

ShiroAmada
Lapis Lazuli | Level 10

Try this...

 

data have;
  infile datalines dlm="|";
  input
  locality: $50.
  customer: $50.
  ;
datalines;
Borough Park ( NY)| Kensington (NY)
Borough Park (_NY)|Borough Park (NY)
Borough Park (NY)|Borough Park (_NY)
Bronx ( NY)|Borough Park (NYC)
;
run;

data WANT;
  set HAVE;
  locality=ifc(locality <> customer,locality, locality);
run;
Kurt_Bremser
Super User

@Reeza has a very good argument here.

If locality is the spot from where an order is placed, it should not be changed at all, but kept for further analysis, as it is a very useful datum.

Imagine a mother, away on a job, ordering food for her kids. She might very well do that online (or per phone call) from San Francisco. Such facts can tell you a lot about your customers.

johnsville
Obsidian | Level 7

Without having some kind of well-known rule for the contents of the address variable, this is nearly impossible to accomplish.  Parsing addresses from scratch is more of an art than a science.  E.g., how to know when the city is two or more words?  You need data from outside the application to inform the effort.  This is why address correction software is as complex as it is.

 

Also, it is very possible that, for some localities, there are multiple valid city/town names - in such cases it not that the provided locality is wrong, rather that the address correction software used by the client (if they actually do use such) has returned a different but equally valid city name and used that for the locality field they provide. It is also possible that a zip code straddle state lines, so be careful.

 

However, to 'fix' the locality to have the exact same value as contained in the address, the address parsing needed can become quite complex. One would likely need to locate and separate out the city, state and zip code (US addresses).  Then use a zip/city table to see the known cities/towns for the zip.  (SAS actually provides a zip table.)  When one of those city/town names is contained within the original address, copy it from the zip table into the (now updated) locality field.  This assumes the zip code is correct.

 

But you still have the problem that both the address and the locality are wrong, but do not match each other.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1235 views
  • 0 likes
  • 5 in conversation