- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a SAS dataset containing addresses like the following example:
addr 1 | addr 2 | addr 3 |
20 kelly dr apt 2 england home | ||
16 hawk lane fl 2 bertha senior living | senior living | |
3 india road | ||
45 virginia street apt 6 england home | ||
holden st unit 3 | unit 3 |
I need all the apt/unit numbers in the 2nd column and any additional level of address in the 3rd column like the following:
Is there any way to do this? Thank you!
addr 1 | addr 2 | addr 3 |
20 kelly dr | apt 2 | england home |
16 hawk lane | fl 2 | bertha senior living |
3 india road | ||
45 virginia street | apt 6 | england home |
holden st | unit 3 |
|
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Seems like a difficult task. The difficult task is to create a comprehensive set of rules that will separate this data the way you want. Once you have those rules, the coding isn't that hard.
For example, are the first three "words" always the address 1? What if someone lives at 523 West Main Street? That's four "words" for address 1. Is that possible? If so, how would you know where the address 1 ends?
Since you know the data, and we do not, you need to create the rules to do this and then inform us of those rules.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How did all of the address data end up in ADDR1 to begin with? Is that how it really is in the data source or what happened after importing into SAS? If it is the latter then fix the import. As @PaigeMiller says, address parsing is hard. There is specialist functionality in SAS Data Quality to handle this but I doubt you would have this product.available to you. In any case these don't look like standard street addresses.
How many addresses do you have? If it is only a few thousand then a custom solution might work.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you'd better post your real data to explain your question.
data have;
input have $80.;
cards;
20 kelly dr apt 2 england home
16 hawk lane fl 2 bertha senior living
3 india road
45 virginia street apt 6 england home
holden st unit 3
;
data want;
set have;
p=prxmatch('/\w+\s+\d+/io',have);
if p then do;
addr1=substr(have,1,p-1);
temp=substr(have,p);
call scan(temp,2,p2,l2,' ');
addr2=substr(temp,1,p2+l2);
addr3=substr(temp,p2+l2);
end;
else addr1=have;
keep addr1 addr2 addr3;
run;