Hello,
I have an address field that I am trying to clean. Would want to know how do I remove 0s from the string.
eg; I have addresses like
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
and some are like
0MAIN STREET
How do I make them
MARTIN LUTHER KING JR BLVD
MEMPHIS AVE
MAIN STREET
Or rather:
data have;
input address:&$200.;
newaddress=prxchange('s/^0+\s*//',1,left(address));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
20 MAIN STREET
;
SAS BASE
data have;
input address:&$200.;
newaddress=strip(compress(address,'0'));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
;
That won't work, because there will be addresses that are 21040 Martin Luther King Jr Blvd. It would also erase the 0. SAS Data flux has an address cleaning component.
There's a paper on lexjansen.com that parses addresses and cleans them, for US at least that may work here though.
https://analytics.ncsu.edu/sesug/2008/CC-028.pdf
Edit: what I would do is run that program which separates the addresses, and then search the house number field to see if that is all 0's instead.
Thank you Reeza. Very informative paper.
Thank you Jagdish. But I have addresses like 1023 Main Street or 420 Main Street. I do not want to compress those 0s. The position of the 0 is the start of the string. In some cases there are multiple 0s, in some cases there is one 0 followed by the letters without space.
another more efficient way to remove the '0' which are only in the beginning of the address can be done by perl regular expression
data have;
input address:&$200.;
newaddress=strip(prxchange('s/^\d[0]*//',-1,strip(address)));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
;
Or rather:
data have;
input address:&$200.;
newaddress=prxchange('s/^0+\s*//',1,left(address));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
20 MAIN STREET
;
Cleaned a lot of addresses. Thank you.
This does not quite answer my problem. I have addresses like these
3235 Cambridge Avenue, #6H, Bronx, New York 10463
2 Harrison Street, PH, Poughkeepsie, New York 12601
130 Schroeders Avenue, Apt. 12G, Brooklyn, New York 11239
220 Wadsworth Avenue, Apt. 307, New York, New York 10033
I want to break them apart. For the first one, I want 3235 in one variable, Cambridge Avenue in the second variable, #6H in the third variable, Bronx in the fourth variable, New York in the fifth variable and 10463 in the sixth variable.
Please post your problem as a new topic.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.