BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ARTI1
Calcite | Level 5

Hello,

I have an address field that I am trying to clean. Would want to know how do I remove 0s from the string.

eg; I have addresses like 

00000 MARTIN LUTHER KING JR BLVD

00000 MEMPHIS AVE

and some are like

0MAIN STREET

 

How do I make them 

MARTIN LUTHER KING JR BLVD

MEMPHIS AVE

MAIN STREET

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Or rather:

 

data have;
input address:&$200.;
newaddress=prxchange('s/^0+\s*//',1,left(address));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
20 MAIN STREET
;
PG

View solution in original post

11 REPLIES 11
Reeza
Super User
Are you using SAS Base or do you have access to SAS Data Management Studio/Dataflux?
ARTI1
Calcite | Level 5

SAS BASE

Jagadishkatam
Amethyst | Level 16
data have;
input address:&$200.;
newaddress=strip(compress(address,'0'));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
;
Thanks,
Jag
Reeza
Super User

That won't work, because there will be addresses that are 21040 Martin Luther King Jr Blvd. It would also erase the 0. SAS Data flux has an address cleaning component.

There's a paper on lexjansen.com that parses addresses and cleans them, for US at least that may work here though.

https://analytics.ncsu.edu/sesug/2008/CC-028.pdf

 

Edit: what I would do is run that program which separates the addresses, and then search the house number field to see if that is all 0's instead.

ARTI1
Calcite | Level 5

Thank you Reeza. Very informative paper.

ARTI1
Calcite | Level 5

Thank you Jagdish. But I have addresses like 1023 Main Street or 420 Main Street. I do not want to compress those 0s. The position of the 0 is the start of the string. In some cases there are multiple 0s, in some cases there is one 0 followed by the letters without space.

Jagadishkatam
Amethyst | Level 16

another more efficient way to remove the '0' which are only in the beginning of the address can be done by perl regular expression

 

data have;
input address:&$200.;
newaddress=strip(prxchange('s/^\d[0]*//',-1,strip(address)));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
;
Thanks,
Jag
PGStats
Opal | Level 21

Or rather:

 

data have;
input address:&$200.;
newaddress=prxchange('s/^0+\s*//',1,left(address));
cards;
00000 MARTIN LUTHER KING JR BLVD
00000 MEMPHIS AVE
0MAIN STREET
20 MAIN STREET
;
PG
ARTI1
Calcite | Level 5

Cleaned a lot of addresses. Thank you.

sekarpc
Calcite | Level 5

This does not quite answer my problem. I have addresses like these

3235 Cambridge Avenue, #6H, Bronx, New York 10463
2 Harrison Street, PH, Poughkeepsie, New York 12601
130 Schroeders Avenue, Apt. 12G, Brooklyn, New York 11239
220 Wadsworth Avenue, Apt. 307, New York, New York 10033

I want to break them apart. For the first one, I want 3235 in one variable, Cambridge Avenue in the second variable, #6H in the third variable, Bronx in the fourth variable, New York in the fifth variable and 10463 in the sixth variable.

PGStats
Opal | Level 21

Please post your problem as a new topic.

PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2563 views
  • 0 likes
  • 5 in conversation