SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Extracting Street, House Number and Door Number

Reply
Contributor
Posts: 23

Extracting Street, House Number and Door Number

I need somebody who can help me clean my "Contracts" table.

My approach can be seen in the data job (attachment)

 

The biggest probem is with extracting the correct HOUSE_NUMBER and DOOR_NUMBER fields (which are "melted" into STREET field)

 

For example Housenumber can have values such as:

  • Top112
  • 21/121/4
  • Block 23
  • Room 443
  • Weg6
  • Top4-
  • E32
  • Top 21 3 OG
  • Top 212/
  • Terminal 2
  • Object 21
  • Street 32g (

 

Out of 150.000 distinct contracts around 50.000 have unmatchable House Numbers.

 

How can I correctly parse the fields to extract:

  • Street
  • House Number
  • Door Number

???

 

Attachment
Super User
Super User
Posts: 7,976

Re: Extracting Street, House Number and Door Number

Without know what each of the possible combinations are and what they mean, I am not sure you can.  You could drop all non-numeric data: compress(<var>,' ','kd')

But this probably wont help, I mean what is: 21/121/4 and how does it compare to Top 21 3 G for instance, there doesn't seem to be anything logical in that data at all.

Valued Guide
Posts: 505

Re: Extracting Street, House Number and Door Number

Suppose you have a single address string                                                                                                                                                                                                               
                                                                                                                                                                                                                                                       
HAVE                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                       
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL                                                                                                                                                                                                                
                                                                                                                                                                                                                                                       
WANT                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                       
[                                                                                                                                                                                                                                                      
(u'ENROLLMENT', 'Recipient'),                                                                                                                                                                                                                          
(u'GROUP', 'Recipient'),                                                                                                                                                                                                                               
(u'39', 'AddressNumber'),                                                                                                                                                                                                                              
(u'E', 'StreetNamePreDirectional'),                                                                                                                                                                                                                    
(u'COCONUT', 'StreetName'),                                                                                                                                                                                                                            
(u'ST', 'StreetNamePostType'),                                                                                                                                                                                                                         
(u'3RD', 'OccupancyIdentifier'),                                                                                                                                                                                                                       
(u'FL\n', 'OccupancyType')                                                                                                                                                                                                                             
]                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                       
* the Python USADDRESS package can do much more than                                                                                                                                                                                                   
this. But lets see if it can parse the 6 permutations                                                                                                                                                                                                  
of the same address below. I have already used PERL                                                                                                                                                                                                    
to do some translations , ie East to E, Street to ST, THIRD to 3rd                                                                                                                                                                                     
floor to FL.                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                       
* create input for Python;                                                                                                                                                                                                                             
data adr_badadr;                                                                                                                                                                                                                                       
file "d:/txt/KeyBadAdr.txt";                                                                                                                                                                                                                           
length keybadadr $100.;                                                                                                                                                                                                                                
input;                                                                                                                                                                                                                                                 
key=put(10000000000+_n_,z11.);                                                                                                                                                                                                                         
KeyBadAdr=cats(key,_infile_);                                                                                                                                                                                                                          
putlog KeyBadAdr;                                                                                                                                                                                                                                      
put KeyBadAdr;                                                                                                                                                                                                                                         
cards4;                                                                                                                                                                                                                                                
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL                                                                                                                                                                                                                
ENROLLMENT GROUP 3RD FL 39 E COCONUT ST                                                                                                                                                                                                                
39 E COCONUT ST ENROLLMENT GROUP 3RD FL                                                                                                                                                                                                                
39 E COCONUT ST 3RD FL ENROLLMENT GROUP                                                                                                                                                                                                                
3RD FL ENROLLMENT GROUP 39 E COCONUT ST                                                                                                                                                                                                                
3RD FL 39 E COCONUT ST ENROLLMENT GROUP                                                                                                                                                                                                                
;;;;                                                                                                                                                                                                                                                   
;run;quit;                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                       
/* in the log                                                                                                                                                                                                                                          
10000000001ENROLLMENT GROUP 39 E COCONUT ST 3RD FL                                                                                                                                                                                                     
10000000002ENROLLMENT GROUP 3RD FL 39 E COCONUT ST                                                                                                                                                                                                     
1000000000339 E COCONUT ST ENROLLMENT GROUP 3RD FL                                                                                                                                                                                                     
1000000000439 E COCONUT ST 3RD FL ENROLLMENT GROUP                                                                                                                                                                                                     
100000000053RD FL ENROLLMENT GROUP 39 E COCONUT ST                                                                                                                                                                                                     
100000000063RD FL 39 E COCONUT ST ENROLLMENT GROUP                                                                                                                                                                                                     
*/                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                       
* It looks like you will have to do a lot of manual                                                                                                                                                                                                    
  cleaning before you can use open source packages                                                                                                                                                                                                     
  to clean stadardize your addresses.                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                       
* the most critical address parts for a strong matchcode are                                                                                                                                                                                           
  street number                                                                                                                                                                                                                                        
  street name                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                       
  Note it is not really possible to know if                                                                                                                                                                                                            
  'Enrollment' is a recipient or placename                                                                                                                                                                                                           
  Group could be a state or locality like UTAH,                                                                                                                                                                                                        
  depending on its position.                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                       
  Also it is not always possible to get the                                                                                                                                                                                                            
  occupancy parts. It depends on the location of the occupancy parts.                                                                                                                                                                                           
                                                                                                                                                                                                                                                       
  You can really mess up a mailing with assumptions                                                                                                                                                                                                    
  on non street parts.                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                       
  I am going to build a matchcode based on                                                                                                                                                                                                             
  First character of first name, last name, street number, street name                                                                                                                                                                                 
  and zipcode.                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                       
  Python USADDRESS does an excellent job of parsing the                                                                                                                                                                                                
  the key elements for a match code, ie Street Number and Street Name.
It also handles components like PO BOX. * here is the simple python code that takes your input and creates the output above. %utl_submit_py64(%nrbquote( fo = open('d:/txt/KeyBadAdrFix.txt', 'w'); import usaddress; for line in open('d:/txt/KeyBadAdr.txt'):; . addr=line[11:]; . key=line[0:11]; . res=key + str(usaddress.parse(addr)) + '\n'; . lyn=str(addr); . fo.write(str(res)); )); * good way to look the python output, you need full SAS for this. proc fslist file="d:/txt/KeyBadAdrFix.txt"; run;quit; * here is the parsing of the first address by Python 10000000001[ (u'ENROLLMENT', 'Recipient'), (u'GROUP', 'Recipient'), (u'39', 'AddressNumber'), (u'E', 'StreetNamePreDirectional'), (u'COCONUT', 'StreetName'), (u'ST', 'StreetNamePostType'), (u'3RD', 'OccupancyIdentifier'), (u'FL\n', 'OccupancyType')] Here is the last record 10000000006[ (u'3RD', 'OccupancyIdentifier'), (u'FL', 'OccupancyType'), (u'39', 'AddressNumber'), (u'E', 'StreetNamePreDirectional'), (u'COCONUT', 'StreetName'), (u'ST', 'StreetNamePostType'), (u'ENROLLMENT', 'PlaceName'), (u'GROUP\n', 'StateName') * lets see if all of these addressses have the same street number street name * create name value pairs for further processing; data KeyBadAdrNrm; retain seq 0 adrrec name value; keep seq adrrec name value; length txt $100 key $11 str $700; infile "d:/txt/KeyBadAdrFix.txt" lrecl=1034; input; adrrec=substr(_infile_,1,11); parens=countc(_infile_,')'); str=substr(_infile_,12); do i=1 to parens; txt=substr(scan(str,i,')'),3); txt=compress(txt,"'[()'"); txt=substr(prxchange('s/\\n//',2,txt),2); seq=seq+1; value=scan(txt,1,','); name=scan(txt,2,','); output; end; run;quit; p to 40 obs WORK.KEYBADADRNRM total obs=48 ENROLLMENT GROUP 39 E COCONUT ST 3RD FL bs SEQ ADRREC NAME VALUE 1 1 10000000001 Recipient ENROLLMENT 2 2 10000000001 Recipient GROUP 3 3 10000000001 AddressNumber 39 4 4 10000000001 StreetNamePreDirectional E 5 5 10000000001 StreetName COCONUT 6 6 10000000001 StreetNamePostType ST 7 7 10000000001 OccupancyIdentifier 3RD 8 8 10000000001 OccupancyType FL 9 9 10000000002 Recipient ENROLLMENT 10 10 10000000002 Recipient GROUP 11 11 10000000002 Recipient 3RD 12 12 10000000002 Recipient FL 13 13 10000000002 AddressNumber 39 14 14 10000000002 StreetNamePreDirectional E 15 15 10000000002 StreetName COCONUT 16 16 10000000002 StreetNamePostType ST


If you check you will see all addresses had street number and street name parsed correctly.
Other elements depend on position in string, no way around this. * you need to upcase, remove punctuation and convert to standard USPS abreviations. Perl has some free translations. CAUSEWAY CSWY CAUSWAY CSWY CASWAY CSWY CEN CTR CENT CTR CENTER CTR CENTERS CTRS CENTR CTR CENTRE CTR CIRC CIR CIRCL CIR CIRCLE CIR CIRCLES CIRS ; * invoking python from SAS; %macro utl_submit_py64(pgm)/des="Semi colon separated set of py commands"; * write the program to a temporary file; filename py_pgm "%sysfunc(pathname(work))/py_pgm.py" lrecl=32766 recfm=v; data _null_; length pgm $32755 cmd $255; file py_pgm ; pgm="&pgm"; semi=countc(pgm,';'); do idx=1 to semi; cmd=cats(scan(pgm,idx,';')); if cmd=:'.' then cmd=substr(cmd,2); put cmd $char96.; putlog cmd $char96.; end; run; %let _loc=%sysfunc(pathname(py_pgm)); %put &_loc; filename rut pipe "C:\Python_27_64bit/python.exe &_loc"; data _null_; file print; infile rut; input; put _infile_; run; filename rut clear; filename py_pgm clear; %mend utl_submit_py64;
Contributor
Posts: 23

Re: Extracting Street, House Number and Door Number

Posted in reply to rogerjdeangelis

Hello Roger,

 

First, allow me to say thank you for your reply and the time invested in writing the code.

 

Secondy, I am completely new to SAS, without background in Perl or Python. Threfore, I would really apprecite if you could send me detailed description of how to integrate your code in SAS Data Job (I suppose, that is what needs to be done). Additionally, I would like to mention that I already have QKB, the German version.

 

Regards!

Valued Guide
Posts: 505

Re: Extracting Street, House Number and Door Number

 

Suppose you have a single address string                                                                                                                                                                                                               
                                                                                                                                                                                                                                                       
HAVE                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                       
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL                                                                                                                                                                                                                
                                                                                                                                                                                                                                                       
WANT                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                       
[                                                                                                                                                                                                                                                      
(u'ENROLLMENT', 'Recipient'),                                                                                                                                                                                                                          
(u'GROUP', 'Recipient'),                                                                                                                                                                                                                               
(u'39', 'AddressNumber'),                                                                                                                                                                                                                              
(u'E', 'StreetNamePreDirectional'),                                                                                                                                                                                                                    
(u'COCONUT', 'StreetName'),                                                                                                                                                                                                                            
(u'ST', 'StreetNamePostType'),                                                                                                                                                                                                                         
(u'3RD', 'OccupancyIdentifier'),                                                                                                                                                                                                                       
(u'FL\n', 'OccupancyType')                                                                                                                                                                                                                             
]                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                       
* the Python USADDRESS package can do much more than                                                                                                                                                                                                   
this. But lets see if it can parse the 6 permutations                                                                                                                                                                                                  
of the same address below. I have already used PERL                                                                                                                                                                                                    
to do some translations , ie East to E, Street to ST, THIRD to 3rd                                                                                                                                                                                     
floor to FL.                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                       
* create input for Python;                                                                                                                                                                                                                             
data adr_badadr;                                                                                                                                                                                                                                       
file "d:/txt/KeyBadAdr.txt";                                                                                                                                                                                                                           
length keybadadr $100.;                                                                                                                                                                                                                                
input;                                                                                                                                                                                                                                                 
key=put(10000000000+_n_,z11.);                                                                                                                                                                                                                         
KeyBadAdr=cats(key,_infile_);                                                                                                                                                                                                                          
putlog KeyBadAdr;                                                                                                                                                                                                                                      
put KeyBadAdr;                                                                                                                                                                                                                                         
cards4;                                                                                                                                                                                                                                                
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL                                                                                                                                                                                                                
ENROLLMENT GROUP 3RD FL 39 E COCONUT ST                                                                                                                                                                                                                
39 E COCONUT ST ENROLLMENT GROUP 3RD FL                                                                                                                                                                                                                
39 E COCONUT ST 3RD FL ENROLLMENT GROUP                                                                                                                                                                                                                
3RD FL ENROLLMENT GROUP 39 E COCONUT ST                                                                                                                                                                                                                
3RD FL 39 E COCONUT ST ENROLLMENT GROUP                                                                                                                                                                                                                
;;;;                                                                                                                                                                                                                                                   
;run;quit;                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                       
/* in the log                                                                                                                                                                                                                                          
10000000001ENROLLMENT GROUP 39 E COCONUT ST 3RD FL                                                                                                                                                                                                     
10000000002ENROLLMENT GROUP 3RD FL 39 E COCONUT ST                                                                                                                                                                                                     
1000000000339 E COCONUT ST ENROLLMENT GROUP 3RD FL                                                                                                                                                                                                     
1000000000439 E COCONUT ST 3RD FL ENROLLMENT GROUP                                                                                                                                                                                                     
100000000053RD FL ENROLLMENT GROUP 39 E COCONUT ST                                                                                                                                                                                                     
100000000063RD FL 39 E COCONUT ST ENROLLMENT GROUP                                                                                                                                                                                                     
*/                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                       
* It looks like you will have to do a lot of manual                                                                                                                                                                                                    
  cleaning before you can use open source packages                                                                                                                                                                                                     
  to clean stadardize your addresses.                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                       
* the most critical address parts for a strong matchcode are                                                                                                                                                                                           
  street number                                                                                                                                                                                                                                        
  street name                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                       
  Note it is not really possible to know if                                                                                                                                                                                                            
  'Enrollment' is a recipient or placename                                                                                                                                                                                                           
  Group could be a state or locality like UTAH,                                                                                                                                                                                                        
  depending on its position.                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                       
  Also it is not always possible to get the                                                                                                                                                                                                            
  occupancy parts. It depends on the location of the occupancy parts.                                                                                                                                                                                           
                                                                                                                                                                                                                                                       
  You can really mess up a mailing with assumptions                                                                                                                                                                                                    
  on non street parts.                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                       
  I am going to build a matchcode based on                                                                                                                                                                                                             
  First character of first name, last name, street number, street name                                                                                                                                                                                 
  and zipcode.                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                       
  Python USADDRESS does an excellent job of parsing the                                                                                                                                                                                                
  the key elements for a match code, ie Street Number and Street Name.
It also handles components like PO BOX. * here is the simple python code that takes your input and creates the output above. %utl_submit_py64(%nrbquote( fo = open('d:/txt/KeyBadAdrFix.txt', 'w'); import usaddress; for line in open('d:/txt/KeyBadAdr.txt'):; . addr=line[11:]; . key=line[0:11]; . res=key + str(usaddress.parse(addr)) + '\n'; . lyn=str(addr); . fo.write(str(res)); )); * good way to look the python output, you need full SAS for this. proc fslist file="d:/txt/KeyBadAdrFix.txt"; run;quit; * here is the parsing of the first address by Python 10000000001[ (u'ENROLLMENT', 'Recipient'), (u'GROUP', 'Recipient'), (u'39', 'AddressNumber'), (u'E', 'StreetNamePreDirectional'), (u'COCONUT', 'StreetName'), (u'ST', 'StreetNamePostType'), (u'3RD', 'OccupancyIdentifier'), (u'FL\n', 'OccupancyType')] Here is the last record 10000000006[ (u'3RD', 'OccupancyIdentifier'), (u'FL', 'OccupancyType'), (u'39', 'AddressNumber'), (u'E', 'StreetNamePreDirectional'), (u'COCONUT', 'StreetName'), (u'ST', 'StreetNamePostType'), (u'ENROLLMENT', 'PlaceName'), (u'GROUP\n', 'StateName') * lets see if all of these addressses have the same street number street name * create name value pairs for further processing; data KeyBadAdrNrm; retain seq 0 adrrec name value; keep seq adrrec name value; length txt $100 key $11 str $700; infile "d:/txt/KeyBadAdrFix.txt" lrecl=1034; input; adrrec=substr(_infile_,1,11); parens=countc(_infile_,')'); str=substr(_infile_,12); do i=1 to parens; txt=substr(scan(str,i,')'),3); txt=compress(txt,"'[()'"); txt=substr(prxchange('s/\\n//',2,txt),2); seq=seq+1; value=scan(txt,1,','); name=scan(txt,2,','); output; end; run;quit; p to 40 obs WORK.KEYBADADRNRM total obs=48 ENROLLMENT GROUP 39 E COCONUT ST 3RD FL bs SEQ ADRREC NAME VALUE 1 1 10000000001 Recipient ENROLLMENT 2 2 10000000001 Recipient GROUP 3 3 10000000001 AddressNumber 39 4 4 10000000001 StreetNamePreDirectional E 5 5 10000000001 StreetName COCONUT 6 6 10000000001 StreetNamePostType ST 7 7 10000000001 OccupancyIdentifier 3RD 8 8 10000000001 OccupancyType FL 9 9 10000000002 Recipient ENROLLMENT 10 10 10000000002 Recipient GROUP 11 11 10000000002 Recipient 3RD 12 12 10000000002 Recipient FL 13 13 10000000002 AddressNumber 39 14 14 10000000002 StreetNamePreDirectional E 15 15 10000000002 StreetName COCONUT 16 16 10000000002 StreetNamePostType ST


If you check you will see all addresses had street number and street name parsed correctly.
Other elements depend on position in string, no way around this. * you need to upcase, remove punctuation and convert to standard USPS abreviations. Perl has some free translations. CAUSEWAY CSWY CAUSWAY CSWY CASWAY CSWY CEN CTR CENT CTR CENTER CTR CENTERS CTRS CENTR CTR CENTRE CTR CIRC CIR CIRCL CIR CIRCLE CIR CIRCLES CIRS ; * invoking python from SAS; %macro utl_submit_py64(pgm)/des="Semi colon separated set of py commands"; * write the program to a temporary file; filename py_pgm "%sysfunc(pathname(work))/py_pgm.py" lrecl=32766 recfm=v; data _null_; length pgm $32755 cmd $255; file py_pgm ; pgm="&pgm"; semi=countc(pgm,';'); do idx=1 to semi; cmd=cats(scan(pgm,idx,';')); if cmd=:'.' then cmd=substr(cmd,2); put cmd $char96.; putlog cmd $char96.; end; run; %let _loc=%sysfunc(pathname(py_pgm)); %put &_loc; filename rut pipe "C:\Python_27_64bit/python.exe &_loc"; data _null_; file print; infile rut; input; put _infile_; run; filename rut clear; filename py_pgm clear; %mend utl_submit_py64;

 

Valued Guide
Posts: 505

Re: Extracting Street, House Number and Door Number

Posted in reply to rogerjdeangelis
This code should help you with initial cleanup. 

Just put the location of pythin in the utl_submit_p64

get python
https://www.python.org/downloads/

You do not need to know much Python to exceute the USADDRESS package.

There are excellent eduacational materials on the web.

libname adr "d:/adr"; options fmtsearch=(adr.adr_formats_v1 work.formats); *************************************************************************; data adr.&pgm._unitabv; retain fmtname "$adr_unit2abv"; informat start $12. label $5.; input start label; end=start; cards4; APARTMENT APT BASEMENT BSMT BUILDING BLDG DEPARTMENT DEPT FLOOR FL FRONT FRNT HANGAR HNGR LOBBY LBBY LOT LOT LOWER LOWR OFFICE OFC PENTHOUSE PH PIER PIER REAR REAR ROOM RM SIDE SIDE SLIP SLIP SPACE SPC STOP STOP SUITE STE TRAILER TRLR UNIT UNIT UPPER UPPR ;;;; ;run;quit; data adr.&pgm._abv2stdes; retain fmtname "$adr_abv2stdes"; informat start $7. label $16.; input start label; end=start; cards4; ALY ALLEY ANX ANNEX APT APARTMENT ARC ARCADE AVE AVENUE BSMT BASEMENT BYU BAYOU BCH BEACH BND BEND BLF BLUFF BTM BOTTOM BLVD BOULEVARD BR BRANCH BRG BRIDGE BRK BROOK BLDG BUILDING BG BURG BYP BYPASS CP CAMP CYN CANYON CPE CAPE CSWY CAUSEWAY CTR CENTER CIR CIRCLE CLF CLIFF CLFS CLIFFS CLB CLUB COR CORNER CORS CORNERS CRSE COURSE CT COURT CTS COURTS CV COVE CRK CREEK CRES CRESCENT XING CROSSING DL DALE DM DAM DEPT DEPARTMENT DV DIVIDE DR DRIVE EST ESTATE EXPY EXPRESSWAY EXT EXTENSION FLS FALLS FRY FERRY FLD FIELD FLDS FIELDS FLT FLAT FL FLOOR FRD FORD FRST FOREST FRG FORGE FRK FORK FRKS FORKS FT FORT FWY FREEWAY FRNT FRONT GDN GARDEN GDNS GARDENS GTWY GATEWAY GLN GLEN GRN GREEN GRV GROVE HNGR HANGER HBR HARBOR HVN HAVEN HTS HEIGHTS HWY HIGHWAY HL HILL HLS HILLS HOLW HOLLOW INLT INLET IS ISLAND ISS ISLANDS JCT JUNCTION KY KEY KNL KNOLL KNLS KNOLLS LK LAKE LKS LAKES LNDG LANDING LN LANE LGT LIGHT LF LOAF LBBY LOBBY LCK LOCK LCKS LOCKS LDG LODGE LOWR LOWER MNR MANOR MDW MEADOW MDWS MEADOWS ML MILL MLS MILLS MSN MISSION MT MOUNT MTN MOUNTAIN NCK NECK OFC OFFICE ORCH ORCHARD PKWY PARKWAY PH PENTHOUSE PNE PINE PNES PINES PL PLACE PLN PLAIN PLNS PLAINS PLZ PLAZA PT POINT PRT PORT PR PRAIRIE RADL RADIAL RNCH RANCH RPD RAPID RPDS RAPIDS RST REST RDG RIDGE RIV RIVER RD ROAD RM ROOM SHL SHOAL SHLS SHOALS SHR SHORE SHRS SHORES SPC SPACE SPG SPRING SPGS SPRINGS SQ SQUARE STA STATION STRA STRAVENUE STRM STREAM ST STREET STE SUITE SMT SUMMIT TER TERRACE TRCE TRACE TRAK TRACK TRFY TRAFFICWAY TRL TRAIL TRLR TRAILER TUNL TUNNEL TPKE TURNPIKE UN UNION UPPR UPPER VLY VALLEY VIA VIADUCT VW VIEW VLG VILLAGE VL VILLE VIS VISTA WAY WAY WL WELL WLS WELLS ;;;; ;run;quit; proc format cntlin=adr.&pgm._abv2stdes lib=adr.adr_formats_v1; run;quit; data adr.&pgm._nrmst; input; _infile_=upcase(_infile_); fro=scan(_infile_,1,' '); too=scan(_infile_,2,' '); if substr(strip(fro),-1,1)='S' and substr(strip(too),-1,1)='S' then too=substr(too,1,length(too)-1); cards4; ALLEY ALY ANEX ANX ANNEX ANX ARCADE ARC BEACH BCH BEND BND BLUF BLF BOTTOM BTM BRANCH BR BRIDGE BRG BRNCH BR BROOK BRK BROOKS BRKS BURG BG BURGS BGS BYPA BYP BYPAS BYP BYPASS BYP BYPS BYP CAMP CP CANYN CYN CANYON CYN CAPE CPE CAUSEWAY CSWY CAUSWAY CSWY CEN CTR CENT CTR CENTER CTR CENTERS CTRS CENTR CTR CENTRE CTR CIRC CIR CIRCL CIR CIRCLE CIR CIRCLES CIRS CK CRK CLIFF CLF CLIFFS CLFS CLUB CLB CMP CP CNTER CTR CNTR CTR CNYN CYN COMMON CMN CORNER COR CORNERS CORS COURSE CRSE COURT CT COURTS CTS COVE CV COVES CVS CR CRK CRCL CIR CRCLE CIR CRECENT CRES CREEK CRK CRESCENT CRES CRESENT CRES CREST CRST CROSSING XING CROSSROAD XRD CRSCNT CRES CRSENT CRES CRSNT CRES CRSSING XING CRSSNG XING CRT CT CURVE CURV DALE DL DAM DM DIV DV DIVIDE DV DRIV DR DRIVE DR DRIVES DRS DRV DR DVD DV ESTATE EST ESTATES ESTS EXP EXPY EXPR EXPY EXPRESS EXPY EXPRESSWAY EXPY EXPW EXPY EXTENSION EXT EXTENSIONS EXTS EXTN EXT EXTNSN EXT FALLS FLS FERRY FRY FIELD FLD FIELDS FLDS FLAT FLT FLATS FLTS FORD FRD FORDS FRDS FOREST FRST FORESTS FRST FORG FRG FORGE FRG FORGES FRGS FORK FRK FORKS FRKS FORT FT FREEWAY FWY FREEWY FWY FRRY FRY FRT FT FRWAY FWY FRWY FWY GARDEN GDN GARDENS GDNS GARDN GDN GATEWAY GTWY GATEWY GTWY GATWAY GTWY GLEN GLN GLENS GLNS GRDEN GDN GRDN GDN GRDNS GDNS GREEN GRN GREENS GRNS GROV GRV GROVE GRV GROVES GRVS GTWAY GTWY HARB HBR HARBOR HBR HARBORS HBRS HARBR HBR HAVEN HVN HAVN HVN HEIGHT HTS HEIGHTS HTS HGTS HTS HIGHWAY HWY HIGHWY HWY HILL HL HILLS HLS HIWAY HWY HIWY HWY HLLW HOLW HOLLOW HOLW HOLLOWS HOLW HOLWS HOLW HRBOR HBR HT HTS HWAY HWY INLET INLT ISLAND IS ISLANDS ISS ISLES ISLE ISLND IS ISLNDS ISS JCTION JCT JCTN JCT JCTNS JCTS JUNCTION JCT JUNCTIONS JCTS JUNCTN JCT JUNCTON JCT KEY KY KEYS KYS KNOL KNL KNOLL KNL KNOLLS KNLS LA LN LAKE LK LAKES LKS LANDING LNDG LANE LN LANES LN LDGE LDG LIGHT LGT LIGHTS LGTS LNDNG LNDG LOAF LF LOCK LCK LOCKS LCKS LODG LDG LODGE LDG LOOPS LOOP MANOR MNR MANORS MNRS MEADOW MDW MEADOWS MDWS MEDOWS MDWS MILL ML MILLS MLS MISSION MSN MISSN MSN MNT MT MNTAIN MTN MNTN MTN MNTNS MTNS MOTORWAY MTWY MOUNT MT MOUNTAIN MTN MOUNTAINS MTNS MOUNTIN MTN MSSN MSN MTIN MTN NECK NCK ORCHARD ORCH ORCHRD ORCH OVERPASS OPAS OVL OVAL PARKS PARK PARKWAY PKWY PARKWAYS PKWY PARKWY PKWY PASSAGE PSGE PATHS PATH PIKES PIKE PINE PNE PINES PNES PK PARK PKWAY PKWY PKWYS PKWY PKY PKWY PLACE PL PLAIN PLN PLAINES PLNS PLAINS PLNS PLAZA PLZ PLZA PLZ POINT PT POINTS PTS PORT PRT PORTS PRTS PRAIRIE PR PRARIE PR PRK PARK PRR PR RAD RADL RADIAL RADL RADIEL RADL RANCH RNCH RANCHES RNCH RAPID RPD RAPIDS RPDS RDGE RDG REST RST RIDGE RDG RIDGES RDGS RIVER RIV RIVR RIV RNCHS RNCH ROAD RD ROADS RDS ROUTE RTE RVR RIV SHOAL SHL SHOALS SHLS SHOAR SHR SHOARS SHRS SHORE SHR SHORES SHRS SKYWAY SKWY SPNG SPG SPNGS SPGS SPRING SPG SPRINGS SPGS SPRNG SPG SPRNGS SPGS SPURS SPUR SQR SQ SQRE SQ SQRS SQS SQU SQ SQUARE SQ SQUARES SQS STATION STA STATN STA STN STA STR ST STRAV STRA STRAVE STRA STRAVEN STRA STRAVENUE STRA STRAVN STRA STREAM STRM STREET ST STREETS STS STREME STRM STRT ST STRVN STRA STRVNUE STRA SUMIT SMT SUMITT SMT SUMMIT SMT TERR TER TERRACE TER THROUGHWAY TRWY TPK TPKE TR TRL TRACE TRCE TRACES TRCE TRACK TRAK TRACKS TRAK TRAFFICWAY TRFY TRAIL TRL TRAILS TRL TRK TRAK TRKS TRAK TRLS TRL TRNPK TPKE TRPK TPKE TUNEL TUNL TUNLS TUNL TUNNEL TUNL TUNNELS TUNL TUNNL TUNL TURNPIKE TPKE TURNPK TPKE UNDERPASS UPAS UNION UN UNIONS UNS VALLEY VLY VALLEYS VLYS VALLY VLY VDCT VIA VIADCT VIA VIADUCT VIA VIEW VW VIEWS VWS VILL VLG VILLAG VLG VILLAGE VLG VILLAGES VLGS VILLE VL VILLG VLG VILLIAGE VLG VIST VIS VISTA VIS VLLY VLY VST VIS VSTA VIS WALKS WALK WELL WL WELLS WLS WY WAY ;;;; ;run;quit; data adr.&pgm._fmtmkesfx; retain fmtname '$adr_st2sfx'; set adr.&pgm._nrmst; start=fro; end=start; label=too; ;run;quit; proc format cntlin=adr.&pgm._fmtmkesfx lib=adr.adr_formats_v1; ;run;quit; proc format cntlin=adr.&pgm._unitabv lib=adr.adr_formats_v1; ;run;quit; proc format cntlin=adr.&pgm._abv2stdes lib=adr.adr_formats_v1; ;run;quit; proc catalog cat=adr.adr_formats_v1; modify ADR_ABV2STDES.formatc(description="Reverse street short form to long form TPKE->Turnpike"); modify ADR_UNIT2ABV.formatc(description="Occupancy Type ie APARTMENT -> APT"); modify ADR_ST2SFX.formatc(description="Street suffix ie TURNPIKE -> TPKE"); ;run;quit; proc catalog cat=adr.adr_formats_v1; contents; ;run;quit;
Contributor
Posts: 23

Re: Extracting Street, House Number and Door Number

Posted in reply to rogerjdeangelis

Thank you for the quick reply. I will try to do it that way.

 

In the meantime, would you please have a look at the following 2 posts, which are connected to the same topic:

https://communities.sas.com/t5/SAS-Data-Management/Software-Generates-Different-Matching-Codes-For-t...

and 

https://communities.sas.com/t5/SAS-Data-Management/Matching-Wrong-Rows-From-Two-Tables/m-p/278204

 

Looking forward to your reply.

 

Regards!

Ask a Question
Discussion stats
  • 6 replies
  • 630 views
  • 0 likes
  • 3 in conversation