I need somebody who can help me clean my "Contracts" table.
My approach can be seen in the data job (attachment)
The biggest probem is with extracting the correct HOUSE_NUMBER and DOOR_NUMBER fields (which are "melted" into STREET field)
For example Housenumber can have values such as:
Out of 150.000 distinct contracts around 50.000 have unmatchable House Numbers.
How can I correctly parse the fields to extract:
???
Without know what each of the possible combinations are and what they mean, I am not sure you can. You could drop all non-numeric data: compress(<var>,' ','kd')
But this probably wont help, I mean what is: 21/121/4 and how does it compare to Top 21 3 G for instance, there doesn't seem to be anything logical in that data at all.
Suppose you have a single address string
HAVE
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
WANT
[
(u'ENROLLMENT', 'Recipient'),
(u'GROUP', 'Recipient'),
(u'39', 'AddressNumber'),
(u'E', 'StreetNamePreDirectional'),
(u'COCONUT', 'StreetName'),
(u'ST', 'StreetNamePostType'),
(u'3RD', 'OccupancyIdentifier'),
(u'FL\n', 'OccupancyType')
]
* the Python USADDRESS package can do much more than
this. But lets see if it can parse the 6 permutations
of the same address below. I have already used PERL
to do some translations , ie East to E, Street to ST, THIRD to 3rd
floor to FL.
* create input for Python;
data adr_badadr;
file "d:/txt/KeyBadAdr.txt";
length keybadadr $100.;
input;
key=put(10000000000+_n_,z11.);
KeyBadAdr=cats(key,_infile_);
putlog KeyBadAdr;
put KeyBadAdr;
cards4;
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
ENROLLMENT GROUP 3RD FL 39 E COCONUT ST
39 E COCONUT ST ENROLLMENT GROUP 3RD FL
39 E COCONUT ST 3RD FL ENROLLMENT GROUP
3RD FL ENROLLMENT GROUP 39 E COCONUT ST
3RD FL 39 E COCONUT ST ENROLLMENT GROUP
;;;;
;run;quit;
/* in the log
10000000001ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
10000000002ENROLLMENT GROUP 3RD FL 39 E COCONUT ST
1000000000339 E COCONUT ST ENROLLMENT GROUP 3RD FL
1000000000439 E COCONUT ST 3RD FL ENROLLMENT GROUP
100000000053RD FL ENROLLMENT GROUP 39 E COCONUT ST
100000000063RD FL 39 E COCONUT ST ENROLLMENT GROUP
*/
* It looks like you will have to do a lot of manual
cleaning before you can use open source packages
to clean stadardize your addresses.
* the most critical address parts for a strong matchcode are
street number
street name
Note it is not really possible to know if
'Enrollment' is a recipient or placename
Group could be a state or locality like UTAH,
depending on its position.
Also it is not always possible to get the
occupancy parts. It depends on the location of the occupancy parts.
You can really mess up a mailing with assumptions
on non street parts.
I am going to build a matchcode based on
First character of first name, last name, street number, street name
and zipcode.
Python USADDRESS does an excellent job of parsing the
the key elements for a match code, ie Street Number and Street Name.
It also handles components like PO BOX.
* here is the simple python code that takes your input
and creates the output above.
%utl_submit_py64(%nrbquote(
fo = open('d:/txt/KeyBadAdrFix.txt', 'w');
import usaddress;
for line in open('d:/txt/KeyBadAdr.txt'):;
. addr=line[11:];
. key=line[0:11];
. res=key + str(usaddress.parse(addr)) + '\n';
. lyn=str(addr);
. fo.write(str(res));
));
* good way to look the python output, you need full SAS for this.
proc fslist file="d:/txt/KeyBadAdrFix.txt";
run;quit;
* here is the parsing of the first address by Python
10000000001[
(u'ENROLLMENT', 'Recipient'),
(u'GROUP', 'Recipient'),
(u'39', 'AddressNumber'),
(u'E', 'StreetNamePreDirectional'),
(u'COCONUT', 'StreetName'),
(u'ST', 'StreetNamePostType'),
(u'3RD', 'OccupancyIdentifier'),
(u'FL\n', 'OccupancyType')]
Here is the last record
10000000006[
(u'3RD', 'OccupancyIdentifier'),
(u'FL', 'OccupancyType'),
(u'39', 'AddressNumber'),
(u'E', 'StreetNamePreDirectional'),
(u'COCONUT', 'StreetName'),
(u'ST', 'StreetNamePostType'),
(u'ENROLLMENT', 'PlaceName'),
(u'GROUP\n', 'StateName')
* lets see if all of these addressses have the same
street number
street name
* create name value pairs for further processing;
data KeyBadAdrNrm;
retain seq 0 adrrec name value;
keep seq adrrec name value;
length txt $100 key $11 str $700;
infile "d:/txt/KeyBadAdrFix.txt" lrecl=1034;
input;
adrrec=substr(_infile_,1,11);
parens=countc(_infile_,')');
str=substr(_infile_,12);
do i=1 to parens;
txt=substr(scan(str,i,')'),3);
txt=compress(txt,"'[()'");
txt=substr(prxchange('s/\\n//',2,txt),2);
seq=seq+1;
value=scan(txt,1,',');
name=scan(txt,2,',');
output;
end;
run;quit;
p to 40 obs WORK.KEYBADADRNRM total obs=48
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
bs SEQ ADRREC NAME VALUE
1 1 10000000001 Recipient ENROLLMENT
2 2 10000000001 Recipient GROUP
3 3 10000000001 AddressNumber 39
4 4 10000000001 StreetNamePreDirectional E
5 5 10000000001 StreetName COCONUT
6 6 10000000001 StreetNamePostType ST
7 7 10000000001 OccupancyIdentifier 3RD
8 8 10000000001 OccupancyType FL
9 9 10000000002 Recipient ENROLLMENT
10 10 10000000002 Recipient GROUP
11 11 10000000002 Recipient 3RD
12 12 10000000002 Recipient FL
13 13 10000000002 AddressNumber 39
14 14 10000000002 StreetNamePreDirectional E
15 15 10000000002 StreetName COCONUT
16 16 10000000002 StreetNamePostType ST
If you check you will see all addresses had street number and street name parsed correctly.
Other elements depend on position in string, no way around this.
* you need to upcase, remove punctuation and
convert to standard USPS abreviations.
Perl has some free translations.
CAUSEWAY CSWY
CAUSWAY CSWY
CASWAY CSWY
CEN CTR
CENT CTR
CENTER CTR
CENTERS CTRS
CENTR CTR
CENTRE CTR
CIRC CIR
CIRCL CIR
CIRCLE CIR
CIRCLES CIRS
;
* invoking python from SAS;
%macro utl_submit_py64(pgm)/des="Semi colon separated set of py commands";
* write the program to a temporary file;
filename py_pgm "%sysfunc(pathname(work))/py_pgm.py" lrecl=32766 recfm=v;
data _null_;
length pgm $32755 cmd $255;
file py_pgm ;
pgm="&pgm";
semi=countc(pgm,';');
do idx=1 to semi;
cmd=cats(scan(pgm,idx,';'));
if cmd=:'.' then cmd=substr(cmd,2);
put cmd $char96.;
putlog cmd $char96.;
end;
run;
%let _loc=%sysfunc(pathname(py_pgm));
%put &_loc;
filename rut pipe "C:\Python_27_64bit/python.exe &_loc";
data _null_;
file print;
infile rut;
input;
put _infile_;
run;
filename rut clear;
filename py_pgm clear;
%mend utl_submit_py64;
Hello Roger,
First, allow me to say thank you for your reply and the time invested in writing the code.
Secondy, I am completely new to SAS, without background in Perl or Python. Threfore, I would really apprecite if you could send me detailed description of how to integrate your code in SAS Data Job (I suppose, that is what needs to be done). Additionally, I would like to mention that I already have QKB, the German version.
Regards!
Suppose you have a single address string
HAVE
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
WANT
[
(u'ENROLLMENT', 'Recipient'),
(u'GROUP', 'Recipient'),
(u'39', 'AddressNumber'),
(u'E', 'StreetNamePreDirectional'),
(u'COCONUT', 'StreetName'),
(u'ST', 'StreetNamePostType'),
(u'3RD', 'OccupancyIdentifier'),
(u'FL\n', 'OccupancyType')
]
* the Python USADDRESS package can do much more than
this. But lets see if it can parse the 6 permutations
of the same address below. I have already used PERL
to do some translations , ie East to E, Street to ST, THIRD to 3rd
floor to FL.
* create input for Python;
data adr_badadr;
file "d:/txt/KeyBadAdr.txt";
length keybadadr $100.;
input;
key=put(10000000000+_n_,z11.);
KeyBadAdr=cats(key,_infile_);
putlog KeyBadAdr;
put KeyBadAdr;
cards4;
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
ENROLLMENT GROUP 3RD FL 39 E COCONUT ST
39 E COCONUT ST ENROLLMENT GROUP 3RD FL
39 E COCONUT ST 3RD FL ENROLLMENT GROUP
3RD FL ENROLLMENT GROUP 39 E COCONUT ST
3RD FL 39 E COCONUT ST ENROLLMENT GROUP
;;;;
;run;quit;
/* in the log
10000000001ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
10000000002ENROLLMENT GROUP 3RD FL 39 E COCONUT ST
1000000000339 E COCONUT ST ENROLLMENT GROUP 3RD FL
1000000000439 E COCONUT ST 3RD FL ENROLLMENT GROUP
100000000053RD FL ENROLLMENT GROUP 39 E COCONUT ST
100000000063RD FL 39 E COCONUT ST ENROLLMENT GROUP
*/
* It looks like you will have to do a lot of manual
cleaning before you can use open source packages
to clean stadardize your addresses.
* the most critical address parts for a strong matchcode are
street number
street name
Note it is not really possible to know if
'Enrollment' is a recipient or placename
Group could be a state or locality like UTAH,
depending on its position.
Also it is not always possible to get the
occupancy parts. It depends on the location of the occupancy parts.
You can really mess up a mailing with assumptions
on non street parts.
I am going to build a matchcode based on
First character of first name, last name, street number, street name
and zipcode.
Python USADDRESS does an excellent job of parsing the
the key elements for a match code, ie Street Number and Street Name.
It also handles components like PO BOX.
* here is the simple python code that takes your input
and creates the output above.
%utl_submit_py64(%nrbquote(
fo = open('d:/txt/KeyBadAdrFix.txt', 'w');
import usaddress;
for line in open('d:/txt/KeyBadAdr.txt'):;
. addr=line[11:];
. key=line[0:11];
. res=key + str(usaddress.parse(addr)) + '\n';
. lyn=str(addr);
. fo.write(str(res));
));
* good way to look the python output, you need full SAS for this.
proc fslist file="d:/txt/KeyBadAdrFix.txt";
run;quit;
* here is the parsing of the first address by Python
10000000001[
(u'ENROLLMENT', 'Recipient'),
(u'GROUP', 'Recipient'),
(u'39', 'AddressNumber'),
(u'E', 'StreetNamePreDirectional'),
(u'COCONUT', 'StreetName'),
(u'ST', 'StreetNamePostType'),
(u'3RD', 'OccupancyIdentifier'),
(u'FL\n', 'OccupancyType')]
Here is the last record
10000000006[
(u'3RD', 'OccupancyIdentifier'),
(u'FL', 'OccupancyType'),
(u'39', 'AddressNumber'),
(u'E', 'StreetNamePreDirectional'),
(u'COCONUT', 'StreetName'),
(u'ST', 'StreetNamePostType'),
(u'ENROLLMENT', 'PlaceName'),
(u'GROUP\n', 'StateName')
* lets see if all of these addressses have the same
street number
street name
* create name value pairs for further processing;
data KeyBadAdrNrm;
retain seq 0 adrrec name value;
keep seq adrrec name value;
length txt $100 key $11 str $700;
infile "d:/txt/KeyBadAdrFix.txt" lrecl=1034;
input;
adrrec=substr(_infile_,1,11);
parens=countc(_infile_,')');
str=substr(_infile_,12);
do i=1 to parens;
txt=substr(scan(str,i,')'),3);
txt=compress(txt,"'[()'");
txt=substr(prxchange('s/\\n//',2,txt),2);
seq=seq+1;
value=scan(txt,1,',');
name=scan(txt,2,',');
output;
end;
run;quit;
p to 40 obs WORK.KEYBADADRNRM total obs=48
ENROLLMENT GROUP 39 E COCONUT ST 3RD FL
bs SEQ ADRREC NAME VALUE
1 1 10000000001 Recipient ENROLLMENT
2 2 10000000001 Recipient GROUP
3 3 10000000001 AddressNumber 39
4 4 10000000001 StreetNamePreDirectional E
5 5 10000000001 StreetName COCONUT
6 6 10000000001 StreetNamePostType ST
7 7 10000000001 OccupancyIdentifier 3RD
8 8 10000000001 OccupancyType FL
9 9 10000000002 Recipient ENROLLMENT
10 10 10000000002 Recipient GROUP
11 11 10000000002 Recipient 3RD
12 12 10000000002 Recipient FL
13 13 10000000002 AddressNumber 39
14 14 10000000002 StreetNamePreDirectional E
15 15 10000000002 StreetName COCONUT
16 16 10000000002 StreetNamePostType ST
If you check you will see all addresses had street number and street name parsed correctly.
Other elements depend on position in string, no way around this.
* you need to upcase, remove punctuation and
convert to standard USPS abreviations.
Perl has some free translations.
CAUSEWAY CSWY
CAUSWAY CSWY
CASWAY CSWY
CEN CTR
CENT CTR
CENTER CTR
CENTERS CTRS
CENTR CTR
CENTRE CTR
CIRC CIR
CIRCL CIR
CIRCLE CIR
CIRCLES CIRS
;
* invoking python from SAS;
%macro utl_submit_py64(pgm)/des="Semi colon separated set of py commands";
* write the program to a temporary file;
filename py_pgm "%sysfunc(pathname(work))/py_pgm.py" lrecl=32766 recfm=v;
data _null_;
length pgm $32755 cmd $255;
file py_pgm ;
pgm="&pgm";
semi=countc(pgm,';');
do idx=1 to semi;
cmd=cats(scan(pgm,idx,';'));
if cmd=:'.' then cmd=substr(cmd,2);
put cmd $char96.;
putlog cmd $char96.;
end;
run;
%let _loc=%sysfunc(pathname(py_pgm));
%put &_loc;
filename rut pipe "C:\Python_27_64bit/python.exe &_loc";
data _null_;
file print;
infile rut;
input;
put _infile_;
run;
filename rut clear;
filename py_pgm clear;
%mend utl_submit_py64;
This code should help you with initial cleanup.
Just put the location of pythin in the utl_submit_p64
get python
https://www.python.org/downloads/
You do not need to know much Python to exceute the USADDRESS package.
There are excellent eduacational materials on the web.
libname adr "d:/adr";
options fmtsearch=(adr.adr_formats_v1 work.formats);
*************************************************************************;
data adr.&pgm._unitabv;
retain fmtname "$adr_unit2abv";
informat start $12. label $5.;
input start label;
end=start;
cards4;
APARTMENT APT
BASEMENT BSMT
BUILDING BLDG
DEPARTMENT DEPT
FLOOR FL
FRONT FRNT
HANGAR HNGR
LOBBY LBBY
LOT LOT
LOWER LOWR
OFFICE OFC
PENTHOUSE PH
PIER PIER
REAR REAR
ROOM RM
SIDE SIDE
SLIP SLIP
SPACE SPC
STOP STOP
SUITE STE
TRAILER TRLR
UNIT UNIT
UPPER UPPR
;;;;
;run;quit;
data adr.&pgm._abv2stdes;
retain fmtname "$adr_abv2stdes";
informat start $7. label $16.;
input start label;
end=start;
cards4;
ALY ALLEY
ANX ANNEX
APT APARTMENT
ARC ARCADE
AVE AVENUE
BSMT BASEMENT
BYU BAYOU
BCH BEACH
BND BEND
BLF BLUFF
BTM BOTTOM
BLVD BOULEVARD
BR BRANCH
BRG BRIDGE
BRK BROOK
BLDG BUILDING
BG BURG
BYP BYPASS
CP CAMP
CYN CANYON
CPE CAPE
CSWY CAUSEWAY
CTR CENTER
CIR CIRCLE
CLF CLIFF
CLFS CLIFFS
CLB CLUB
COR CORNER
CORS CORNERS
CRSE COURSE
CT COURT
CTS COURTS
CV COVE
CRK CREEK
CRES CRESCENT
XING CROSSING
DL DALE
DM DAM
DEPT DEPARTMENT
DV DIVIDE
DR DRIVE
EST ESTATE
EXPY EXPRESSWAY
EXT EXTENSION
FLS FALLS
FRY FERRY
FLD FIELD
FLDS FIELDS
FLT FLAT
FL FLOOR
FRD FORD
FRST FOREST
FRG FORGE
FRK FORK
FRKS FORKS
FT FORT
FWY FREEWAY
FRNT FRONT
GDN GARDEN
GDNS GARDENS
GTWY GATEWAY
GLN GLEN
GRN GREEN
GRV GROVE
HNGR HANGER
HBR HARBOR
HVN HAVEN
HTS HEIGHTS
HWY HIGHWAY
HL HILL
HLS HILLS
HOLW HOLLOW
INLT INLET
IS ISLAND
ISS ISLANDS
JCT JUNCTION
KY KEY
KNL KNOLL
KNLS KNOLLS
LK LAKE
LKS LAKES
LNDG LANDING
LN LANE
LGT LIGHT
LF LOAF
LBBY LOBBY
LCK LOCK
LCKS LOCKS
LDG LODGE
LOWR LOWER
MNR MANOR
MDW MEADOW
MDWS MEADOWS
ML MILL
MLS MILLS
MSN MISSION
MT MOUNT
MTN MOUNTAIN
NCK NECK
OFC OFFICE
ORCH ORCHARD
PKWY PARKWAY
PH PENTHOUSE
PNE PINE
PNES PINES
PL PLACE
PLN PLAIN
PLNS PLAINS
PLZ PLAZA
PT POINT
PRT PORT
PR PRAIRIE
RADL RADIAL
RNCH RANCH
RPD RAPID
RPDS RAPIDS
RST REST
RDG RIDGE
RIV RIVER
RD ROAD
RM ROOM
SHL SHOAL
SHLS SHOALS
SHR SHORE
SHRS SHORES
SPC SPACE
SPG SPRING
SPGS SPRINGS
SQ SQUARE
STA STATION
STRA STRAVENUE
STRM STREAM
ST STREET
STE SUITE
SMT SUMMIT
TER TERRACE
TRCE TRACE
TRAK TRACK
TRFY TRAFFICWAY
TRL TRAIL
TRLR TRAILER
TUNL TUNNEL
TPKE TURNPIKE
UN UNION
UPPR UPPER
VLY VALLEY
VIA VIADUCT
VW VIEW
VLG VILLAGE
VL VILLE
VIS VISTA
WAY WAY
WL WELL
WLS WELLS
;;;;
;run;quit;
proc format cntlin=adr.&pgm._abv2stdes lib=adr.adr_formats_v1;
run;quit;
data adr.&pgm._nrmst;
input;
_infile_=upcase(_infile_);
fro=scan(_infile_,1,' ');
too=scan(_infile_,2,' ');
if substr(strip(fro),-1,1)='S' and
substr(strip(too),-1,1)='S' then too=substr(too,1,length(too)-1);
cards4;
ALLEY ALY
ANEX ANX
ANNEX ANX
ARCADE ARC
BEACH BCH
BEND BND
BLUF BLF
BOTTOM BTM
BRANCH BR
BRIDGE BRG
BRNCH BR
BROOK BRK
BROOKS BRKS
BURG BG
BURGS BGS
BYPA BYP
BYPAS BYP
BYPASS BYP
BYPS BYP
CAMP CP
CANYN CYN
CANYON CYN
CAPE CPE
CAUSEWAY CSWY
CAUSWAY CSWY
CEN CTR
CENT CTR
CENTER CTR
CENTERS CTRS
CENTR CTR
CENTRE CTR
CIRC CIR
CIRCL CIR
CIRCLE CIR
CIRCLES CIRS
CK CRK
CLIFF CLF
CLIFFS CLFS
CLUB CLB
CMP CP
CNTER CTR
CNTR CTR
CNYN CYN
COMMON CMN
CORNER COR
CORNERS CORS
COURSE CRSE
COURT CT
COURTS CTS
COVE CV
COVES CVS
CR CRK
CRCL CIR
CRCLE CIR
CRECENT CRES
CREEK CRK
CRESCENT CRES
CRESENT CRES
CREST CRST
CROSSING XING
CROSSROAD XRD
CRSCNT CRES
CRSENT CRES
CRSNT CRES
CRSSING XING
CRSSNG XING
CRT CT
CURVE CURV
DALE DL
DAM DM
DIV DV
DIVIDE DV
DRIV DR
DRIVE DR
DRIVES DRS
DRV DR
DVD DV
ESTATE EST
ESTATES ESTS
EXP EXPY
EXPR EXPY
EXPRESS EXPY
EXPRESSWAY EXPY
EXPW EXPY
EXTENSION EXT
EXTENSIONS EXTS
EXTN EXT
EXTNSN EXT
FALLS FLS
FERRY FRY
FIELD FLD
FIELDS FLDS
FLAT FLT
FLATS FLTS
FORD FRD
FORDS FRDS
FOREST FRST
FORESTS FRST
FORG FRG
FORGE FRG
FORGES FRGS
FORK FRK
FORKS FRKS
FORT FT
FREEWAY FWY
FREEWY FWY
FRRY FRY
FRT FT
FRWAY FWY
FRWY FWY
GARDEN GDN
GARDENS GDNS
GARDN GDN
GATEWAY GTWY
GATEWY GTWY
GATWAY GTWY
GLEN GLN
GLENS GLNS
GRDEN GDN
GRDN GDN
GRDNS GDNS
GREEN GRN
GREENS GRNS
GROV GRV
GROVE GRV
GROVES GRVS
GTWAY GTWY
HARB HBR
HARBOR HBR
HARBORS HBRS
HARBR HBR
HAVEN HVN
HAVN HVN
HEIGHT HTS
HEIGHTS HTS
HGTS HTS
HIGHWAY HWY
HIGHWY HWY
HILL HL
HILLS HLS
HIWAY HWY
HIWY HWY
HLLW HOLW
HOLLOW HOLW
HOLLOWS HOLW
HOLWS HOLW
HRBOR HBR
HT HTS
HWAY HWY
INLET INLT
ISLAND IS
ISLANDS ISS
ISLES ISLE
ISLND IS
ISLNDS ISS
JCTION JCT
JCTN JCT
JCTNS JCTS
JUNCTION JCT
JUNCTIONS JCTS
JUNCTN JCT
JUNCTON JCT
KEY KY
KEYS KYS
KNOL KNL
KNOLL KNL
KNOLLS KNLS
LA LN
LAKE LK
LAKES LKS
LANDING LNDG
LANE LN
LANES LN
LDGE LDG
LIGHT LGT
LIGHTS LGTS
LNDNG LNDG
LOAF LF
LOCK LCK
LOCKS LCKS
LODG LDG
LODGE LDG
LOOPS LOOP
MANOR MNR
MANORS MNRS
MEADOW MDW
MEADOWS MDWS
MEDOWS MDWS
MILL ML
MILLS MLS
MISSION MSN
MISSN MSN
MNT MT
MNTAIN MTN
MNTN MTN
MNTNS MTNS
MOTORWAY MTWY
MOUNT MT
MOUNTAIN MTN
MOUNTAINS MTNS
MOUNTIN MTN
MSSN MSN
MTIN MTN
NECK NCK
ORCHARD ORCH
ORCHRD ORCH
OVERPASS OPAS
OVL OVAL
PARKS PARK
PARKWAY PKWY
PARKWAYS PKWY
PARKWY PKWY
PASSAGE PSGE
PATHS PATH
PIKES PIKE
PINE PNE
PINES PNES
PK PARK
PKWAY PKWY
PKWYS PKWY
PKY PKWY
PLACE PL
PLAIN PLN
PLAINES PLNS
PLAINS PLNS
PLAZA PLZ
PLZA PLZ
POINT PT
POINTS PTS
PORT PRT
PORTS PRTS
PRAIRIE PR
PRARIE PR
PRK PARK
PRR PR
RAD RADL
RADIAL RADL
RADIEL RADL
RANCH RNCH
RANCHES RNCH
RAPID RPD
RAPIDS RPDS
RDGE RDG
REST RST
RIDGE RDG
RIDGES RDGS
RIVER RIV
RIVR RIV
RNCHS RNCH
ROAD RD
ROADS RDS
ROUTE RTE
RVR RIV
SHOAL SHL
SHOALS SHLS
SHOAR SHR
SHOARS SHRS
SHORE SHR
SHORES SHRS
SKYWAY SKWY
SPNG SPG
SPNGS SPGS
SPRING SPG
SPRINGS SPGS
SPRNG SPG
SPRNGS SPGS
SPURS SPUR
SQR SQ
SQRE SQ
SQRS SQS
SQU SQ
SQUARE SQ
SQUARES SQS
STATION STA
STATN STA
STN STA
STR ST
STRAV STRA
STRAVE STRA
STRAVEN STRA
STRAVENUE STRA
STRAVN STRA
STREAM STRM
STREET ST
STREETS STS
STREME STRM
STRT ST
STRVN STRA
STRVNUE STRA
SUMIT SMT
SUMITT SMT
SUMMIT SMT
TERR TER
TERRACE TER
THROUGHWAY TRWY
TPK TPKE
TR TRL
TRACE TRCE
TRACES TRCE
TRACK TRAK
TRACKS TRAK
TRAFFICWAY TRFY
TRAIL TRL
TRAILS TRL
TRK TRAK
TRKS TRAK
TRLS TRL
TRNPK TPKE
TRPK TPKE
TUNEL TUNL
TUNLS TUNL
TUNNEL TUNL
TUNNELS TUNL
TUNNL TUNL
TURNPIKE TPKE
TURNPK TPKE
UNDERPASS UPAS
UNION UN
UNIONS UNS
VALLEY VLY
VALLEYS VLYS
VALLY VLY
VDCT VIA
VIADCT VIA
VIADUCT VIA
VIEW VW
VIEWS VWS
VILL VLG
VILLAG VLG
VILLAGE VLG
VILLAGES VLGS
VILLE VL
VILLG VLG
VILLIAGE VLG
VIST VIS
VISTA VIS
VLLY VLY
VST VIS
VSTA VIS
WALKS WALK
WELL WL
WELLS WLS
WY WAY
;;;;
;run;quit;
data adr.&pgm._fmtmkesfx;
retain fmtname '$adr_st2sfx';
set adr.&pgm._nrmst;
start=fro;
end=start;
label=too;
;run;quit;
proc format cntlin=adr.&pgm._fmtmkesfx lib=adr.adr_formats_v1;
;run;quit;
proc format cntlin=adr.&pgm._unitabv lib=adr.adr_formats_v1;
;run;quit;
proc format cntlin=adr.&pgm._abv2stdes lib=adr.adr_formats_v1;
;run;quit;
proc catalog cat=adr.adr_formats_v1;
modify ADR_ABV2STDES.formatc(description="Reverse street short form to long form TPKE->Turnpike");
modify ADR_UNIT2ABV.formatc(description="Occupancy Type ie APARTMENT -> APT");
modify ADR_ST2SFX.formatc(description="Street suffix ie TURNPIKE -> TPKE");
;run;quit;
proc catalog cat=adr.adr_formats_v1;
contents;
;run;quit;
Thank you for the quick reply. I will try to do it that way.
In the meantime, would you please have a look at the following 2 posts, which are connected to the same topic:
and
https://communities.sas.com/t5/SAS-Data-Management/Matching-Wrong-Rows-From-Two-Tables/m-p/278204
Looking forward to your reply.
Regards!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.