Hi All,
I have a data set which has addresses that are all messy, it looks line for example -
2303 215 Queen St E
123 21 Apple Rd
02 1 Yonge St E
64 Ardwick Blvd
16 Kelman Crt
I want the result as follows -
I want to exclude any observation that has multiple spaces inform of an alphabet, that is I only want 64 Ardwick Blvd and 16 Kelman Crt in my dataset and exclude everything else.
Also id there are two spaces in front of the alphabet then I need to keep this just in case if there is any observation like -
64 Ardwick Blvd (i.e. 2 spaces after 64) or 64 Ardwick Blvd (i.e. 3 spaces after 64) then I need to include these as 64 Ardwick Blvd.
Any help is much appreciated...
Thanks!
data new; infile datalines; input id address $20.; cards; 2303 215 Queen St E 123 21 Apple Rd 02 1 Yonge St E 64 Ardwick Blvd 16 Kelman Crt ; run; data new; set new; X=scan(address, 3); if X ne'' then delete; run;
Hopefully, this will solve your problem.
Like this?
data HAVE;
input ADDRESS $20.;
cards;
2303 215 Queen St E
123 21 Apple Rd
02 1 Yonge St E
64 Ardwick Blvd
16 Kelman Crt
run;
data WANT;
set HAVE;
ADRESS=compbl(ADDRESS);
if prxmatch('/[^a-zA-Z\s]+?\s[^a-zA-Z\s]+?\s\w/',ADDRESS) then delete;
run;
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
If I understood what you want.
data HAVE;
input ADDRESS $20.;
cards;
2303 215 Queen St E
123 21 Apple Rd
02 1 Yonge St E
64 Ardwick Blvd
16 Kelman Crt
run;
data WANT;
set HAVE;
ADRESS=compbl(ADDRESS);
if countc(strip(substr(ADDRESS,findc(ADDRESS,' '))),' ')=1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.