Hi All,
I have a data set which has addresses that are all messy, it looks line for example -
2303 215 Queen St E
123 21 Apple Rd
02 1 Yonge St E
64 Ardwick Blvd
16 Kelman Crt
I want the result as follows -
I want to exclude any observation that has multiple spaces inform of an alphabet, that is I only want 64 Ardwick Blvd and 16 Kelman Crt in my dataset and exclude everything else.
Also id there are two spaces in front of the alphabet then I need to keep this just in case if there is any observation like -
64 Ardwick Blvd (i.e. 2 spaces after 64) or 64 Ardwick Blvd (i.e. 3 spaces after 64) then I need to include these as 64 Ardwick Blvd.
Any help is much appreciated...
Thanks!
data new; infile datalines; input id address $20.; cards; 2303 215 Queen St E 123 21 Apple Rd 02 1 Yonge St E 64 Ardwick Blvd 16 Kelman Crt ; run; data new; set new; X=scan(address, 3); if X ne'' then delete; run;
Hopefully, this will solve your problem.
Like this?
data HAVE;
input ADDRESS $20.;
cards;
2303 215 Queen St E
123 21 Apple Rd
02 1 Yonge St E
64 Ardwick Blvd
16 Kelman Crt
run;
data WANT;
set HAVE;
ADRESS=compbl(ADDRESS);
if prxmatch('/[^a-zA-Z\s]+?\s[^a-zA-Z\s]+?\s\w/',ADDRESS) then delete;
run;
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
If I understood what you want.
data HAVE;
input ADDRESS $20.;
cards;
2303 215 Queen St E
123 21 Apple Rd
02 1 Yonge St E
64 Ardwick Blvd
16 Kelman Crt
run;
data WANT;
set HAVE;
ADRESS=compbl(ADDRESS);
if countc(strip(substr(ADDRESS,findc(ADDRESS,' '))),' ')=1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.