Hi,
Can you please help me to split single column to multiple column, here below city1 has full address with city,state and zip code. I need to split it city1 column into city , state, Zip1, zip2 columns
city1
BRONX NY 10467-6021
LAS VEGAS NV 89119
BATTLE MOUNTAIN NV 89820-1998
NORTH LAS VEGAS NV 89031-3432
Thanks
Soujanya
Can we assume that your data always has a 2 digit state? Or can sometimes NY be spelled as N.Y. or New York? Are the locations always in the United States, or can the data contain locations in Canada or Uruguay?
Will there always be a five or 9 digit zip code? Does the 9 digit zip code always have a hyphen?
Thanks for quick response ,
State always has a 2 digit state and in the United States only
zip code has 9 digits with hyphen ( Ex: BRONX NY 10467-6021 , BATTLE MOUNTAIN NV 89820-1998)
And some rows has 5 digit zip code only ( Ex: LAS VEGAS NV 89119), no hyphen and pin code
Okay, here's a start. You can determine where the two character state value ends and the zip code begins. Example:
data a;
string='BATTLE MOUNTAIN NV 89820-1998';
str1=anyalpha(string,-30);
run;
the -30 indicates you go right to left, starting at position 30 (this is the max length of the string), and you get STR1 = 18, meaning NV ends in position 18 of the string. ANything to the right of position 18 is the zip. Position 17-18 is the state, and anything in positions 1 through 16 is the city.
Hi Soujanya,
Please let me know if this helps.
data have;
input city1 :&$200.;
cards;
BRONX NY 10467-6021
LAS VEGAS NV 89119
BATTLE MOUNTAIN NV 89820-1998
NORTH LAS VEGAS NV 89031-3432
;
data want;
set have;
city=prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$1/oi',-1,city1);
state=prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$2/oi',-1,city1);
zip1=scan(prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$3/oi',-1,city1),1,'-');
zip2=scan(prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$3/oi',-1,city1),2,'-');
run;
Thank You Jagadish, Can you please explain or please refer any document that explain in detail
it was worked and trying to learn
soujanya
You may refer to my paper on perl regular expressions
https://www.phusewiki.org/docs/2019%20Amsterdam/Papers_presentations/CT/CT%20Final%20Papers/CT12.pdf
Here's my full code
data have;
input city1 :&$200.;
cards;
BRONX NY 10467-6021
LAS VEGAS NV 89119
BATTLE MOUNTAIN NV 89820-1998
NORTH LAS VEGAS NV 89031-3432
;
data want;
set have;
length zip $ 10 city $ 40 state $ 2;
pos=anyalpha(city1,-200);
zip=substr(city1,pos+1);
state=substr(city1,pos-1,2);
city=substr(city1,1,pos-2);
drop pos;
run;
Hi Miller,
Can you please make me understand below, why you add pos+1 for zip variable(+1) and pos-1 for state (-1),pos-2 for city(-2)
pos=anyalpha(city1,-200);
zip=substr(city1,pos+1);
state=substr(city1,pos-1,2);
city=substr(city1,1,pos-2);
Thank you very much
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.