Solved: Re: Split single column to mutiple coulums

souji · Posted 07-15-2020 01:05 PM

Hi,

Can you please help me to split single column to multiple column, here below city1 has full address with city,state and zip code. I need to split it city1 column into city , state, Zip1, zip2 columns

city1

BRONX NY 10467-6021

LAS VEGAS NV 89119

BATTLE MOUNTAIN NV 89820-1998

NORTH LAS VEGAS NV 89031-3432

Thanks

Soujanya

Jagadishkatam · Posted 07-15-2020 03:27 PM

Could you please mark the response as answered if you agree you got the correct answer. This helps others searching for a similar issue.

Thanks,
Jag

View solution in original post

PaigeMiller · Posted 07-15-2020 01:07 PM

Can we assume that your data always has a 2 digit state? Or can sometimes NY be spelled as N.Y. or New York? Are the locations always in the United States, or can the data contain locations in Canada or Uruguay?

Will there always be a five or 9 digit zip code? Does the 9 digit zip code always have a hyphen?

--
Paige Miller

souji · Posted 07-15-2020 01:17 PM

Thanks for quick response ,

State always has a 2 digit state and in the United States only

zip code has 9 digits with hyphen ( Ex: BRONX NY 10467-6021 , BATTLE MOUNTAIN NV 89820-1998)

And some rows has 5 digit zip code only ( Ex: LAS VEGAS NV 89119), no hyphen and pin code

PaigeMiller · Posted 07-15-2020 01:29 PM

Okay, here's a start. You can determine where the two character state value ends and the zip code begins. Example:

data a;
	string='BATTLE MOUNTAIN NV  89820-1998';
	str1=anyalpha(string,-30);
run;

the -30 indicates you go right to left, starting at position 30 (this is the max length of the string), and you get STR1 = 18, meaning NV ends in position 18 of the string. ANything to the right of position 18 is the zip. Position 17-18 is the state, and anything in positions 1 through 16 is the city.

--
Paige Miller

Jagadishkatam · Posted 07-15-2020 01:29 PM

Hi Soujanya,

Please let me know if this helps.

data have;
input city1 :&$200.;
cards;
BRONX NY 10467-6021
LAS VEGAS NV 89119
BATTLE MOUNTAIN NV 89820-1998
NORTH LAS VEGAS NV 89031-3432
;


data want;
set have;
city=prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$1/oi',-1,city1);
state=prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$2/oi',-1,city1);
zip1=scan(prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$3/oi',-1,city1),1,'-');
zip2=scan(prxchange('s/(.*)(\s\w{2,2})(\s\d.*)/$3/oi',-1,city1),2,'-');
run;

Thanks,
Jag

souji · Posted 07-15-2020 02:30 PM

Thank You Jagadish, Can you please explain or please refer any document that explain in detail

it was worked and trying to learn

soujanya

Jagadishkatam · Posted 07-15-2020 03:11 PM

You may refer to my paper on perl regular expressions

https://www.phusewiki.org/docs/2019%20Amsterdam/Papers_presentations/CT/CT%20Final%20Papers/CT12.pdf

Thanks,
Jag

Jagadishkatam · Posted 07-15-2020 03:27 PM

Could you please mark the response as answered if you agree you got the correct answer. This helps others searching for a similar issue.

Thanks,
Jag

PaigeMiller · Posted 07-15-2020 03:51 PM

Here's my full code

data have;
input city1 :&$200.;
cards;
BRONX NY 10467-6021
LAS VEGAS NV 89119
BATTLE MOUNTAIN NV 89820-1998
NORTH LAS VEGAS NV 89031-3432
;
data want;
	set have;
	length zip $ 10 city $ 40 state $ 2;
	pos=anyalpha(city1,-200);
	zip=substr(city1,pos+1);
	state=substr(city1,pos-1,2);
	city=substr(city1,1,pos-2);
	drop pos;
run;

--
Paige Miller

souji · Posted 07-15-2020 05:50 PM

Thank You

souji · Posted 07-17-2020 02:31 PM

Hi Miller,

Can you please make me understand below, why you add pos+1 for zip variable(+1) and pos-1 for state (-1),pos-2 for city(-2)

pos=anyalpha(city1,-200);

zip=substr(city1,pos+1);

state=substr(city1,pos-1,2);
city=substr(city1,1,pos-2);

Thank you very much

Registration is open

SAS Training: Just a Click Away