02-19-2013 04:18 AM
I have a messy data which need to be organized. The original file has 500 variables, but for simplicity I came up with a sample excel sheet.
The data has two columns school and contact. The contact information needs to be organized into separate columns (name address phone fax web). The final output should have the following columns
school name address phone fax web
I appreciate your suggestions and help.
|CASTILLEJA SCHOOL||Nancy Ware 1310 Bryant Street Palo Alto California 94301 Phone: (508) 626-1567 Fax: (508) 872-0260 http://www.aaspjhasjearch.net|
|CHALLENGER SCHOOL||Wiedmann 1185 Hollenback Avenue Sunnyvale California 94086 Phone954) 689-3984 Fax954) 689-3985 http://www.akhsksajtech.com|
|CENTRAL MIDDLE SCHOOL||Jan Dresser 828 Chestnut Street San Carlos California 94070http://www.aprjaurpimkesolution.com|
|Cesar Chavez Academy||2450 Ralmar Avenue East Palo Alto California 94303|
|Costano School||Charles Salter|
|CRITTENDEN MIDDLE SCHOOL||C. Anzia, Robin Russell, Joyce Murphree and Barbara Saxton phone: (123) 456-7890|
02-19-2013 04:56 AM
Since the data is in excel Sheet you can directly import it, using proc import.
In the Second variable has to be devided in to various of iuts sub values as Name, and address, phone etc. but there are no delimiters as mentioned in the sample data. My sugesstions are in steps asd below:
> Using index & substr
#Get the 'Index value' for numbers(0123456789), if its not '1' then there is some value before the door number that would be probably the name. you can substr it by using the length as 'index value' -1.
#Index for 'Phone' & 'Fax' by this you can take the values of address Phone # and Fax # accordingly.
#If there is any othere value remaining in the variable that would be the web page
If the index value goes '0' then the indexing value should be moved to the next indexing value and should be carried on.
This is just a idea and is not fool proof though... you can just try with this as a begining.