DATA Step, Macro, Functions and more

Messy data

Reply
Occasional Contributor
Posts: 16

Messy data

Hi

I have a messy data which need to be organized. The original file has 500 variables, but for simplicity I came up with a sample excel sheet.

The data has two columns school and contact.  The contact information needs to be organized into separate columns (name address phone fax web). The final output should have the following columns

     school  name  address  phone  fax    web

I appreciate your suggestions and help.

SchoolContact
CASTILLEJA SCHOOLNancy Ware  1310 Bryant Street  Palo Alto California  94301 Phone: (508) 626-1567  Fax: (508) 872-0260 http://www.aaspjhasjearch.net
CHALLENGER SCHOOLWiedmann 1185 Hollenback Avenue  Sunnyvale California 94086 PhoneSmiley Sad954) 689-3984  FaxSmiley Sad954) 689-3985 http://www.akhsksajtech.com
CENTRAL MIDDLE SCHOOLJan Dresser 828 Chestnut Street San Carlos California 94070http://www.aprjaurpimkesolution.com
Cesar Chavez Academy2450 Ralmar Avenue East Palo Alto California  94303
CLIFFORD SCHOOLhttp://www.cliffodf0galschool.com
Costano SchoolCharles Salter
CRITTENDEN MIDDLE SCHOOLC. Anzia, Robin Russell, Joyce Murphree and Barbara Saxton phone: (123) 456-7890
Contributor
Posts: 65

Re: Messy data

Hi Thummala,

Since the data is in excel Sheet you can directly import it, using proc import.

In the Second variable has to be devided in to various of iuts sub values as Name, and address, phone etc. but there are no delimiters as mentioned in the sample data. My sugesstions are in steps asd below:

>   Using index & substr

               #Get the 'Index value' for numbers(0123456789), if its not '1' then there is some value before the door number that would be probably the name. you can substr it by using the length as 'index value' -1.

               #Index for 'Phone' & 'Fax' by this you can take the values of address Phone # and Fax # accordingly.

               #If there is any othere value remaining in the variable that would be the web page

If the index value goes '0' then the indexing value should be moved to the next indexing value and should be carried on.

This is just a idea and is not fool proof though... you can just try with this as a begining.

Regards,

Moovendhan D

Ask a Question
Discussion stats
  • 1 reply
  • 176 views
  • 0 likes
  • 2 in conversation