How to format/clean data?

Reply
Occasional Contributor
Posts: 17

How to format/clean data?

Attached is a sample data of what I am trying to do. The top part is the raw data and the bottom part is how I want my data to look. So basically when the ID# is the same keep the descriptive information on the first row (like amt, price1, price2, maturity, etc...) but delete the repetitive info when there are different transactions recorded for the same ID# on different dates - this would make my sheet more clean....

Let me know if you have problems opening the attachment - I can try to copy the data here.

Thanks for your help!

Super User
Posts: 19,770

Re: How to format/clean data?

No problems opening the data, but it doesn't make sense, you have 3 tables side by side, no "top or bottom". I suggest, posting the data here instead, please make sure the sample input and output correspond.

Occasional Contributor
Posts: 17

Re: How to format/clean data?

hmm this isn't how the data should look like in the file....

Here is the raw data..

Raw Data.png

And here is how I want it to look like...

Final Look.png

Occasional Contributor
Posts: 17

Re: How to format/clean data?

I cut few rows from the final look table to save space but I hope this makes it clear what I want to do.

Thanks!

Super Contributor
Posts: 644

Re: How to format/clean data?

I don't understand why the first record in your 'want' table for id 11111111 picks up date and cat data for id 33333333 a couple of rows down.  Assuming these are mistakes it seems you want to suppress key information from the first few columns where the data is repeated in succeeding rows.  This would be a bad idea for storing data for future analysis but might make sense if what you want to do next is put the 'want' data into a report.

Assuming your data is sorted by maturity amt the following data step should achieve the result:

options missing = ' ' ;  /* use blank instead of . for displaying missing numerical data */

/* check syntax */

data want ;

     set have ;

     by maturity amt ;

     if not first. maturity then maturity = . ;

     if not first.amt then amt = . ;

run ;

Note that data is no longer sorted because values of key variables have been suppressed.

Richard

Occasional Contributor
Posts: 17

Re: How to format/clean data?

Posted in reply to RichardinOz

the  id 11111111 doesn't really picks up date and cat data for id 33333333 a couple of rows down - if it is different id then date and cat can be the same and sometimes its not the same. I will run the code you suggested and will let you know if it works.

Thanks!

Ask a Question
Discussion stats
  • 5 replies
  • 327 views
  • 0 likes
  • 3 in conversation