Hello,
I have a large dataset that was populated from multiple data sources. One percentage variable is stored inconsistently because of how it was formatted in the original datasets. I would like an easy way to compare the values of that variable to see which ones I need to reformat. For example, in an original data source, a value of 4.25% may be stored as .0425, or incorrectly as .00425. That is just one example. I need to compare, and ideally see, all of the different formats of that variable.
Is there an easy way to do this?
Thanks!
If you can specify rules, yes. If not, then no.
How do you know if it's really 0.045 or should be 0.0045?
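One way to check whether a rule can be defined at all is to count the values by order of magnitude and see whether they fall into distinct clusters. The sketch below is in Python rather than SAS, purely as an illustration. The function name `magnitude_counts` and the sample values are hypothetical and not taken from the original data.

```python
import math
from collections import Counter

def magnitude_counts(values):
    """Count values by order of magnitude, i.e. floor(log10(|v|)).

    Zero and missing (None) values are counted under one separate key,
    because log10 is undefined for them.
    """
    counts = Counter()
    for v in values:
        if v is None or v == 0:
            counts["zero/missing"] += 1
        else:
            counts[math.floor(math.log10(abs(v)))] += 1
    return counts

# Hypothetical sample: percentages stored at three different scales.
sample = [0.0425, 0.00425, 4.25, 0.051, 0.0038]
for magnitude, n in sorted(magnitude_counts(sample).items(), key=str):
    print(magnitude, n)
```

If most values land in one bucket and a few land one or two buckets away, those outliers are candidates for a scaling rule. If the buckets overlap with values that are plausibly correct, no rule will separate them reliably.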
I would:
1. Redo my merge/append and make sure to identify the source files. The problem is likely consistent within each file, e.g. if a variable is messed up in FileA, it's messed up for all the records in FileA. If you appended, you can use the INDSNAME= option on the SET statement.
2. Eyeball the data or do a histogram and isolate values with the file source. You'll likely be able to pick out the issues.
Unless you can define a rule, such as "a percent below 0.001 is wrong," I'm not sure how you would identify those records.
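The two steps above can be sketched roughly as follows, assuming each record already carries a source label (which is what INDSNAME= would provide in SAS). This is a Python illustration, not SAS code. The names `summarize_by_source` and `flag_below`, the file labels, and the sample values are all hypothetical; the 0.001 threshold is only the example rule mentioned above.

```python
from collections import defaultdict
from statistics import median

def summarize_by_source(rows):
    """rows: iterable of (source, pct) pairs.

    Returns {source: (n, min, median, max)} so that sources stored
    at a different scale stand out when compared side by side.
    """
    groups = defaultdict(list)
    for source, pct in rows:
        groups[source].append(pct)
    return {s: (len(v), min(v), median(v), max(v)) for s, v in groups.items()}

def flag_below(rows, threshold=0.001):
    """Return the rows whose value is below the threshold (example rule only)."""
    return [(s, p) for s, p in rows if p < threshold]

# Hypothetical records: fileB looks like it is stored a factor of 10 too small.
rows = [
    ("fileA", 0.0425), ("fileA", 0.051), ("fileA", 0.038),
    ("fileB", 0.00425), ("fileB", 0.00051), ("fileB", 0.0038),
]

for source, stats in summarize_by_source(rows).items():
    print(source, stats)
print(flag_below(rows))
```

A per-source summary like this is the numeric equivalent of eyeballing a histogram by file: a source whose median sits an order of magnitude away from the others is the one to inspect first.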
It's probably obvious that reimporting/fixing data at the source is the ideal solution, which should be relatively easy since it's a program. Then you just re-run the remaining portion of your programs.
Yes, that would definitely be easiest. It is a lot of code to sift through, though, and over 50 datasets would have to be appended back together to build the new dataset. I was trying to avoid that by doing a quick update, but that doesn't look likely at this point.