Determining which values of variable are formatted differently

Regular Contributor
Posts: 153

Hello,

 

I have a large dataset that was populated from multiple data sources. It contains a percentage variable whose values are stored on different scales because of the way they were formatted in the original datasets. I would like an easy way to compare the values of that variable to see which of them I need to reformat. For example, a value of 4.25% may be stored correctly as .0425 or incorrectly as .00425. That is just one example. I need to be able to compare the values and see all of the differences in how that variable is formatted.

 

Is there an easy way to do this?  

 

Thanks!

Super User
Posts: 21,546

Re: Determining which values of variable are formatted differently

Posted in reply to elwayfan446

If you can specify rules, yes. If not, then no. 

 

How do you know if it's really 0.045 or should be 0.0045?

Regular Contributor
Posts: 153

Re: Determining which values of variable are formatted differently

Because I created the data in the multiple original sources. When I converted it from Excel, I may have divided some of the data by a bigger factor than I needed to. Some of the original data may not have needed to be divided at all, but I didn't catch this until all of the data was merged together in a new dataset.
Super User
Posts: 21,546

Re: Determining which values of variable are formatted differently

Posted in reply to elwayfan446

I would: 

 

1. Redo my merge/append and make sure to identify the source files. It's likely that each file is internally consistent, e.g., if a variable is messed up in FileA, it's messed up for all the records in FileA. If you appended, you can use the INDSNAME= option on the SET statement to record which file each row came from.

 

2. Eyeball the data or do a histogram, and isolate the values by file source. You'll likely be able to pick out the issues.
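As a rough illustration of the two steps above, here is a minimal sketch in Python (standing in for the SAS workflow, not a replacement for it). It assumes each record already carries its source file name, which is what INDSNAME would give you in SAS. The file names, values, and the `summarize_by_source` helper are all hypothetical. The idea is that a source stored on the wrong scale shows up with a different order of magnitude than the others.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical records: (source file name, stored percentage value).
records = [
    ("FileA", 0.0425), ("FileA", 0.0510), ("FileA", 0.0375),
    ("FileB", 0.00425), ("FileB", 0.00390), ("FileB", 0.00610),
    ("FileC", 4.25), ("FileC", 5.00), ("FileC", 3.75),
]

def summarize_by_source(rows):
    """Group values by source and report min, median, max, and the
    order of magnitude (floor of log10) of the median."""
    by_source = defaultdict(list)
    for source, value in rows:
        by_source[source].append(value)

    summary = {}
    for source, values in by_source.items():
        med = median(values)
        summary[source] = {
            "min": min(values),
            "median": med,
            "max": max(values),
            # log10 is only defined for positive numbers, so skip otherwise.
            "magnitude": math.floor(math.log10(med)) if med > 0 else None,
        }
    return summary

summary = summarize_by_source(records)
for source, stats in sorted(summary.items()):
    print(source, stats)
```

Sources whose `magnitude` differs from the majority are the ones worth inspecting first. Which scale is the correct one still has to come from what you know about the original files.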

 

Unless there's a rule that you can define, such as "if percent is < 0.001, then it's wrong", I'm not sure how you would identify those records.
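If a rule like that can be defined, applying it is straightforward. The sketch below is in Python rather than SAS and uses the 0.001 cutoff only because it is the example given above. The rows, the threshold constant, and the `flag_suspect` helper are hypothetical. It flags the records that break the rule and counts them per source, so you can see whether the problem is confined to particular files.

```python
from collections import Counter

# Cutoff taken from the example rule above; an assumption, not a known fact
# about this data.
THRESHOLD = 0.001

def flag_suspect(rows, threshold=THRESHOLD):
    """Return the (source, value) pairs whose value falls below the threshold."""
    return [(source, value) for source, value in rows if value < threshold]

# Hypothetical records: (source file name, stored percentage value).
rows = [
    ("FileA", 0.0425),
    ("FileB", 0.000425),
    ("FileB", 0.00051),
    ("FileC", 0.0375),
]

suspects = flag_suspect(rows)
counts = Counter(source for source, _ in suspects)

for source, n in counts.items():
    print(f"{source}: {n} suspect value(s)")
```

A single cutoff can only catch values that are too small. Values that were never divided at all would need a second rule, such as an upper bound.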

 

 

Regular Contributor
Posts: 153

Re: Determining which values of variable are formatted differently

Great, let me look into that and I will let you know. Thanks for the quick reply.
Super User
Posts: 21,546

Re: Determining which values of variable are formatted differently

Posted in reply to elwayfan446

It's probably obvious that reimporting/fixing data at the source is the ideal solution, which should be relatively easy since it's a program. Then you just re-run the remaining portion of your programs.

Regular Contributor
Posts: 153

Re: Determining which values of variable are formatted differently

Yes, that would definitely be easiest.  It is a lot of code to sift through and over 50 datasets would have to be appended back together for the new dataset.  I was trying to avoid that by doing a quick update but that doesn't look likely at this point.
