SAS Programming

elwayfan446 · Posted 12-07-2017 04:24 PM

Hello,

I have a large dataset that was populated from multiple data sources. It contains a percentage variable whose values are stored on different scales because of how they were formatted in the original datasets. I would like an easy way to compare the values of this variable to see which ones I need to fix. For example, a value of 4.25% may be stored as .0425 in one source and incorrectly as .00425 in another. That is just one example. I need to compare the values and, ideally, see all of the different ways that variable has been stored.

Is there an easy way to do this?

Thanks!

Reeza · Posted 12-07-2017 04:26 PM

If you can specify rules, yes. If not, then no.

How do you know if it's really 0.045 or should be 0.0045?

elwayfan446 · Posted 12-07-2017 04:28 PM

Because I created the data in the multiple original sources. When I converted it from Excel, I may have divided the data by a bigger factor than I needed to. Some of the original data may not have needed to be divided at all, but I didn't catch this until all of the data was merged together in a new dataset.

Reeza · Posted 12-07-2017 04:34 PM

I would:

1. Redo my merge/append and make sure to identify the source files. It's likely each file is internally consistent, e.g., if a variable is messed up for FileA, it's messed up for all the records in FileA. If you appended, you can use the INDSNAME= option on the SET statement.

2. Eyeball the data or do a histogram and isolate values with the file source. You'll likely be able to pick out the issues.
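A minimal sketch of steps 1 and 2, not the poster's actual code. The dataset names work.fileA, work.fileB, work.fileC, the variable pct_var, and the sample values are all made up for illustration; swap in the real source datasets.

```sas
/* Made-up stand-ins for three source files, each on a different scale. */
data work.fileA;                      /* scaled correctly */
    input pct_var @@;
    datalines;
0.0425 0.0375 0.0500
;
run;

data work.fileB;                      /* divided by too large a factor */
    input pct_var @@;
    datalines;
0.00425 0.00375 0.00500
;
run;

data work.fileC;                      /* never divided */
    input pct_var @@;
    datalines;
4.25 3.75 5.00
;
run;

/* Step 1: append and record which dataset each row came from. */
data work.combined;
    length source $41;                /* libref.member: up to 8 + 1 + 32 characters */
    set work.fileA work.fileB work.fileC indsname=_src;
    source = _src;                    /* the INDSNAME= variable is temporary, so copy it */
run;

/* Step 2: summarize by source; a file on the wrong scale should stand out. */
proc means data=work.combined n min median max maxdec=5;
    class source;
    var pct_var;
run;

/* Optional visual check: one histogram panel per source file. */
proc sgpanel data=work.combined;
    panelby source;
    histogram pct_var;
run;
```

With 50+ source files, the PROC MEANS table is probably quicker to scan than the histograms.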

Unless there's a rule that you can define, such as "if percent is < 0.001 then it's wrong", I'm not sure how you would identify those records.
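If a rule like that does hold, it could be applied roughly as follows. This is only a sketch: the dataset work.demo, the variables source and pct_var, and the sample rows are hypothetical, and the upper cutoff of 1 is an added assumption (a proportion above 1 probably was never divided), not something stated in this thread.

```sas
/* Made-up sample rows: source label plus the stored percentage value. */
data work.demo;
    length source $41;
    input source $ pct_var;
    datalines;
WORK.FILEA 0.0425
WORK.FILEA 0.0375
WORK.FILEB 0.00042
WORK.FILEB 0.00037
WORK.FILEC 4.25
WORK.FILEC .
;
run;

/* Flag each record according to the rule. Missing is tested first because
   a missing numeric value compares lower than any number in SAS. */
data work.flagged;
    set work.demo;
    length scale_flag $12;
    if missing(pct_var) then scale_flag = 'missing';
    else if pct_var < 0.001 then scale_flag = 'too small';    /* example rule from the post */
    else if pct_var > 1 then scale_flag = 'not divided';      /* assumed cutoff */
    else scale_flag = 'ok';
run;

/* Cross-tabulate flags by source to see which files are affected. */
proc freq data=work.flagged;
    tables source*scale_flag / missing norow nocol nopercent;
run;
```

A fixed cutoff can only catch values that fall outside the plausible range. A value that was over-divided but still looks like a reasonable percentage will pass, which is why knowing the source file matters.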

elwayfan446 · Posted 12-07-2017 04:39 PM

Great, let me look into that and I will let you know. Thanks for the quick reply.

Reeza · Posted 12-07-2017 04:41 PM

It's probably obvious that reimporting/fixing data at the source is the ideal solution, which should be relatively easy since it's a program. Then you just re-run the remaining portion of your programs.

elwayfan446 · Posted 12-07-2017 04:45 PM

Yes, that would definitely be easiest. It is a lot of code to sift through, and over 50 datasets would have to be appended back together for the new dataset. I was trying to avoid that by doing a quick update, but that doesn't look likely at this point.
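For reference, once a source identifier is available, a quick update could look something like the sketch below. Everything here is hypothetical: the dataset names, the source values, and especially the correction factors, which would have to come from checking each file. Note that INDSNAME= stores the name as uppercase libref.member, so the comparisons use uppercase.

```sas
/* Made-up rows standing in for the merged dataset with a source column. */
data work.demo_src;
    length source $41;
    input source $ pct_var;
    datalines;
WORK.FILEA 0.0425
WORK.FILEB 0.00425
WORK.FILEC 4.25
;
run;

/* Rescale only the rows from files known to be on the wrong scale. */
data work.demo_fixed;
    set work.demo_src;
    select (source);
        when ('WORK.FILEB') pct_var = pct_var * 10;    /* assumed: divided by 10 too many */
        when ('WORK.FILEC') pct_var = pct_var / 100;   /* assumed: never divided */
        otherwise;                                     /* leave other sources unchanged */
    end;
run;
```

Fixing the import code at the source is still the safer route, since a patch like this has to be reapplied every time the data is rebuilt.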

Determining which values of variable are formatted differently