@Reeza wrote:
Use PROC FREQ to get the percentages of each variable and then pick the top one for each category. This handles ties by taking a random value. In my experience, it's better to take the latest value instead of breaking ties at random, especially for something like Gender, which has the potential to change.
Yes, gender may change. In my case the data sometimes shows this happening on the same day, in an environment where it is clearly not a "before"/"after" case: two or more swabs taken from different specimen sites on the same day.
There are times I start suspecting either some sort of fraud or just plain $%^ on someone's part when entering or selecting a "unique identifier". Same date of test(s), same unique patient ID, same clinic: different gender, date of birth, race and/or ethnicity.
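A rough sketch of how records like that could be flagged, in case it helps. The dataset name TESTS and the variable names (PATIENT_ID, CLINIC, TEST_DATE, GENDER, DOB, RACE, ETHNICITY) are made up for illustration, so substitute your own:

```sas
/* Hypothetical names: dataset TESTS and all variables below are assumptions. */
/* Lists each patient/clinic/date combination that carries more than one     */
/* distinct value for any demographic variable.                              */
proc sql;
   create table conflicts as
   select patient_id, clinic, test_date,
          count(distinct gender)    as n_gender,
          count(distinct dob)       as n_dob,
          count(distinct race)      as n_race,
          count(distinct ethnicity) as n_ethnicity
   from tests
   group by patient_id, clinic, test_date
   having calculated n_gender    > 1
       or calculated n_dob       > 1
       or calculated n_race      > 1
       or calculated n_ethnicity > 1;
quit;
```

Note that COUNT(DISTINCT ...) ignores missing values, so a record with a blank gender next to one with a recorded gender would not be flagged by this query.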
When asked for a report, I do tend to standardize to the latest recorded values for the ID when the individual is the matter of interest, such as with repeated infections.
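One way the "latest recorded values" approach might look, again with a hypothetical dataset TESTS and made-up variable names. This is only a sketch of the general idea, not necessarily how any particular report is built:

```sas
/* Hypothetical names: TESTS, PATIENT_ID, TEST_DATE and the demographic */
/* variables are assumptions for illustration.                          */
proc sort data=tests out=sorted;
   by patient_id test_date;
run;

/* Keep the last record per patient, i.e. the most recent test date. */
data latest (keep=patient_id gender dob race ethnicity);
   set sorted;
   by patient_id;
   if last.patient_id;
run;
```

This does not resolve same-day conflicts: when two records share the latest date, whichever one happens to sort last is kept, so those cases still need a separate rule or a manual review.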