Solved

# Claims data: clean data first, or query out patients first?

Hi all,

I've been given an algorithm to select patients who have a certain type of illness, based on a set of ICD-9 codes, inclusion procedure codes, and other criteria (age, region, etc.).

Generally, with claims data (this is Truven), should I clean the entire set first and then isolate my sample, or isolate my sample first and then clean?

Thanks,

Accepted Solutions
Solution
‎12-09-2017 01:31 AM

## Re: Claims data: clean data first, or query out patients first?

I agree in general with @Reeza, but experience has taught me that if age is involved, you should always at least check it early in any process where it matters.

Records with a date of birth after the date a service was performed, or services that are inappropriate for the patient's age (not to mention gender), might be a concern.
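As a rough illustration of that kind of early check, here is a minimal Python sketch that flags records whose date of birth is missing or falls after the service date. The field names (`patient_id`, `dob`, `service_date`) and the sample rows are made up for the example and are not the actual Truven layout.

```python
from datetime import date

# Hypothetical claim records; field names are assumptions for illustration only.
claims = [
    {"patient_id": "A1", "dob": date(1950, 3, 14), "service_date": date(2012, 6, 1)},
    {"patient_id": "B2", "dob": date(2013, 1, 5), "service_date": date(2012, 6, 1)},
    {"patient_id": "C3", "dob": None, "service_date": date(2012, 7, 9)},
]

def flag_bad_dob(records):
    """Return records whose date of birth is missing or later than the service date."""
    return [r for r in records if r["dob"] is None or r["dob"] > r["service_date"]]

suspect = flag_bad_dob(claims)
print([r["patient_id"] for r in suspect])
```

Whether flagged records get dropped, corrected, or just reported depends on the study protocol; the point is only to look at them before age drives any selection.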

You may also have to consider age at time of service vs age at data extract depending on your data systems. Many systems will maintain demographics such as birth date separately from services and may calculate an age based on the date of the extract for each record even though the services were on different dates.
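To make the difference concrete, here is a small Python sketch that computes age from the birth date as of a given date, once for the service date and once for the extract date. The dates are invented for the example, and `age_on` is a hypothetical helper, not something your data system provides.

```python
from datetime import date

def age_on(dob, as_of):
    """Age in completed years on the date `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

# Hypothetical values for one patient and one claim.
dob = date(1947, 9, 20)
service_date = date(2011, 5, 2)
extract_date = date(2017, 12, 1)

age_at_service = age_on(dob, service_date)
age_at_extract = age_on(dob, extract_date)
print(age_at_service, age_at_extract)
```

With an inclusion rule such as "under 65 at time of service", this patient qualifies on the service date but would be excluded if the age stored on the record had been calculated at extract.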

All Replies

## Re: Claims data: clean data first, or query out patients first?

cdubs wrote:

> Hi all,
>
> I've been given an algorithm to select patients who have a certain type of illness. Given a set of ICD-9 codes + inclusion procedure codes, + other criteria (age, region, etc).
>
> Generally with claims data (this is Truven) -- should I clean the entire set first, and then isolate my sample, or isolate my sample and then clean?
>
> Thanks,

It depends on your cleaning process. If the cleaning can affect selection, then it needs to go first.
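One example of cleaning that affects selection is diagnosis code formatting. The Python sketch below assumes, purely for illustration, that some ICD-9 codes arrive with a decimal point or stray whitespace while the inclusion list is stored without either; the codes, field names, and the `normalize_icd9` helper are all hypothetical.

```python
def normalize_icd9(code):
    """Strip whitespace and the decimal point, and upper-case the code."""
    return code.strip().replace(".", "").upper()

# Hypothetical inclusion list, stored without decimal points.
inclusion_codes = {"25000", "25002"}

# Hypothetical raw claims with inconsistent code formatting.
raw_claims = [
    {"patient_id": "A1", "dx": "250.00"},
    {"patient_id": "B2", "dx": " 25002 "},
    {"patient_id": "C3", "dx": "4019"},
]

# Selecting before cleaning misses codes that only differ in formatting.
selected_raw = [c["patient_id"] for c in raw_claims if c["dx"] in inclusion_codes]

# Cleaning the selection field first recovers them.
selected_clean = [
    c["patient_id"] for c in raw_claims if normalize_icd9(c["dx"]) in inclusion_codes
]
print(selected_raw, selected_clean)
```

Cleaning that only touches fields not used for selection can reasonably wait until after the sample is isolated, which keeps the expensive work on the smaller set.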

☑ This topic is solved.