I am supposed to aggregate this into an episode level file that allows for a maximum of three days between from and through date, while retaining all the other information in the claims, but only if dates overlap in claims. So, for example, all of the claims for id=1 would be collapsed into a single episode of care, because all the dates fall into the range 1-4. But the claims for id=4 would generate 3 episodes, because the dates between services are distinct. And I have to save all the information from all the claims regardless of how many distinct episodes are created - original from and thru dates, cpt codes, and severity levels.
I see 2 issues here:
1. How to identify episodes.
2. How to organise the data.
Question to 1:
id fromdate todate
1 1 4
1 3 6
1 5 8
How many episodes would that be and which record would belong to which episode?
Thoughts about 2:
Your dataset can either be aggregated or on atomic level. To have both you can:
A: have 2 datasets; generate a primary key for the aggregated dataset and add this key to the atomic dataset (the one you have already) as foreign key.
B: create only one aggregated dataset and store all atomic information de-normalised into "arrays".
C: keep the atomic dataset (number of rows) but add additional variables with aggregated values which are repeated (also "de-normalised").
I would go for variant A or C.
Variant B will give you a lot of headache (and cause some macro coding).
The number of variables to store the atomic information (=size of the array) would vary (maximum number of rows which make one episode).
Let me know what you're heading for and answer question 1 please.
Just make sure to create test data to cover all possible cases - and then define the rules to build episodes before you even start with coding.
Atomic is a term used in data organisation (not SAS specific).
Out of a theory book: "Granularity refers to the level of detail represented by the values stored in a table's row. Data stored at their lowest level of granularity are said to be atomic data."
Let us know the rules when you've got them and provide also the test data.
Thanks for your suggestions. They helped - I used one of them to guide me somewhat -
1 - Defined claims as "singles" "possible parents" and "children" - the first two at an episode level, the last at claim level - had to do "possible" parents because I could not use information from the previous observation to determine if they had a "child"
2 - Reorganized "children" into an episode level file
3 - Merged "possible parents" back onto "children" - if they found a match they were confirmed as "parents" - otherwise "singles"
4 - Put singles, parents & children together.