From your description, it sounds like you might have 1,000 cases and 4,000 controls as rows in one dataset that has 5,000 rows. Is that right?
Could you show a sample of your input data (just ~10 records, with ID, Case, Case_ID, CAA) and the desired output from that sample?
If I'm understanding what you want, I would do it in three steps:
1. Select the cases of interest (those with CAA="No Involvement") and output them to a dataset.
2. Select the controls for those cases (you could do this by joining/merging the cases of interest to the original data, matching the ID of each case to the Case_ID of its controls).
3. Stack together the selected cases and controls.
It could probably be done in one step, but for such small data, I'd probably do it in multiple steps. So if there are 111 cases with CAA="No Involvement", the first step would output 111 records. With four controls per case, the second step would output 444 records. And stacking them together would give 555 records.
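Here is a rough sketch of the three steps in Python, just to make the logic concrete. The column names (ID, Case, Case_ID, CAA) come from your description, but the sample values and the coding are my assumptions: I'm guessing Case is 1 for a case and 0 for a control, that each control's Case_ID holds the ID of its matched case, and that CAA is only filled in for cases. Adjust to whatever your data actually looks like.

```python
# Hypothetical sample data. Column names are from the thread; the values and
# the coding (Case = 1 for a case, 0 for a control) are assumptions.
rows = [
    {"ID": 1, "Case": 1, "Case_ID": None, "CAA": "No Involvement"},
    {"ID": 2, "Case": 1, "Case_ID": None, "CAA": "Involvement"},
    {"ID": 11, "Case": 0, "Case_ID": 1, "CAA": None},
    {"ID": 12, "Case": 0, "Case_ID": 1, "CAA": None},
    {"ID": 21, "Case": 0, "Case_ID": 2, "CAA": None},
    {"ID": 22, "Case": 0, "Case_ID": 2, "CAA": None},
]


def select_cases_and_controls(rows, caa_value="No Involvement"):
    """Return the cases with the given CAA value followed by their controls."""
    # Step 1: select the cases of interest.
    cases = [r for r in rows if r["Case"] == 1 and r["CAA"] == caa_value]

    # Step 2: select the controls whose Case_ID matches the ID of a selected case.
    case_ids = {r["ID"] for r in cases}
    controls = [r for r in rows if r["Case"] == 0 and r["Case_ID"] in case_ids]

    # Step 3: stack the selected cases and controls together.
    return cases + controls


selected = select_cases_and_controls(rows)
print([r["ID"] for r in selected])
```

With this made-up sample, only case 1 has CAA="No Involvement", so the result is that case plus its two controls (IDs 11 and 12).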