I'm working with a dataset of lab values that occurred within 7 days of each other. Essentially, I'm trying to figure out what values are "duplicates" that couldn't be removed with a proc sort because the lab value and date are close but not the same (unfortunately a result of some reporting issues of date and value). I'm having issues because there can be multiple occurrences by id. For example, id 1 in the simulated dataset below has 3 labs within 7 days of 5/30/19 and also 2 labs within 7 days of 7/7/19. obs id date value psource 1 1 5/30/19 20 1 2 1 5/31/19 20 0 3 1 6/1/19 19 0 4 1 7/7/19 350 1 5 1 7/10/19 200 1 6 2 1/3/19 100 0 7 2 1/4/19 99 1 8 2 3/6/19 19 1 9 2 3/8/19 20 0 10 3 12/8/19 1500 1 11 3 12/10/19 1500 1 12 3 12/11/19 1500 0 13 3 1/15/20 19 1 14 3 1/16/20 20 0 As a result, I would like to create a count column that looks like the one below. obs id date value psource count 1 1 5/30/19 20 1 1 2 1 5/31/19 20 0 2 3 1 6/1/19 19 0 3 4 1 7/7/19 350 1 1 5 1 7/10/19 200 1 2 6 2 1/3/19 100 0 1 7 2 1/4/19 99 1 2 8 2 3/6/19 19 1 1 9 2 3/8/19 20 0 2 10 3 12/8/19 1500 1 1 11 3 12/10/19 1500 1 2 12 3 12/11/19 1500 0 3 13 3 1/15/20 19 1 1 14 3 1/16/20 20 0 2 Lastly, I would like to apply some rules like if the labs occurred within 7 days of each other for a given id and are close in value (within 11), then extract only the first one that comes from the preferred source (psource=1). For example, keep row 10 and get rid of rows 11 and 12. Also, if the labs occurred within 7 days of each other and aren't close in value then create a dataset of those so that those can undergo further review. For example, output rows 4 and 5. Thank you in advance for any help on this.
... View more