I am hoping to receive help with the following problem. The data set is shown immediately below, followed by a description of the problem.
The actual data set is much more complex, but here is a simplified version:
data test;
input id year date period span;
datalines;
1 2000 9 1 40
1 2000 9 2 375
1 2000 10 1 10
1 2000 10 2 355
1 2001 1 1 -15
1 2001 1 2 290
1 2001 9 1 39
1 2001 9 2 320
2 2000 11 1 43
2 2000 11 2 350
2 2000 12 1 25
2 2000 12 2 310
2 2000 1 1 -40
2 2000 1 2 280
;
run;
For each id-year group (the data set above is laid out by id-year grouping), I need to keep a single id-year-date group (i.e., one set of period 1-2 pairs). The group to keep is the one whose period 1 observation has a value of span that meets two criteria: 1) span is greater than 0, and 2) span is closer to 0 than the span of any other period 1 observation in the id-year group. Note that I do not want to select the period 2 observation that meets those two criteria; rather, I want to keep the period 2 observation whose period 1 counterpart best meets them.
Therefore, the final data set should contain only the unannotated rows below (the reason each other row is eliminated is given in parentheses):
1 2000 9 1 40 (eliminated because 40 is farther away from 0 than 10)
1 2000 9 2 375 (eliminated because its period 1 pair was eliminated due to its paired period 1's span being at a greater distance)
1 2000 10 1 10
1 2000 10 2 355
1 2001 1 1 -15 (eliminated because the value of span is negative: -15)
1 2001 1 2 290 (eliminated because its period 1 pair was eliminated due to a negative span)
1 2001 9 1 39
1 2001 9 2 320
2 2000 11 1 43 (eliminated because 43 is farther away from 0 than 25)
2 2000 11 2 350 (eliminated because its period 1 pair was eliminated due to its paired period 1's span being at a greater distance)
2 2000 12 1 25
2 2000 12 2 310
2 2000 1 1 -40 (eliminated because the value of span was negative: -40)
2 2000 1 2 280 (eliminated because its period 1 pair was eliminated due to a negative span)
The actual dataset has more than just periods 1 and 2 and can contain any number of period groupings per id-year.
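The selection rule described above can be sketched in PROC SQL. This is only a sketch under assumptions: the table names best and want are made up for illustration, and the HAVING clause relies on PROC SQL remerging the group minimum back onto the detail rows (SAS prints a NOTE about remerging when it does this). If two dates tie for the smallest positive period 1 span, both date groups are kept.

```sas
proc sql;
  /* For each id-year, find the date whose period 1 span is the
     smallest value that is still greater than 0 */
  create table best as
  select id, year, date
  from test
  where period = 1 and span > 0
  group by id, year
  having span = min(span);

  /* Keep every period belonging to the winning id-year-date group */
  create table want as
  select t.*
  from test as t
       inner join best as b
       on t.id = b.id and t.year = b.year and t.date = b.date
  order by t.id, t.year, t.date, t.period;
quit;
```

Because the join is on id, year, and date only, this should carry along however many periods the winning date group contains.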
An acceptable solution could possibly copy the value of span from period 1 of each id-year-date group to the other periods within the group and then use a two-step process: first, eliminate all observations whose copied span is not greater than zero, and second, sort by id, year, and the copied span in ascending order and retain only the first date group within each id-year group...?
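A minimal DATA step sketch of that copy-then-filter-then-sort idea follows. The names sorted, flagged, want, span1, and keepdate are assumptions introduced for illustration. Note that the sort here is ascending, since the smallest positive span is the one closest to 0, and that ties on span1 are broken by taking the lower date.

```sas
/* Make sure period 1 comes first within each id-year-date group */
proc sort data=test out=sorted;
  by id year date period;
run;

/* Step 1: carry the period 1 span down to every row of its date group
   and drop groups whose period 1 span is not greater than 0 */
data flagged;
  set sorted;
  by id year date;
  retain span1;
  if first.date then do;
    if period = 1 then span1 = span;
    else span1 = .;            /* no period 1 row in this date group */
  end;
  if span1 > 0;                /* missing values fail this test too */
run;

/* Step 2: within each id-year, order date groups by the carried span */
proc sort data=flagged;
  by id year span1 date period;
run;

/* Keep only the first date group within each id-year */
data want (drop=span1 keepdate);
  set flagged;
  by id year;
  retain keepdate;
  if first.year then keepdate = date;
  if date = keepdate;
run;
```

With the simplified data above, this should leave the date 10 pair for id 1 in 2000, the date 9 pair for id 1 in 2001, and the date 12 pair for id 2 in 2000.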