As ballardw suggested, you will need to re-use some matches. There probably won't be enough candidates to make every match unique. Here's another approach. Most of the work is in the set-up, which builds a format that identifies which observations might match. Choosing one of them at random is then relatively easy. Here's one way to start:

    proc sort data=dataset2;
       by industry year;
    run;

    data dataset2a;
       set dataset2;
       retain fmtname '$indyr';
       by industry year;
       length label $ 11;
       /* First obs of each industry/year group: store its observation number */
       if first.year then label = put(_n_, z5.);
       retain label;
       /* Last obs of the group: append its observation number, write one row */
       if last.year;
       start = industry || put(year, 4.);
       substr(label, 7, 5) = put(_n_, z5.);
       output;
    run;

    proc format cntlin=dataset2a;
    run;

This gives you a format that translates the concatenated industry + year into the range of observations in the sorted dataset2 that match. For example, LABEL='01001 01420' means that the observations matching that industry + year are observations 1001 through 1420. You might want to print a few observations from dataset2a to get a feel for what that looks like. (The Z5. format assumes dataset2 has fewer than 100,000 observations; widen LABEL and the formats if it is larger.)

Using the format is (comparatively) easy. For example:

    data want;
       set dataset1;
       /* Look up the range of matching observation numbers in dataset2 */
       observations = put(industry || put(year, 4.), $indyr.);
       firstone = input(scan(observations, 1), 5.);
       lastone = input(scan(observations, 2), 5.);
       /* Draw 500 matches at random (with replacement) from that range */
       do pairing = 1 to 500;
          obsno = firstone + floor(ranuni(12345) * (lastone - firstone + 1));
          set dataset2 point=obsno;
          output;
       end;
    run;

The observations in dataset1 do not have to be sorted for this to work. Any variables that dataset1 and dataset2 have in common will be overwritten by the values read from dataset2, so rename them if you need to keep both. If mismatches are possible (some industry/year combinations in dataset1 but not in dataset2), this can still be done but becomes a little harder.

The code is untested, so it may need some debugging, but the approach should be fine. All the observations end up in a single data set, which you can sort and process BY PAIRING if necessary.

Good luck.

... Edited to add the second SET statement.
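For the mismatch case mentioned above, one possible way to handle it is to add an OTHER range to the format (HLO='O' in the CNTLIN data set), so that industry/year values missing from dataset2 come back as a sentinel label instead of the unformatted value. The sketch below is self-contained and rests on assumptions: the toy data, the variables FIRM_ID and MATCH_ID, the sentinel 'NONE', and the choice of 5 draws instead of 500 are all hypothetical, chosen only to keep the example small.

```sas
/* Hypothetical toy data: FIRM_ID and MATCH_ID are invented names. */
data dataset1;
   input industry $ year firm_id;
   datalines;
A 2001 1
A 2002 2
C 2001 3
;
run;

data dataset2;
   input industry $ year match_id;
   datalines;
A 2001 101
A 2001 102
A 2002 103
B 2001 104
;
run;

proc sort data=dataset2;
   by industry year;
run;

/* Same CNTLIN set-up as before, plus one OTHER row (HLO='O') that  */
/* maps any unmatched industry/year to the hypothetical 'NONE'.      */
data dataset2a;
   set dataset2 end=eof;
   by industry year;
   length label $ 11 hlo $ 1;
   retain fmtname '$indyr' label;
   if first.year then label = put(_n_, z5.);
   if last.year then do;
      start = industry || put(year, 4.);
      substr(label, 7, 5) = put(_n_, z5.);
      output;
   end;
   if eof then do;
      start = ' ';
      hlo = 'O';
      label = 'NONE';
      output;
   end;
   keep fmtname start label hlo;
run;

proc format cntlin=dataset2a;
run;

data want;
   set dataset1;
   length observations $ 11;
   observations = put(industry || put(year, 4.), $indyr.);
   if observations = 'NONE' then do;
      /* No candidates: variables read via SET are retained, so clear */
      /* MATCH_ID before writing a single unmatched row.              */
      call missing(match_id);
      output;
   end;
   else do;
      firstone = input(scan(observations, 1), 5.);
      lastone = input(scan(observations, 2), 5.);
      do pairing = 1 to 5;
         obsno = firstone + floor(ranuni(12345) * (lastone - firstone + 1));
         set dataset2 point=obsno;
         output;
      end;
   end;
run;
```

With this toy data, firm 1 draws from MATCH_ID 101/102, firm 2 always gets 103, and firm 3 (industry C) gets one row with MATCH_ID missing, so you can find the unmatched cases afterward with a simple WHERE on that variable.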