I just want to point out that the question says:
> The sample should include all available distinct values from three columns
The PROC SURVEYSELECT code that you marked as "correct" does not necessarily "include all available distinct values." It simply extracts 500 random observations.
In general, it might be impossible to satisfy that constraint with a fixed sample size. For example, if X1 = _N_, then there are 1 million distinct values and no subset of 500 observations can include all of them. If you want to include all distinct values, you would have to sort the data, then use the FIRST.variable technique to extract the distinct combinations:
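proc sort data=have;
   by x1-x3;
run;

data distinct;
   set have;
   by x1-x3;
   /* Keep the first row of each distinct (x1, x2, x3) combination. */
   /* FIRST.x3 is also set whenever x1 or x2 changes, so this is the same as: if first.x3; */
   if first.x1 | first.x2 | first.x3;
run;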
This method returns one observation per distinct combination of x1, x2, and x3, so it is unlikely to create exactly 500 observations.
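Before sampling, it may help to check whether 500 observations could cover every distinct value at all. The sketch below is only a rough check, assuming the data set is named have and the columns are x1, x2, and x3 as in the code above; the output names n_x1, n_x2, and n_x3 are made up for illustration. If any of the counts exceeds 500, no sample of 500 observations can include all distinct values. Counts of 500 or fewer are necessary but not sufficient, because the values still have to occur together in few enough rows.

```sas
/* Count the distinct values in each column of HAVE.        */
/* n_x1, n_x2, n_x3 are illustrative names, not from the question. */
proc sql;
   select count(distinct x1) as n_x1,
          count(distinct x2) as n_x2,
          count(distinct x3) as n_x3
   from have;
quit;
```

For the X1 = _N_ example, n_x1 would equal the number of observations, which rules out a 500-observation sample immediately.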