I am trying to generate a 1% random sample from my data set, but I only want 1 of each distinct value in a variable. For example, if I have a set that looks like:


Company Name    Project Number    Joint Company
CompA           553               CompX
CompA           552               CompY
CompB           133               CompZ


In my random sample, I would only want CompA chosen once, not multiple times due to being in multiple rows. So, I need a 1% random sample using variable CompName. In addition, I do not want it to produce a new data set. Instead, I want a new variable created (let's say, "Random"), where a value of 1 means that row was selected for the random sample, and '.' or '0' means it was not selected.


Does that make sense? I have tried both proc surveyselect and proc sql and cannot 1) get distinct values instead of duplicates and 2) generate a new column in the same data set (or even a new data set that has all the original variables in it). 


Thank you for your help!

Add a random number to your data. 

Sort the data by company, then by the random number.

At the first row of each company, create a Bernoulli random variable that equals 1 with 1% probability.




I'm not sure what you mean by adding a random number to my data?

Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:


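/* Select 10 makes (assumption: input is sashelp.cars, which has make and model) */
proc surveyselect data=sashelp.cars sampsize=10 outall
    out=makeSample(rename=(Selected=selectedMake));
samplingunit make;
run;

/* Select one model from every make */
proc surveyselect data=makeSample sampsize=1 outall
    out=modelSample(rename=(Selected=selectedModel));
strata make;
run;

/* Select the selected model for every selected make */
data sample;
set modelSample;
selected = selectedMake and selectedModel;
run;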

Example assumes that the data is sorted by make.


Follow what @Reeza said.


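/* Attach a uniform random number r to every row */
data have;
 set sashelp.zipcode;
 call streaminit(12345678);
 r = rand('uniform');
run;

/* Shuffle the rows within each state */
proc sort data=have;
 by STATE r;
run;

/* Flag the first (random) row of each state; 0.1 flags about 10% of states, use 0.01 for a 1% sample */
data want;
 set have;
 by STATE;
 if first.STATE then Random = rand('bern', 0.1);
 drop r;
run;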
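
One way to check the result is sketched below. It assumes the want data set from the code above exists and carries the Random flag. It counts the distinct states and how many of them were flagged. Because each state gets its own independent Bernoulli draw, the number flagged will vary from run to run rather than being an exact percentage.

```sas
/* Sanity check (sketch): compare the number of states with the number flagged */
proc sql;
  select count(distinct STATE) as n_states,
         sum(Random = 1)       as n_selected
  from want;
quit;
```

If you need an exact sample size rather than an approximate rate, the proc surveyselect approach is probably the better fit.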

