I am trying to generate a 1% random sample from my data set, but I only want 1 of each distinct value in a variable. For example, if I have a set that looks like:
Company Name    Project Number    Joint Company
CompA           553               CompX
CompA           552               CompY
CompB           133               CompZ
In my random sample, I would only want CompA chosen once, not multiple times because it appears in multiple rows. So I need a 1% random sample of the distinct values of the variable CompName. In addition, I do not want to produce a new data set. Instead, I want a new variable created (let's say, "Random"), where a value of 1 means that row was selected for the random sample, and '.' or '0' means it was not selected.
Does that make sense? I have tried both proc surveyselect and proc sql, but I cannot 1) get distinct values instead of duplicates or 2) generate a new column in the same data set (or even a new data set that has all the original variables in it).
Thank you for your help!
1. Add a random number to each row of your data.
2. Sort the data by company, then by the random number.
3. On the first row of each company, create a Bernoulli random variable that has a 1% probability of being 1.
I'm not sure what you mean by adding a random number to my data?
Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:
/* Select 10 makes */
proc surveyselect data=sashelp.cars sampsize=10 outall
        out=makeSample(rename=selected=selectedMake);
    samplingunit make;
run;

/* Select one model from every make */
proc surveyselect data=makeSample sampsize=1 outall
        out=modelSample(rename=selected=selectedModel);
    strata make;
run;

/* Select the selected model for every selected make */
data sample;
    set modelSample;
    selected = selectedMake and selectedModel;
run;
The example assumes that the data is sorted by make, which the STRATA statement requires.
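For the layout in the original question, the same two-stage idea might look like the sketch below. It is only a sketch under assumptions: the data set name projects and the variable names CompName, ProjectNum and JointComp are hypothetical stand-ins for the real ones, the seeds are arbitrary, and samprate=0.01 replaces sampsize=10 so that roughly 1% of companies are drawn.

```sas
/* Hypothetical data laid out like the example in the question */
data projects;
    input CompName $ ProjectNum JointComp $;
    datalines;
CompA 553 CompX
CompA 552 CompY
CompB 133 CompZ
;
run;

/* Stage 1: draw about 1% of the companies (each company is one sampling unit) */
proc surveyselect data=projects samprate=0.01 seed=2024 outall
        out=compSample(rename=(Selected=selectedComp));
    samplingunit CompName;
run;

/* The STRATA statement below needs the data sorted by CompName */
proc sort data=compSample;
    by CompName;
run;

/* Stage 2: draw one row within every company */
proc surveyselect data=compSample sampsize=1 seed=4321 outall
        out=rowSample(rename=(Selected=selectedRow));
    strata CompName;
run;

/* Random = 1 only for the chosen row of a chosen company, 0 otherwise */
data projects;
    set rowSample;
    Random = (selectedComp and selectedRow);
    drop selectedComp selectedRow;
run;
```

The last step writes back to projects, so all original variables are kept alongside the new Random flag. Depending on the options in effect, PROC SURVEYSELECT may add further variables of its own to the output, which can be dropped in the same step.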
Follow what @Reeza said.
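data have;
    set sashelp.zipcode;
    call streaminit(12345678);   /* seed for rand() in this step */
    r=rand('uniform');           /* random sort key */
run;

proc sort data=have;
    by STATE r;
run;

data want;
    set have;
    by STATE;
    /* 0.1 flags about 10% of the states; use 0.01 for a 1% sample */
    if first.STATE then Random=rand('bern',0.1);
    drop r;
run;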
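Applied to the layout in the original question, the same steps might look like the following sketch. The data set name projects and the variable names CompName, ProjectNum and JointComp are assumptions made for illustration, as are the seeds. It overwrites the data set in place rather than creating a new one, and it writes 0 instead of a missing value on rows that are not selected.

```sas
/* Hypothetical data laid out like the example in the question */
data projects;
    input CompName $ ProjectNum JointComp $;
    datalines;
CompA 553 CompX
CompA 552 CompY
CompB 133 CompZ
;
run;

/* Step 1: attach a uniform random number to every row */
data projects;
    set projects;
    if _n_ = 1 then call streaminit(2024);   /* arbitrary seed */
    r = rand('uniform');
run;

/* Step 2: sort so the first row within each company is a random one */
proc sort data=projects;
    by CompName r;
run;

/* Step 3: on the first row of each company, draw a 1% Bernoulli flag */
data projects;
    set projects;
    by CompName;
    if _n_ = 1 then call streaminit(4321);   /* arbitrary seed */
    Random = 0;
    if first.CompName then Random = rand('bernoulli', 0.01);
    drop r;
run;
```

Because each company gets an independent 1% draw, the number of flagged companies is only about 1% of the total on average, not exactly 1%. With a small number of companies it can easily be zero.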