I am trying to generate a 1% random sample from my data set, but I want each distinct value of one variable to be selected at most once. For example, suppose my data set looks like this:
Company Name   Project Number   Joint Company
CompA          553              CompX
CompA          552              CompY
CompB          133              CompZ
In my random sample, I want CompA chosen only once, not multiple times just because it appears in multiple rows. So I need a 1% random sample of the distinct values of the company name variable (CompName). In addition, I do not want to produce a new data set. Instead, I want a new variable (say, Random) added to the existing data set, where 1 means the row was selected for the random sample and . (missing) or 0 means it was not.
Does that make sense? I have tried both PROC SURVEYSELECT and PROC SQL, but I cannot 1) get distinct values instead of duplicates, or 2) create a new column in the same data set (or even a new data set that keeps all the original variables).
Thank you for your help!
1. Add a random number to your data (a uniform random value on every row).
2. Sort the data by company, then by the random number.
3. On the first row of each company, create a random Bernoulli variable that equals 1 with 1% probability.
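Applied to the layout in the question, the three steps might look like the sketch below. It assumes the company variable is named CompName (the question uses both "Company Name" and CompName), builds three made-up rows with DATALINES, and uses arbitrary seed values. Rows other than the first row of each company keep a missing Random; first rows get 0 or 1.

```sas
/* Hypothetical example data shaped like the table in the question */
data have;
  input CompName $ ProjectNumber JointCompany $;
  datalines;
CompA 553 CompX
CompA 552 CompY
CompB 133 CompZ
;
run;

/* Step 1: add a uniform random number to every row */
data have;
  set have;
  if _n_ = 1 then call streaminit(20240101); /* arbitrary seed, for reproducibility */
  r = rand('uniform');
run;

/* Step 2: sort by company, then by the random number within company */
proc sort data=have;
  by CompName r;
run;

/* Step 3: on the first (randomly ordered) row of each company, draw a
   Bernoulli(0.01) flag; all other rows keep Random = . (missing) */
data have;
  set have;
  by CompName;
  if _n_ = 1 then call streaminit(20240102); /* different arbitrary seed */
  if first.CompName then Random = rand('bern', 0.01);
  drop r;
run;
```

Because each DATA step reads and rewrites `have`, the flag ends up in the original data set rather than a separate one, although PROC SORT changes the row order. Each company is flagged independently, so the number of selected companies is about 1% on average but varies from run to run.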
I'm not sure what you mean by adding a random number to my data?
Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:
/* Select 10 makes */
proc surveyselect data=sashelp.cars sampsize=10 outall
out=makeSample(rename=(selected=selectedMake));
samplingunit make;
run;
/* Select one model from every make */
proc surveyselect data=makeSample sampsize=1 outall
out=modelSample(rename=(selected=selectedModel));
strata make;
run;
/* Select the selected model for every selected make */
data sample;
set modelSample;
selected = selectedMake and selectedModel;
run;
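This example assumes that the data are sorted by make, because the STRATA statement in the second step requires input sorted by the strata variable.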
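The same two-stage idea can be adapted to the question's layout. The sketch below assumes the company variable is CompName, generates hypothetical data (300 companies with 3 projects each), and uses made-up data set names and arbitrary seeds. Stage 1 uses SAMPRATE=0.01 with SAMPLINGUNIT CompName so that 1% of the distinct companies are drawn, stage 2 picks one row within each company, and the final DATA step stores the combined flag in Random.

```sas
/* Hypothetical example data: 300 companies with 3 projects each */
data have;
  length CompName $ 12;
  do c = 1 to 300;
    CompName = cats('Comp', c);
    do ProjectNumber = 1 to 3;
      output;
    end;
  end;
  drop c;
run;

proc sort data=have;
  by CompName;
run;

/* Stage 1: sample 1% of the companies; each company is one sampling unit */
proc surveyselect data=have samprate=0.01 outall seed=12345
                  out=compSample(rename=(selected=selectedComp));
  samplingunit CompName;
run;

/* STRATA in stage 2 needs the input sorted by company */
proc sort data=compSample;
  by CompName;
run;

/* Stage 2: sample one row (project) within every company */
proc surveyselect data=compSample sampsize=1 outall seed=54321
                  out=rowSample(rename=(selected=selectedRow));
  strata CompName;
run;

/* Random = 1 only if the company and the row were both selected */
data want;
  set rowSample;
  Random = (selectedComp and selectedRow);
  drop selectedComp selectedRow;
run;
```

Compared with the Bernoulli approach, this draws a fixed number of companies instead of a number that varies from run to run, and unselected rows get 0 rather than missing.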
Follow what @Reeza said.
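data have;
set sashelp.zipcode;
if _n_ = 1 then call streaminit(12345678); /* seed once, for a reproducible sort order */
r=rand('uniform'); /* random sort key for every row */
run;
proc sort data=have ;
by STATE r; /* shuffle the rows within each state */
run;
data want;
set have;
by STATE;
if _n_ = 1 then call streaminit(87654321); /* seed this step too, with a different value */
/* Flag only the (random) first row of each state: 1 with probability 0.1, else 0.
   All other rows stay missing. Use 0.01 for the 1% sample in the question. */
if first.STATE then Random=rand('bern',0.1) ;
drop r;
run;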