slobber
Calcite | Level 5

I am trying to generate a 1% random sample from my data set, but I only want 1 of each distinct value in a variable. For example, if I have a set that looks like:

 

Company Name    Project Number    Joint Company
CompA           553               CompX
CompA           552               CompY
CompB           133               CompZ

 

In my random sample, I would only want CompA chosen once, not multiple times due to being in multiple rows. So, I need a 1% random sample using variable CompName. In addition, I do not want it to produce a new data set. Instead, I want a new variable created (let's say, "Random"), where a value of 1 means that row was selected for the random sample, and '.' or '0' means it was not selected.

 

Does that make sense? I have tried both proc surveyselect and proc sql and cannot 1) get distinct values instead of duplicates and 2) generate a new column in the same data set (or even a new data set that has all the original variables in it). 

 

Thank you for your help!

4 REPLIES
Reeza
Super User

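Add a random number to each row of your data.

Sort the data by company, then by the random number, so that a randomly chosen row comes first within each company.

At the first row of each company, create a random Bernoulli variable that has a 1% probability of being 1.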

 

 

 

slobber
Calcite | Level 5

I'm not sure what you mean by adding a random number to my data?

PGStats
Opal | Level 21

Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:

 

/* Select 10 makes */
proc surveyselect data=sashelp.cars sampsize=10 outall
    out=makeSample(rename=(selected=selectedMake));
    samplingunit make;
run;

/* Select one model from every make */
proc surveyselect data=makeSample sampsize=1 outall
    out=modelSample(rename=(selected=selectedModel));
    strata make;
run;

/* Select the selected model for every selected make */
data sample;
    set modelSample;
    selected = selectedMake and selectedModel;
run;

Example assumes that the data is sorted by make.
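
Adapted to the original question, the same two-step idea might look like the sketch below. The data set name have, the variable name CompName, and the seed values are assumptions made for illustration. It also assumes that SAMPRATE= combined with SAMPLINGUNIT draws 1% of the companies rather than 1% of the rows, which is worth checking on your own data.

```sas
/* Hypothetical names: data set HAVE, company variable CompName */
proc sort data=have;
    by CompName;
run;

/* Step 1: select 1% of the companies (every row of a chosen company is flagged) */
proc surveyselect data=have samprate=0.01 seed=12345 outall
    out=compSample(rename=(selected=selectedComp));
    samplingunit CompName;
run;

/* Step 2: select one row at random within every company */
proc surveyselect data=compSample sampsize=1 seed=54321 outall
    out=rowSample(rename=(selected=selectedRow));
    strata CompName;
run;

/* Random = 1 only for the chosen row of a chosen company, 0 otherwise */
data want;
    set rowSample;
    Random = (selectedComp and selectedRow);
    drop selectedComp selectedRow;
run;
```

Because OUTALL keeps every input row, want contains all the original variables plus the new Random flag, which is what the question asked for.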

 

PG
Ksharp
Super User

Follow what @Reeza said.

 

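/* STATE plays the role of the company variable in this example */
data have;
 set sashelp.zipcode;
 call streaminit(12345678);   /* fix the seed so the random order is reproducible */
 r=rand('uniform');           /* random sort key for each row */
run;

proc sort data=have ;
by STATE r;                   /* random row order within each STATE */
run;

data want;
 set have;
 by STATE;
 call streaminit(87654321);   /* seed the Bernoulli draws as well */
 /* flag the first (randomly ordered) row of each STATE with probability 1%; */
 /* all other rows keep Random = . */
 if first.STATE then Random=rand('bern',0.01) ;
 drop r;
run;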
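
Because each group is flagged by an independent Bernoulli draw, the realized sampling rate is only approximately 1% and will vary from run to run. With a small number of groups it can easily be zero. A quick check, assuming the want data set and the STATE and Random variables from the step above, could be:

```sas
/* Compare the number of flagged groups with the total number of groups */
proc sql;
    select count(distinct STATE)                     as n_groups,
           sum(Random=1)                             as n_selected,
           sum(Random=1) / count(distinct STATE)     as realized_rate format=percent8.2
    from want;
quit;
```

If an exact 1% of groups is required, a fixed-size or fixed-rate selection with PROC SURVEYSELECT is probably the safer choice.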

