Random sampling

Occasional Contributor
Posts: 5

I am trying to generate a 1% random sample from my data set, but I only want 1 of each distinct value in a variable. For example, if I have a set that looks like:

 

Company Name    Project Number    Joint Company
CompA           553               CompX
CompA           552               CompY
CompB           133               CompZ

 

In my random sample, I want CompA chosen at most once, even though it appears in multiple rows. So I need a 1% random sample of the distinct values of the variable CompName. In addition, I do not want to produce a new data set. Instead, I want a new variable created (say, "Random"), where a value of 1 means that row was selected for the random sample, and '.' or '0' means it was not selected.

Does that make sense? I have tried both proc surveyselect and proc sql, and I cannot 1) get distinct values instead of duplicates or 2) generate a new column in the same data set (or even a new data set that has all the original variables in it).

 

Thank you for your help!

Super User
Posts: 23,700

Re: Random sampling

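Add a random number to every row of your data.

Sort the data by company, then by the random number, so that the first row within each company is a randomly chosen one.

On the first row of each company, create a random Bernoulli variable that has a 1% probability of being 1.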

Occasional Contributor
Posts: 5

Re: Random sampling

I'm not sure what you mean by adding a random number to my data?

Esteemed Advisor
Posts: 5,526

Re: Random sampling

Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:

 

/* Select 10 makes (each make is one sampling unit); OUTALL keeps every row */
proc surveyselect data=sashelp.cars sampsize=10 outall
    out=makeSample(rename=(selected=selectedMake));
    samplingunit make;
run;

/* Select one model within every make */
proc surveyselect data=makeSample sampsize=1 outall
    out=modelSample(rename=(selected=selectedModel));
    strata make;
run;

/* Flag the chosen model of each chosen make */
data sample;
    set modelSample;
    selected = selectedMake and selectedModel;
run;

Example assumes that the data is sorted by make.
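
As a rough sketch of how the same two-step idea might carry over to the data in the question: this assumes a hypothetical data set named projects whose company variable is called CompName. The data set names, the output names, and the seed values are placeholders, not anything from the original post. SAMPRATE=0.01 asks for 1% of the distinct companies rather than a fixed count.

```sas
/* Hypothetical input: work.projects with a company variable CompName */
proc sort data=projects out=projectsSorted;
    by CompName;
run;

/* Step 1: select 1% of the distinct companies (each company is one sampling unit) */
proc surveyselect data=projectsSorted method=srs samprate=0.01 seed=12345 outall
    out=companySample(rename=(selected=selectedCompany));
    samplingunit CompName;
run;

/* Step 2: select one row within every company */
proc surveyselect data=companySample method=srs sampsize=1 seed=54321 outall
    out=rowSample(rename=(selected=selectedRow));
    strata CompName;
run;

/* Step 3: Random = 1 on the chosen row of each chosen company, 0 elsewhere */
data projectsFlagged;
    set rowSample;
    Random = (selectedCompany and selectedRow);
    drop selectedCompany selectedRow;
run;
```

The result keeps every original row and adds the Random flag. Any extra columns that SURVEYSELECT writes to its output can be dropped in the last step if they are not wanted.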

 

PG
Super User
Posts: 10,770

Re: Random sampling

Follow what @Reeza said.

 

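/* Step 1: add a uniform random number to every row */
data have;
 set sashelp.zipcode;
 call streaminit(12345678);
 r=rand('uniform');
run;

/* Step 2: sort by group, then by the random number */
proc sort data=have ;
by STATE r;
run;

/* Step 3: draw one Bernoulli flag per group, on its first row */
data want;
 set have;
 by STATE;
 call streaminit(12345678); /* seed this step too, so the flags are reproducible */
 /* 0.1 as posted; use 0.01 for a 1% selection probability. */
 /* Rows other than the first of each group keep Random missing (.). */
 if first.STATE then Random=rand('bern',0.1) ;
 drop r;
run;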

