slobber
Calcite | Level 5

I am trying to generate a 1% random sample from my data set, but I only want 1 of each distinct value in a variable. For example, if I have a set that looks like:

 

Company Name    Project Number    Joint Company
CompA           553               CompX
CompA           552               CompY
CompB           133               CompZ

 

In my random sample, I would want CompA chosen only once, not multiple times because it appears in multiple rows. So I need a 1% random sample of the distinct values of the company name variable (CompName). In addition, I do not want to produce a new data set. Instead, I want a new variable (say, "Random") where a value of 1 means that row was selected for the random sample and '.' or '0' means it was not.

 

Does that make sense? I have tried both proc surveyselect and proc sql, but I cannot 1) get distinct values instead of duplicates or 2) generate a new column in the same data set (or even a new data set that has all the original variables in it).

 

Thank you for your help!

4 REPLIES 4
Reeza
Super User

Add a random number to each row of your data.

Sort the data by company, then by the random number.

At the first row of each company, create a random Bernoulli variable that has a 1% probability.

 

 

 

slobber
Calcite | Level 5

I'm not sure what you mean by adding a random number to my data?

PGStats
Opal | Level 21

Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:

 

/* Select 10 makes */
proc surveyselect data=sashelp.cars sampsize=10 outall
    out=makeSample(rename=(selected=selectedMake));
    samplingunit make;
run;

/* Select one model from every make */
proc surveyselect data=makeSample sampsize=1 outall
    out=modelSample(rename=(selected=selectedModel));
    strata make;
run;

/* Flag a row only if both its make and its model were selected */
data sample;
    set modelSample;
    selected = selectedMake and selectedModel;
run;

This example assumes that the data is sorted by make, which the STRATA statement requires.

 

PG
Ksharp
Super User

Follow what @Reeza said.

 

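data have;
 set sashelp.zipcode;
 call streaminit(12345678);  /* fixed seed so the random order is reproducible */
 r=rand('uniform');          /* random key used to shuffle rows within each STATE */
run;

/* Rows end up in random order within each STATE */
proc sort data=have ;
by STATE r;
run;

data want;
 set have;
 by STATE;
 /* Bernoulli draw on the first row of each STATE: 0.1 selects about 10%
    of the states; use 0.01 for a 1% sample. All other rows keep Random missing. */
 if first.STATE then Random=rand('bern',0.1) ;
 drop r;
run;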
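
The Bernoulli draw above selects each group independently, so the share selected is only approximately the stated rate. If the sample should instead be drawn as 1% of the distinct values, one option is to sample from a de-duplicated list and merge the flag back. The following is an untested sketch under assumptions: the data set name have, the variable CompName, the seed, and the intermediate names companies, picked and CompanyPicked are all placeholders, not names from the original question's data.

```sas
/* Assumed names: data set HAVE, company variable CompName */

/* One row per distinct company */
proc sort data=have out=companies(keep=CompName) nodupkey;
  by CompName;
run;

/* Simple random sample of 1% of the companies; OUTALL keeps every
   company and adds the 0/1 variable Selected */
proc surveyselect data=companies method=srs samprate=0.01 seed=12345
    outall out=picked(keep=CompName Selected rename=(Selected=CompanyPicked));
run;

/* MERGE needs both inputs in CompName order */
proc sort data=have;
  by CompName;
run;

/* Put the flag on the first row of each selected company only */
data want;
  merge have picked;
  by CompName;
  Random = (first.CompName and CompanyPicked = 1);
  drop CompanyPicked;
run;
```

Naming the output have instead of want in the last step replaces the original data set with the flagged copy. The flagged row is simply the first row of each selected company in the sorted data; sort by a random key within company first, as in the code above, if it should be a random row.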

