slobber
Calcite | Level 5

I am trying to generate a 1% random sample from my data set, but I only want one row for each distinct value of a variable. For example, if I have a data set that looks like this:

Company Name   Project Number   Joint Company
CompA          553              CompX
CompA          552              CompY
CompB          133              CompZ

In my random sample, I want CompA to be chosen at most once, not multiple times just because it appears in multiple rows. So I need a 1% random sample of the distinct values of the variable CompName. In addition, I do not want to produce a new data set. Instead, I want a new variable (let's say, "Random") in which a value of 1 means that the row was selected for the random sample, and '.' or '0' means it was not.

Does that make sense? I have tried both PROC SURVEYSELECT and PROC SQL, but I cannot 1) get distinct values instead of duplicates, or 2) generate a new column in the same data set (or even a new data set that keeps all the original variables).

Thank you for your help!

Reeza
Super User

Add a random number to your data.

Sort the data by company and random number.

Create a random Bernoulli variable with a 1% probability on the first row of each company.

slobber
Calcite | Level 5

I'm not sure what you mean by adding a random number to my data?

PGStats
Opal | Level 21

Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:

/* Step 1: select 10 makes (each make is one sampling unit) */
proc surveyselect data=sashelp.cars sampsize=10 outall
    out=makeSample(rename=(selected=selectedMake));
samplingunit make;
run;

/* Step 2: select one model from every make (one stratum per make) */
proc surveyselect data=makeSample sampsize=1 outall
    out=modelSample(rename=(selected=selectedModel));
strata make;
run;

/* Flag the selected model of every selected make: 1 = in sample, 0 = not */
data sample;
set modelSample;
selected = selectedMake and selectedModel;
run;

The example assumes that the data are sorted by make, which the STRATA statement requires.

PG
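
A minimal sketch adapting PGStats' two-step approach to the original question, assuming a hypothetical data set HAVE with variables CompanyName and ProjectNumber (generated below with made-up values). SAMPRATE=0.01 replaces SAMPSIZE=10 so that 1% of the companies are selected, and the last step keeps every row and adds the 0/1 variable Random. The SEED= values are arbitrary and only make the run repeatable.

```sas
/* Hypothetical test data: 500 made-up companies, each with 1 to 4 projects */
data have;
  call streaminit(2023);
  length CompanyName $ 8;
  do c = 1 to 500;
    CompanyName = cats('Comp', put(c, z3.));
    nproj = ceil(4 * rand('uniform'));
    do p = 1 to nproj;
      ProjectNumber = 100 * c + p;
      output;
    end;
  end;
  drop c p nproj;
run;

/* The STRATA statement in step 2 needs the rows sorted by company */
proc sort data=have;
  by CompanyName;
run;

/* Step 1: select 1% of the companies (each company is one sampling unit) */
proc surveyselect data=have samprate=0.01 outall seed=12345
    out=companySample(rename=(Selected=selectedCompany));
  samplingunit CompanyName;
run;

/* Step 2: within every company, select exactly one row */
proc surveyselect data=companySample sampsize=1 outall seed=54321
    out=rowSample(rename=(Selected=selectedRow));
  strata CompanyName;
run;

/* Random = 1 for the chosen row of a chosen company, 0 for every other row */
data want;
  set rowSample;
  Random = (selectedCompany and selectedRow);
  drop selectedCompany selectedRow;
run;
```

Because SURVEYSELECT draws a fixed number of sampling units, this flags exactly 1% of the companies when the number of companies is a multiple of 100 (5 of the 500 made-up companies here), with one row per flagged company. The output data set may also carry bookkeeping variables added by SURVEYSELECT, such as SelectionProb, which can be dropped if they are not needed.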
Ksharp
Super User

Follow what @Reeza said.

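/* Add a uniform random number to every row */
data have;
 set sashelp.zipcode;
 if _n_ = 1 then call streaminit(12345678);  /* seed once for reproducibility */
 r=rand('uniform');
run;

/* Randomize the row order within each state */
proc sort data=have ;
by STATE r;
run;

/* Bernoulli draw on the first (randomly ordered) row of each state.
   0.1 gives each state a 10% chance; use 0.01 for the 1% rate in the question.
   All other rows are left with Random = . */
data want;
 set have;
 by STATE;
 if first.STATE then Random=rand('bern',0.1) ;
 drop r;
run;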
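
A minimal sketch of the same random-number technique applied to the layout in the original question, assuming a hypothetical data set HAVE with made-up CompanyName and ProjectNumber values. It uses the requested 1% rate and sets Random to 0, rather than leaving it missing, on every row that is not selected. The seeds are arbitrary.

```sas
/* Hypothetical data in the asker's layout: 500 made-up companies, 1 to 4 projects each */
data have;
  call streaminit(2023);
  length CompanyName $ 8;
  do c = 1 to 500;
    CompanyName = cats('Comp', put(c, z3.));
    nproj = ceil(4 * rand('uniform'));
    do p = 1 to nproj;
      ProjectNumber = 100 * c + p;
      r = rand('uniform');   /* random sort key */
      output;
    end;
  end;
  drop c p nproj;
run;

/* Random row order within each company */
proc sort data=have;
  by CompanyName r;
run;

/* 1% Bernoulli draw on the first row of each company; every other row gets 0 */
data want;
  set have;
  by CompanyName;
  if _n_ = 1 then call streaminit(12345);
  if first.CompanyName then Random = rand('bernoulli', 0.01);
  else Random = 0;
  drop r;
run;
```

Each company has at most one flagged row, but the number of flagged companies is itself random (about 1% on average), so a small input may flag no companies at all. PGStats' SURVEYSELECT approach fixes the number of selected companies instead.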
