Random sampling

slobber · Posted 04-30-2018 10:30 PM

I am trying to generate a 1% random sample from my data set, but I only want 1 of each distinct value in a variable. For example, if I have a set that looks like:

Company Name Project Number Joint Company

CompA 553 CompX

CompA 552 CompY

CompB 133 CompZ

In my random sample, I would only want CompA chosen once, not multiple times due to being in multiple rows. So, I need a 1% random sample using variable CompName. In addition, I do not want it to produce a new data set. Instead, I want a new variable created (let's say, "Random"), where a value of 1 means that row was selected for the random sample, and '.' or '0' means it was not selected.

Does that make sense? I have tried both proc surveyselect and proc sql and cannot 1) get distinct values instead of duplicates and 2) generate a new column in the same data set (or even a new data set that has all the original variables in it).

Thank you for your help!

Reeza · Posted 04-30-2018 11:24 PM

Add a random number to your data.

Sort data by company, random number

Create a random Bernoulli variable that has a 1% probability at the start of each company.

slobber · Posted 04-30-2018 11:27 PM

I'm not sure what you mean by adding a random number to my data?

PGStats · Posted 05-01-2018 12:15 AM

Combine cluster sampling with stratified sampling. For example, to get a sample of 10 car models, but never more than one per make:

/* Select 10 makes */
proc surveyselect data=sashelp.cars sampsize=10 outall
    out=makeSample(rename=selected=selectedMake);
samplingunit make;
run;

/* Select one model from every make */
proc surveyselect data=makeSample sampsize=1 outall 
    out=modelSample(rename=selected=selectedModel);
strata make;
run;

/* Select the selected model for every selected make */
data sample;
set modelSample;
selected = selectedMake and selectedModel;
run;

Example assumes that the data is sorted by make.

PG

Ksharp · Posted 05-01-2018 04:43 AM

Follow what @Reeza said.

data have;
 set sashelp.zipcode;
 call streaminit(12345678);
 r=rand('uniform');
run;

proc sort data=have ;
by STATE r;
run;

data want;
 set have;
 by STATE;
 if first.STATE then Random=rand('bern',0.1) ;
 drop r;
run;

Random sampling

Re: Random sampling

Re: Random sampling

Re: Random sampling

Re: Random sampling

Random sampling

Re: Random sampling

Re: Random sampling

Re: Random sampling

Re: Random sampling

Registration is open

SAS Training: Just a Click Away