lavernal
Calcite | Level 5

Hi,

I am trying to perform oversampling on my dataset (~200,000 observations), which contains a flag variable with value 1 or 0. I want 100 samples, each containing all the observations with flag=1 (~100 of them in total) plus ~750 randomly selected observations with flag=0. However, I am having difficulty getting what I want: I end up with ~10,000 observations in total for each sample, and sometimes my code takes forever to run. Can someone advise me on what is wrong?

My code is as follows:

data oversamples;
    do sample=1 to 100;
        set fun;
        do i = 1 to _N_;
            if flag= 1 or (flag=0 and ranuni(sample+7320) < 0.003) then output;
        end;
    end;
run;

Thanks for any advice.

4 REPLIES
VD
Calcite | Level 5

I think your set fun statement should be before the do loop.
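
A minimal sketch of that rearrangement, assuming the input dataset is named fun as in the question. Each observation is read once and then considered for all 100 samples. The rate 0.00375 is an assumption worked out from the numbers in the question (about 750 out of roughly 200,000 flag=0 rows), so each sample would hold approximately, not exactly, 750 flag=0 observations. The inner "do i = 1 to _N_" loop is dropped because it outputs the same row repeatedly.

```sas
data oversamples;
    set fun;                         /* read each observation exactly once */
    do sample = 1 to 100;            /* consider this row for every sample */
        /* keep all flag=1 rows; keep a flag=0 row with probability ~0.00375 */
        /* (assumed rate: ~750 / ~200,000)                                   */
        if flag = 1 or ranuni(7320) < 0.00375 then output;
    end;
run;

/* group the rows so each sample is contiguous */
proc sort data=oversamples;
    by sample;
run;
```

Only the seed on the first RANUNI call matters here; later calls in the same step continue the same random stream.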

akberali67
Calcite | Level 5

Why don't you try PROC SURVEYSELECT? It picks random samples.

proc surveyselect data=data_set
   method=srs n=100 out=data_random;
run;

lavernal
Calcite | Level 5

Hi akberali,

My output is now completely random, but I want it to include all observations with flag=1 and then randomly select ~800 observations with flag=0. Is it possible to do that with PROC SURVEYSELECT?

Thanks for the advice

Rick_SAS
SAS Super FREQ

Use the STRATA statement and set the sampling rate for the "rare event category" to 100%. See http://www.nesug.org/proceedings/nesug07/sa/sa02.pdf, beginning at the bottom of page 2.
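
A rough sketch of what this could look like for the dataset in the question, not taken from the linked paper. The names fun_sorted and oversamples, the seed, and the rate 0.004 (about 800 of roughly 200,000 flag=0 rows) are assumptions for illustration. The rates are listed in the sorted order of the strata, so the flag=0 rate comes first; a rate of 1 is treated as 100%.

```sas
/* STRATA requires the input to be sorted by the stratum variable */
proc sort data=fun out=fun_sorted;
    by flag;
run;

proc surveyselect data=fun_sorted out=oversamples
        method=srs
        samprate=(0.004 1)   /* assumed rates: flag=0 first, then flag=1 at 100% */
        reps=100             /* 100 independent samples                          */
        seed=7320;           /* arbitrary seed, for reproducibility              */
    strata flag;
run;
```

The output dataset should contain a Replicate variable identifying which of the 100 samples each row belongs to.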
