lavernal
Calcite | Level 5

Hi,

I am trying to perform oversampling on my dataset (~200,000 observations), which contains a flag variable with value 1 or 0. I want 100 samples, each containing all the observations with flag=1 (~100 of them in total) plus ~750 randomly selected observations with flag=0. However, I am having difficulty getting what I want: I end up with ~10,000 observations in each sample, and sometimes my code takes forever to run. Can someone advise me on what is wrong?

My code is as follows:

data oversamples;
    do sample=1 to 100;
        set fun;
        do i = 1 to _N_;
            if flag= 1 or (flag=0 and ranuni(sample+7320) < 0.003) then output;
        end;
    end;
run;

Thanks for any advice.

4 REPLIES
VD
Calcite | Level 5

I think your set fun statement should be before the do loop.
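One way to read this suggestion is sketched below. It rests on assumptions that go beyond the reply: besides moving SET above the loop, the inner DO i = 1 TO _N_ loop is dropped, because it repeats the OUTPUT test _N_ times for the same observation, which would explain both the inflated sample sizes and the long run time. The rate 0.00375 is also an assumption, taken as ~750 divided by the ~199,900 flag=0 observations described in the question. Because each observation is kept or dropped independently, every sample has roughly 750 flag=0 rows, not exactly 750.

```sas
/* Sketch only: read each observation once, then decide for each of the
   100 samples whether that observation belongs in it. */
data oversamples;
    set fun;                          /* one observation per DATA step iteration */
    do sample = 1 to 100;
        /* Keep every flag=1 row; keep flag=0 rows with probability 0.00375
           (assumed rate: ~750 out of ~199,900 flag=0 rows).
           Only the seed of the first RANUNI call is used; later calls
           continue the same random stream. */
        if flag = 1 or ranuni(7320) < 0.00375 then output;
    end;
run;

/* Group the rows of each sample together. */
proc sort data=oversamples;
    by sample;
run;
```

With this layout the input is read a single time, and the output holds about 100 x 850 rows in total.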

akberali67
Calcite | Level 5

Why don't you try PROC SURVEYSELECT? It picks random samples.

proc surveyselect data=data_set
   method=srs n=100 out=data_random;
run;

lavernal
Calcite | Level 5

Hi akberali,

My output is now completely random, but I want it to include all elements with flag=1 and then randomly select ~800 elements with flag=0. Is it possible to do that with PROC SURVEYSELECT?

Thanks for the advice

Rick_SAS
SAS Super FREQ

Use the STRATA statement and set the sampling rate for the "rare event category" to 100%. See http://www.nesug.org/proceedings/nesug07/sa/sa02.pdf, beginning at the bottom of page 2.
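A minimal sketch of that approach follows. It is not taken from the linked paper. The data set name fun and the seed come from the question; fun_sorted is a hypothetical name, and the 0.00375 rate is an assumption (~750 out of ~199,900 flag=0 observations). The SAMPRATE= values are listed in stratum order, so the first applies to flag=0 and the second to flag=1.

```sas
/* STRATA requires the input to be sorted by the stratum variable. */
proc sort data=fun out=fun_sorted;    /* fun_sorted is a hypothetical name */
    by flag;
run;

/* 100 replicate samples: a small simple random sample of flag=0
   and every observation of flag=1. */
proc surveyselect data=fun_sorted out=oversamples
        method=srs
        samprate=(0.00375 1)          /* flag=0: assumed rate; flag=1: 1 is treated as 100% */
        reps=100
        seed=7320;
    strata flag;
run;
```

The output data set contains a Replicate variable (1 to 100) that identifies each sample, so later steps can process the samples with BY Replicate.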
