Hi,
I am trying to perform oversampling on my dataset (~200,000 observations), which includes a flag variable with the value 1 or 0. I want 100 samples, each containing all the observations with flag=1 (~100 of them in total) plus ~750 randomly selected observations with flag=0. However, I'm having difficulty getting what I want: I end up with ~10,000 observations in total for each sample, and sometimes my code takes forever to run. Can someone advise me on what is wrong?
My code is as follows:
data oversamples;
   do sample=1 to 100;
      set fun;
      do i = 1 to _N_;
         if flag= 1 or (flag=0 and ranuni(sample+7320) < 0.003) then output;
      end;
   end;
run;
Thanks for any advice.
I think your set fun statement should be before the do loop.
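Why the placement matters: in the code above, SET FUN sits inside DO SAMPLE=1 TO 100, so each pass of the DATA step reads 100 new observations (one per value of SAMPLE), and the inner DO I=1 TO _N_ loop can write each of them up to _N_ times. Each observation therefore ends up in only one sample but may be duplicated hundreds of times, which inflates the output and the run time. Below is a minimal sketch of the fix, assuming the input is FUN with a numeric 0/1 FLAG: read each observation once, then loop over the 100 samples and output it to every sample it qualifies for. It swaps RANUNI for the newer RAND('UNIFORM') seeded by CALL STREAMINIT; the seed 7320 is carried over from the original, and the rate 0.00375 is an assumption chosen so that about 750 of the roughly 199,900 flag=0 rows land in each sample (0.003 gives about 600).

```sas
/* Sketch: assumes data set FUN with a 0/1 variable FLAG */
data oversamples;
   set fun;                               /* read each observation exactly once */
   if _n_ = 1 then call streaminit(7320); /* seed the RAND stream once */
   do sample = 1 to 100;
      /* keep every flag=1 row; keep a flag=0 row with probability 0.00375 */
      if flag = 1 or (flag = 0 and rand('uniform') < 0.00375) then output;
   end;
run;

/* group the rows by sample number */
proc sort data=oversamples;
   by sample;
run;
```

Because each flag=0 row is kept independently, the number of flag=0 rows per sample averages about 750 but is not exactly 750.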
Why don't you try PROC SURVEYSELECT? It picks random samples.
proc surveyselect data=data_set
method=srs n=100 out=data_random;
run;
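PROC SURVEYSELECT can also draw all 100 samples in a single call. A sketch, reusing the DATA_SET name from the reply above: REPS= requests replicate samples and SEED= makes the selection reproducible (the seed value is a hypothetical choice). The output carries a REPLICATE variable identifying each sample. On its own this still samples from all rows regardless of FLAG.

```sas
/* Sketch: 100 independent simple random samples of 100 rows each */
proc surveyselect data=data_set out=data_random
      method=srs   /* simple random sampling without replacement */
      n=100        /* rows per sample */
      reps=100     /* number of replicate samples */
      seed=7320;   /* hypothetical seed, for reproducibility */
run;
```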
Hi akberali,
My output is now completely random, but I want it to include all observations with flag=1 and then ~800 randomly selected observations with flag=0. Is it possible to do that using PROC SURVEYSELECT?
Thanks for the advice
Use the STRATA option and set the sample rate for the "rare event category" to be 100%. See http://www.nesug.org/proceedings/nesug07/sa/sa02.pdf, beginning at the bottom of page 2.
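A sketch of that suggestion, assuming the input is FUN and FLAG takes only the values 0 and 1. PROC SURVEYSELECT expects the input sorted by the STRATA variable, and the SAMPRATE= values are matched to the strata in that sorted order, so 0.004 applies to flag=0 (about 800 of roughly 199,900 rows) and 1, which the procedure reads as 100%, applies to flag=1. REPS=100 produces the 100 samples, numbered by the REPLICATE variable. The rate and seed are illustrative assumptions.

```sas
/* Sketch: assumes data set FUN with a 0/1 variable FLAG */
proc sort data=fun out=fun_sorted;
   by flag;                     /* STRATA needs sorted input: flag=0, then flag=1 */
run;

proc surveyselect data=fun_sorted out=oversamples
      method=srs                /* simple random sampling within each stratum */
      samprate=(0.004 1)        /* flag=0: 0.4%; flag=1: 1 is read as 100% */
      reps=100                  /* 100 replicate samples */
      seed=7320;                /* hypothetical seed */
   strata flag;
run;

/* check: each replicate should hold all flag=1 rows and about 800 flag=0 rows */
proc freq data=oversamples;
   tables replicate*flag / norow nocol nopercent;
run;
```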