How can I draw a balanced sample? (example)
I have data on 1000 clients, of which 300 are not good and 700 are good (binary, independent variable). I need to draw a proportionally balanced sample, split 50% good and 50% bad. Can somebody help me?
Actually, you could take all of the 30% group and then records from the other group until the N's match. That should be relatively easy in a data step.
*create sample data - with a random number;
data have;
    *set stream for random number - means same results every time;
    call streaminit(25);
    group=1;
    do i=1 to 30;
        x=rand('normal', 25, 5);
        output;
    end;
    group=2;
    do i=1 to 70;
        x=rand('normal', 35, 2);
        output;
    end;
run;

*sort with random number to randomize;
proc sort data=have;
    by group x;
run;
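*keep all of the smaller group, then the same number from the larger group;
*assumes the smaller group sorts first;
data want;
    set have;
    by group;
    retain counter flag max_count;
    if _n_=1 then flag=0;
    *restart the count at the start of each group;
    if first.group then counter=0;
    counter+1;
    *at the end of the first group, record its size;
    if last.group and flag=0 then do;
        flag=1;
        max_count=counter;
    end;
    *stop once the second group has supplied max_count rows;
    if flag=1 and counter>max_count then stop;
run;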
Randomly select X from the not good, and X from the good.
How can I do this in SAS Enterprise Guide with a proc? Can you help me?
I did this in SAS Enterprise Miner...but my sample was not good
I don't see a task that would do this in SAS EG 7.12
I think you need a code node and PROC SURVEYSELECT
You could possibly do this in a query by first calculating the number needed from each group and then using a generated random number to select that many from each group.
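A minimal sketch of that idea, assuming the HAVE data set with a binary GROUP column from the sample code in this thread. The names KEYED, BALANCED, U, PICK and the macro variable N_SMALL are made up for illustration, and the final selection uses a short data step rather than a pure query:

```sas
/* attach a uniform random key to every row (seed is arbitrary) */
data keyed;
    set have;
    if _n_ = 1 then call streaminit(2024);
    u = rand('uniform');
run;

/* size of the smaller group = number of rows to keep per group */
proc sql noprint;
    select min(n) into :n_small trimmed
    from (select count(*) as n from keyed group by group);
quit;

/* random order within each group */
proc sort data=keyed;
    by group u;
run;

/* keep the first N_SMALL rows of each group */
data balanced;
    set keyed;
    by group;
    if first.group then pick = 0;
    pick + 1;
    if pick <= &n_small;
    drop u pick;
run;
```

Unlike sorting on an analysis variable, the separate key U means the rows kept from the larger group do not depend on any measured value.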
Why not set priors in your regression instead?
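One possible reading of this suggestion, sketched with PROC LOGISTIC: fit the model and let the PRIOREVENT= option of the SCORE statement adjust the predicted probabilities to the population event rate. This is an assumption about what was meant, not a confirmed recipe. The predictor names (duration, amount, age), the event level 'bad', the data set names and the 0.3 prior (300 of 1000 clients) are placeholders to adapt to your data:

```sas
/* WANT  = balanced training sample (hypothetical name)        */
/* HAVE  = full data to be scored   (hypothetical name)        */
/* duration, amount, age = placeholder predictors              */
proc logistic data=want;
    model good_bad(event='bad') = duration amount age;
    /* score the full data, correcting posteriors to a 30% prior */
    score data=have out=scored priorevent=0.3;
run;
```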
I did this in SAS Enterprise Miner, but my sample was not good. Any suggestions?
Your sample or prediction wasn't 'good'? What do you mean by that?
my prediction was not good
Good morning,
I tried and it worked out. Thanks a lot
The ROC of my prediction was very low. I need to do the random sampling beforehand with a proc, understand? I will try it with PROC SURVEYSELECT. Thanks a lot. Bye.
That's OK, I'll try. Thanks.
Are you using German Credit.xlsx for making a CreditCard?
proc import datafile="/courses/d8fb3215ba27fe300/1--German Credit.xlsx"
    out=have dbms=xlsx replace;
run;

proc sort data=have;
    by good_bad;
run;

*draw 300 from each stratum of good_bad;
proc surveyselect data=have out=want sampsize=(300 300) seed=12345678;
    strata good_bad;
run;

*check the class counts in the sample;
proc freq data=want;
    table good_bad;
run;