Fluorite | Level 6

BALANCED SAMPLE

How can I draw a balanced sample? Example:

I have a dataset of 1000 clients, of which 300 are not good and 700 are good (a binary, independent variable), and I need to draw a proportionally balanced sample, that is, 50% good and 50% bad. Can somebody help me?

Accepted Solutions
Super User

Re: BALANCED SAMPLE

Actually, you could take all of the 30% group and then records from the other group until the N's match. That should be relatively easy in a data step.

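```
*create sample data - with a random number;
data have;
    *set the seed for the random number stream - means same results every time;
    call streaminit(25);

    *smaller group with 30 observations;
    group=1;
    do i=1 to 30;
        x=rand('normal', 25, 5);
        output;
    end;

    *larger group with 70 observations;
    group=2;
    do i=1 to 70;
        x=rand('normal', 35, 2);
        output;
    end;
run;

*sort with random number to randomize the order within each group;
proc sort data=have;
    by group x;
run;

*keep all of the first group, then the same number of records from the second;
*this relies on the smaller group sorting first;
data want;
    set have;
    by group;
    retain counter flag max_count;

    if _n_=1 then flag=0;

    *restart the within-group counter at each new group;
    if first.group then counter=0;
    counter+1;

    *at the end of the first group, remember its size;
    if last.group and flag=0 then do;
        flag=1;
        max_count=counter;
    end;

    *stop before writing the first record beyond that size;
    if flag=1 and counter > max_count then stop;
run;
```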
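A quick way to confirm that the N's match is to tabulate the result. This is only a suggested check, not part of the original answer; it uses the `want` dataset and `group` variable created above.

```sas
/* Count the records kept in each group of the balanced sample */
proc freq data=want;
    tables group / nocum nopercent;
run;
```

With the sample data above, both groups should show 30 records, because the data step stops at record 31 of group 2.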

13 REPLIES
Diamond | Level 26

Re: BALANCED SAMPLE

Randomly select X from the not good, and X from the good.

--
Paige Miller
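One way to sketch "X from each group" is stratified simple random sampling with PROC SURVEYSELECT. The dataset name `clients`, the variable `status`, and the output name `balanced` are hypothetical, and the 300/700 split is simulated to match the question. Treat it as a sketch under those assumptions rather than the poster's own method.

```sas
/* Hypothetical data: 1000 clients, 300 not good (status=0), 700 good (status=1) */
data clients;
    do id=1 to 1000;
        status = (id > 300);
        output;
    end;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=clients;
    by status;
run;

proc surveyselect data=clients out=balanced
        method=srs      /* simple random sampling without replacement */
        sampsize=300    /* the same number is drawn from every stratum */
        seed=25;        /* fixed seed so the draw is repeatable */
    strata status;
run;

/* Check: each status value should have 300 records */
proc freq data=balanced;
    tables status / nocum nopercent;
run;
```

Here `sampsize=300` equals the size of the smaller group, so every not-good client is kept and 300 good clients are drawn at random. A smaller value would subsample both groups.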
Fluorite | Level 6

Re: BALANCED SAMPLE

How can I do this in SAS Enterprise Guide with a procedure? Can you help me?

Fluorite | Level 6

Re: BALANCED SAMPLE

I did this in SAS Enterprise Miner, but my sample was not good.

Super User

Re: BALANCED SAMPLE

I don't see a task that would do this in SAS EG 7.12.

I think you need a code node and PROC SURVEYSELECT.

You could possibly do this in a query by first calculating the numbers needed and then using a generated random number to select that many from each group.
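The "calculate the numbers needed, then select by random number" idea could look roughly like this in code. The names `clients`, `status`, `keyed`, `balanced`, and `n_needed` are hypothetical, and the sketch mixes PROC SQL with data steps rather than using a single Enterprise Guide query.

```sas
/* Hypothetical input: work.clients with a binary variable named status */

/* Step 1: size of the smaller group, stored in a macro variable */
proc sql noprint;
    select min(n) into :n_needed
    from (select count(*) as n from clients group by status);
quit;

/* Step 2: attach a uniform random number to every record */
data keyed;
    set clients;
    if _n_=1 then call streaminit(25);   /* fixed seed for repeatability */
    u = rand('uniform');
run;

/* Step 3: put each group into random order */
proc sort data=keyed;
    by status u;
run;

/* Step 4: keep the first &n_needed records of each group */
data balanced (drop=u k);
    set keyed;
    by status;
    if first.status then k = 0;
    k + 1;
    if k <= &n_needed;
run;
```

Because the group size is computed first, this version does not depend on which group happens to sort first.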

Fluorite | Level 6

Re: BALANCED SAMPLE

I did this in SAS Enterprise Miner, but my sample was not good. Any suggestions?

Super User

Re: BALANCED SAMPLE

"my sample was not good"

Was it your sample or your prediction that wasn't "good"? What do you mean by that?

Fluorite | Level 6

Re: BALANCED SAMPLE

My prediction was not good.

Fluorite | Level 6

Re: BALANCED SAMPLE

Good morning,

I tried it and it worked. Thanks a lot.

Fluorite | Level 6

Re: BALANCED SAMPLE

The ROC for my prediction was very low. I need to randomize the data first with a procedure, you understand? I will try PROC SURVEYSELECT. Thanks a lot. Bye.

Fluorite | Level 6

Re: BALANCED SAMPLE

That's OK, I will try. Thanks.

Super User

Re: BALANCED SAMPLE

Are you using German Credit.xlsx to build a credit scorecard?

```
proc import datafile="/courses/d8fb3215ba27fe300/1--German Credit.xlsx"
    out=have dbms=xlsx replace;
run;
```