
BALANCED SAMPLE

Occasional Contributor
Posts: 10

How can I draw a balanced sample? Example:

I have data on 1000 clients, of which 300 are not good and 700 are good (binary, independent variable). I need a proportionally balanced sample, split 50% good and 50% bad. Can somebody help me?

 

 


Accepted Solutions
Solution
‎01-31-2018 05:58 AM
Super User
Posts: 24,026

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

Actually, you could take all of the 30% group and then rows from the other group until the N's match. That should be relatively easy in a data step.

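*create sample data - with a random number;
data have;
	*set seed for the random number stream - means same results every time;
	call streaminit(25);

	*smaller group: 30 observations;
	group=1;

	do i=1 to 30;
		x=rand('normal', 25, 5);
		output;
	end;

	*larger group: 70 observations;
	group=2;

	do i=1 to 70;
		x=rand('normal', 35, 2);
		output;
	end;
run;

*sort by group, then by the random number to randomize order within each group;
proc sort data=have;
	by group x;
run;

*keep all of the first group, then the same number of rows from the second group;
*this relies on the smaller group sorting first and on there being only two groups;
data want;
	set have;
	by group;
	retain counter flag max_count;

	if _n_=1 then
		flag=0;

	*restart the row counter at the start of each group;
	if first.group then
		counter=0;

	counter+1;

	*at the end of the first group, record its size;
	if last.group and flag=0 then
		do;
			flag=1;
			max_count=counter;
		end;

	*once the second group has supplied max_count rows, stop reading;
	if flag=1 and counter > max_count then
		stop;
run;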
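
A quick way to confirm that the data step above produced a balanced result is a frequency table on the output. This is only a small check, assuming the output dataset is still called want and the class variable is still called group, as in the code above; with that sample data each group should show 30 rows.

```sas
* count the rows per group in the balanced output;
proc freq data=want;
	tables group / nocum nopercent;
run;
```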

 



All Replies
Respected Advisor
Posts: 3,288

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

Randomly select X from the not good, and X from the good.

--
Paige Miller
Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

How can I do this in SAS Enterprise Guide with a proc? Can you help me?

Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

I did this in SAS Enterprise Miner...but my sample was not good

Super User
Posts: 24,026

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

I don't see a task that would do this in SAS EG 7.12

 

I think you need a code node and PROC SURVEYSELECT

 

You could possibly do this in a query by first calculating the numbers needed and then using a generated random number to select that many from each group.
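
The query idea above could look roughly like the following. This is a sketch rather than a tested task flow: it assumes a dataset named have with a binary class variable, here given the hypothetical name good_bad, and the names n_min, shuffled, u and k are made up for illustration. It counts each group, takes the smallest count, attaches a uniform random number to every row, and keeps that many rows per group.

```sas
* step 1: size of the smallest group, stored in macro variable n_min;
proc sql noprint;
	select min(n) into :n_min trimmed
	from (select count(*) as n from have group by good_bad);
quit;

* step 2: attach a uniform random number to every row;
data shuffled;
	set have;
	if _n_ = 1 then call streaminit(12345);
	u = rand('uniform');
run;

* step 3: random order within each group;
proc sort data=shuffled;
	by good_bad u;
run;

* step 4: keep the first n_min rows of each group;
data want;
	set shuffled;
	by good_bad;
	if first.good_bad then k = 0;
	k + 1;
	if k <= &n_min;
	drop u k;
run;
```

Unlike a data step that stops after the second group, this version does not depend on which group sorts first.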

 

 

Super User
Posts: 24,026

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

Why not set priors in your regression instead?
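
For reference, one place where priors can be supplied in Base SAS is the PRIOREVENT= option of the SCORE statement in PROC LOGISTIC, which adjusts the predicted probabilities to a stated event rate. The sketch below rests on several assumptions that are not from this thread: the model is fitted on a dataset named want, the full data is named have, the target good_bad has the value 'bad' as the event, and x1 and x2 are placeholder predictors. The 0.3 is the 300 out of 1000 from the original question.

```sas
* fit on the training data, then score with a prior event rate of 30 percent;
proc logistic data=want;
	model good_bad(event='bad') = x1 x2;
	score data=have out=scored priorevent=0.3;
run;
```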

 

Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

I did this in SAS Enterprise Miner, but my sample was not good. Any suggestions?

Super User
Posts: 24,026

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

"my sample was not good"

Your sample or prediction wasn't 'good'? What do you mean by that?

Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

my prediction was not good


Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

Good morning,

I tried it and it worked. Thanks a lot.

Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

My prediction's ROC was very low. I need to do the random sampling first with a proc, you understand? I will try it with PROC SURVEYSELECT. Thanks a lot. Bye.

Occasional Contributor
Posts: 10

Re: BALANCED SAMPLE

That's OK, I will try. Thanks.

Super User
Posts: 10,850

Re: BALANCED SAMPLE

Posted in reply to PRISCILABRA

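Are you using German Credit.xlsx for making a CreditCard?

*read the workbook into a SAS dataset;
proc import datafile="/courses/d8fb3215ba27fe300/1--German Credit.xlsx"
	out=have dbms=xlsx replace;
run;

*the STRATA statement needs the data sorted by the strata variable;
proc sort data=have;
	by good_bad;
run;

*draw 300 rows at random from each of the two strata;
proc surveyselect data=have out=want sampsize=(300 300) seed=12345678;
	strata good_bad;
run;

*check the counts per stratum;
proc freq data=want;
	table good_bad;
run;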
☑ This topic is solved.
