## how to divide dataset into 3 samples,each one with 50% males & females

Solved
Occasional Contributor
Posts: 13

# how to divide dataset into 3 samples,each one with 50% males & females

Hi,

I want to divide a dataset of 31 subjects into 3 samples:10,10, &11; each sample should have 50% males and females (by variable gender).  No subject should be replicated in any sample.  I used surveyselect proc as below, but  same subject sometimes appears in more than one sample and some subjects don't appear at all. which I don't want to happen.  How to avoid this situation?  Thanks.

proc surveyselect data = DIABLIB.DiabDOESet out = DOEsamp1

method = srs samprate = .33 rep=3 ;

strata gender;

run;

Accepted Solutions
Solution
‎07-26-2012 03:16 PM
Super Contributor
Posts: 1,636

## Re: how to divide dataset into 3 samples,each one with 50% males & females

borrowed Robby's data:

data temp;

set strat_people;

n=ranuni(88);

proc sort;

by gender n;

run;

data data1 data2 data3;

set temp;

if mod(_n_,3)=0 then output data1;

else if mod(_n_,3)=1 then output data2;

else output data3;

run;

Linlin

Message was edited by: Linlin

All Replies
Posts: 2,044

## Re: how to divide dataset into 3 samples,each one with 50% males & females

I hope you realize that the dataset which has 11 subjects cannot have 50% males

This isn't too hard to do using data steps and PROC SORT. Assign random numbers to everyone. Sort the males by the random number. Sort the females by the random number. Assign the first 5 males and first 5 females to sample 1. Continue. Done.

--
Paige Miller
Occasional Contributor
Posts: 13

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Yes, I know 11 is an odd numnber.

Thanks for your solution.  It works; but I wanted to know if there is any proc to create such design of experiment.

Thank you.

Frequent Contributor
Posts: 75

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Hi sasuser3,

please forgive my ignorance - are you saying that you have a working set of code but you're just pinging the group to see if there's a proc that will do this for you?

Occasional Contributor
Posts: 13

## Re: how to divide dataset into 3 samples,each one with 50% males & females

At first I didn't have solution as I was trying to use proc surveyselect (the code is written in my question).  I was looking for a proc (with appropriate options) that can generate a design of experiment, something randomized block with proportions or stratified with proportions...

Later I followed PaigeMiller's solution and made it work; but I am still looking for a proc if it exists for this situation.

I have not yet tried Robby_Beum's solution, but thanks to him.

My working code is:

** generate random numbers for each ID;

data DOESet;

if _n_=1 then do;

**----urand will be your random integer----**;

urand=0;

call ranuni(urand,dummy); **get a starting seed;

put "original seed = " urand; **"save" starting seed to log;

retain urand ;

end;

set DIABLIB.DiabDOESet;

call ranuni(urand,dummy);

drop dummy;

run;

proc sort data = DOESet;

by gender;

run;

Contributor
Posts: 66

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Try  METHOD=PPS in the procedure statement options, as described in SAS documentation:

http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_surveyselect...

This is what comes first when searching by keywords "surveyselect without replacement."

Occasional Contributor
Posts: 13

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Thanks, but it doesn't fit to my situation.  Without replication is within a sample, not among samples.   I again tried, but I couldn't get it.

Super User
Posts: 20,688

## Re: how to divide dataset into 3 samples,each one with 50% males & females

You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.

DATA strat_people ;

INPUT participant gender \$1. ;

CARDS ;

1 M

2 F

3 M

4 F

5 M

6 F

7 M

8 F

9 M

10 F

11 M

12 F

13 M

14 F

15 M

16 F

17 M

18 F

19 M

20 F

21 M

22 F

23 M

24 F

25 M

26 F

27 M

28 F

29 M

30 F

31 M

;

data strat_people;

set strat_people;

random=ranuni(88);

run;

proc sort data=strat_people; by gender random; run;

proc rank data=strat_people out=results groups=3;

by gender;

var random;

ranks randomgroup;

run;

Posts: 2,044

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Fareeza Khurshed wrote:

You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.

DATA strat_people ;

INPUT participant gender \$1. ;

CARDS ;

1 M

2 F

3 M

4 F

5 M

6 F

7 M

8 F

9 M

10 F

11 M

12 F

13 M

14 F

15 M

16 F

17 M

18 F

19 M

20 F

21 M

22 F

23 M

24 F

25 M

26 F

27 M

28 F

29 M

30 F

31 M

;

data strat_people;

set strat_people;

random=ranuni(88);

run;

proc sort data=strat_people; by gender random; run;

proc rank data=strat_people out=results groups=3;

by gender;

var random;

ranks randomgroup;

run;

You don't need two data steps to begin the program, this can be accomplished in a single data step.

--
Paige Miller
Occasional Contributor
Posts: 9

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Thank you for your volunteering suggestion. I'm growing in my knowledge like your precious suggestions. pls keep in touch...

Frequent Contributor
Posts: 75

## Re: how to divide dataset into 3 samples,each one with 50% males & females

I wrtote it in EG 4.3. - it's wordy but it works...

DATA strat_people ;
INPUT participant gender \$1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;

/***************************/
/* Define the sample sizes */
/***************************/
%let k=10;
%let k2=20;

/***********************************************************/
/* There are 31 participants so we need to split out into  */
/* 3 datasets of 10, 10 and 11 with a 50% male and 50%     */
/* female into each table (except the last since it's odd) */
/***********************************************************/
%macro looptest;
%do %until (&count = 10);

/* GENERATE A RANDOM VECTOR */
data strat_people_1;
SET strat_people;
random=RANUNI(-1);
count=1;
run;

/* SORT OBSERVATIONS BY THE RANDOM VECTOR */
proc sort DATA=strat_people_1;
BY random;
run;

/* SELECT THE FIRST K OBSERVATIONS */
data controla controlb controlc;
SET strat_people_1(drop=random);
rollup_var=1;

IF _N_ le &k then
do;
if gender='M' then male+1;
else female+1;
if male<=5 and female<=5 then output controla;
end;
ELSE IF _N_ gt &k and _N_ le &k2 then
do;
if gender='M' then male+1;
else female+1;
if male<=10 and female<=10 then output controlb;
end;
ELSE output controlc;
run;

proc sql;
create table controla_1 as
select rollup_var,
sum(count) as total_participants
from controla
group by rollup_var;
quit;

data _null_;
set controla_1;
call symput('count',put(total_participants,3.));
run;

%put "&count";
%end;
%mend looptest;

%looptest;

%macro print(value);
proc export data=&value.(keep=participant gender)
outfile="C:\directories\output\Randomize_People.xls"
dbms=excelcs replace;
sheet="&value";
SERVER='rvwsascpt01';
PORT=9621;
run;
%mend print;

%print(controla);
%print(controlb);
%print(controlc);

Posts: 2,044

## Re: how to divide dataset into 3 samples,each one with 50% males & females

I could envision a much simpler program that doesn't require any macros at all, just a data step, two PROC SORT steps, and another data step.

--
Paige Miller
Solution
‎07-26-2012 03:16 PM
Super Contributor
Posts: 1,636

## Re: how to divide dataset into 3 samples,each one with 50% males & females

borrowed Robby's data:

data temp;

set strat_people;

n=ranuni(88);

proc sort;

by gender n;

run;

data data1 data2 data3;

set temp;

if mod(_n_,3)=0 then output data1;

else if mod(_n_,3)=1 then output data2;

else output data3;

run;

Linlin

Message was edited by: Linlin

Posts: 2,044

## Re: how to divide dataset into 3 samples,each one with 50% males & females

Well, Linlin, nice job. Shame on me, I thought it would take two PROC SORTs ... duh ... but you use two datasteps after the SORT, I'm sure you could do it in one datastep

--
Paige Miller
Frequent Contributor
Posts: 75

## Re: how to divide dataset into 3 samples,each one with 50% males & females

HEY! I provided the data!

;o)

Nice job Linlin!

🔒 This topic is solved and locked.