Solved: how to divide dataset into 3 samples,each one with 50% males & females

sasuser3 · Posted 07-26-2012 11:12 AM

Hi,

I want to divide a dataset of 31 subjects into 3 samples:10,10, &11; each sample should have 50% males and females (by variable gender). No subject should be replicated in any sample. I used surveyselect proc as below, but same subject sometimes appears in more than one sample and some subjects don't appear at all. which I don't want to happen. How to avoid this situation? Thanks.

proc surveyselect data = DIABLIB.DiabDOESet out = DOEsamp1

method = srs samprate = .33 rep=3 ;

strata gender;

run;

Linlin · Posted 07-26-2012 03:16 PM

borrowed Robby's data:

data temp;

set strat_people;

n=ranuni(88);

proc sort;

by gender n;

run;

data data1 data2 data3;

set temp;

if mod(_n_,3)=0 then output data1;

else if mod(_n_,3)=1 then output data2;

else output data3;

run;

Linlin

Message was edited by: Linlin

View solution in original post

PaigeMiller · Posted 07-26-2012 11:28 AM

I hope you realize that the dataset which has 11 subjects cannot have 50% males

This isn't too hard to do using data steps and PROC SORT. Assign random numbers to everyone. Sort the males by the random number. Sort the females by the random number. Assign the first 5 males and first 5 females to sample 1. Continue. Done.

--
Paige Miller

sasuser3 · Posted 07-26-2012 12:00 PM

Yes, I know 11 is an odd numnber.

Thanks for your solution. It works; but I wanted to know if there is any proc to create such design of experiment.

Thank you.

robby_beum · Posted 07-26-2012 01:38 PM

Hi sasuser3,

please forgive my ignorance - are you saying that you have a working set of code but you're just pinging the group to see if there's a proc that will do this for you?

sasuser3 · Posted 07-26-2012 03:36 PM

At first I didn't have solution as I was trying to use proc surveyselect (the code is written in my question). I was looking for a proc (with appropriate options) that can generate a design of experiment, something randomized block with proportions or stratified with proportions...

Later I followed PaigeMiller's solution and made it work; but I am still looking for a proc if it exists for this situation.

I have not yet tried Robby_Beum's solution, but thanks to him.

My working code is:

** generate random numbers for each ID;

data DOESet;

if _n_=1 then do;

**----urand will be your random integer----**;

urand=0;

call ranuni(urand,dummy); **get a starting seed;

put "original seed = " urand; **"save" starting seed to log;

retain urand ;

end;

set DIABLIB.DiabDOESet;

call ranuni(urand,dummy);

drop dummy;

run;

proc sort data = DOESet;

by gender;

run;

Dorota_Jarosz · Posted 07-26-2012 03:59 PM

Try METHOD=PPS in the procedure statement options, as described in SAS documentation:

http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_surveyselect...

This is what comes first when searching by keywords "surveyselect without replacement."

sasuser3 · Posted 07-26-2012 05:19 PM

Thanks, but it doesn't fit to my situation. Without replication is within a sample, not among samples. I again tried, but I couldn't get it.

Reeza · Posted 07-26-2012 05:55 PM

You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.

DATA strat_people ;

INPUT participant gender $1. ;

CARDS ;

1 M

2 F

3 M

4 F

5 M

6 F

7 M

8 F

9 M

10 F

11 M

12 F

13 M

14 F

15 M

16 F

17 M

18 F

19 M

20 F

21 M

22 F

23 M

24 F

25 M

26 F

27 M

28 F

29 M

30 F

31 M

;

data strat_people;

set strat_people;

random=ranuni(88);

run;

proc sort data=strat_people; by gender random; run;

proc rank data=strat_people out=results groups=3;

by gender;

var random;

ranks randomgroup;

run;

PaigeMiller · Posted 07-27-2012 07:50 AM

Fareeza Khurshed wrote:

You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.

DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;

data strat_people;
    set strat_people;
    random=ranuni(88);
run;

proc sort data=strat_people; by gender random; run;


proc rank data=strat_people out=results groups=3;
    by gender;
    var random;
    ranks randomgroup;
run;

You don't need two data steps to begin the program, this can be accomplished in a single data step.

--
Paige Miller

antony_allianz · Posted 07-27-2012 08:00 AM

Thank you for your volunteering suggestion. I'm growing in my knowledge like your precious suggestions. pls keep in touch...

robby_beum · Posted 07-26-2012 02:53 PM

I wrtote it in EG 4.3. - it's wordy but it works...

DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;

/***************************/
/* Define the sample sizes */
/***************************/
%let k=10;
%let k2=20;

/***********************************************************/
/* There are 31 participants so we need to split out into */
/* 3 datasets of 10, 10 and 11 with a 50% male and 50% */
/* female into each table (except the last since it's odd) */
/***********************************************************/
%macro looptest;
%do %until (&count = 10);

   /* GENERATE A RANDOM VECTOR */
   data strat_people_1;
     SET strat_people;
     random=RANUNI(-1);
count=1;
   run;

   /* SORT OBSERVATIONS BY THE RANDOM VECTOR */
   proc sort DATA=strat_people_1;
     BY random;
   run;

   /* SELECT THE FIRST K OBSERVATIONS */
   data controla controlb controlc;
      SET strat_people_1(drop=random);
      rollup_var=1;

      IF _N_ le &k then
         do;
            if gender='M' then male+1;
            else female+1;
   if male<=5 and female<=5 then output controla;
   end;
   ELSE IF _N_ gt &k and _N_ le &k2 then
           do;
                 if gender='M' then male+1;
                 else female+1;
                 if male<=10 and female<=10 then output controlb;
        end;
      ELSE output controlc;
   run;

   proc sql;
      create table controla_1 as
      select rollup_var,
             sum(count) as total_participants
      from controla
      group by rollup_var;
   quit;

   data _null_;
      set controla_1;
      call symput('count',put(total_participants,3.));
   run;

%put "&count";
%end;
%mend looptest;

%looptest;

%macro print(value);
   proc export data=&value.(keep=participant gender)
      outfile="C:\directories\output\Randomize_People.xls"
      dbms=excelcs replace;
      sheet="&value";
      SERVER='rvwsascpt01';
      PORT=9621;
   run;
%mend print;

%print(controla);
%print(controlb);
%print(controlc);

PaigeMiller · Posted 07-26-2012 03:00 PM

I could envision a much simpler program that doesn't require any macros at all, just a data step, two PROC SORT steps, and another data step.

--
Paige Miller

Linlin · Posted 07-26-2012 03:16 PM

borrowed Robby's data:

data temp;

set strat_people;

n=ranuni(88);

proc sort;

by gender n;

run;

data data1 data2 data3;

set temp;

if mod(_n_,3)=0 then output data1;

else if mod(_n_,3)=1 then output data2;

else output data3;

run;

Linlin

Message was edited by: Linlin

PaigeMiller · Posted 07-26-2012 03:18 PM

Well, Linlin, nice job. Shame on me, I thought it would take two PROC SORTs ... duh ... but you use two datasteps after the SORT, I'm sure you could do it in one datastep

--
Paige Miller

robby_beum · Posted 07-26-2012 03:24 PM

HEY! I provided the data!

;o)

Nice job Linlin!

how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Re: how to divide dataset into 3 samples,each one with 50% males & females

Catch up on SAS Innovate 2026