sasuser3
Calcite | Level 5

Hi,

I want to divide a dataset of 31 subjects into 3 samples of 10, 10, and 11; each sample should be 50% male and 50% female (by the variable gender). No subject should appear in more than one sample. I used PROC SURVEYSELECT as below, but the same subject sometimes appears in more than one sample and some subjects don't appear at all, which I don't want to happen. How can I avoid this? Thanks.

proc surveyselect data = DIABLIB.DiabDOESet out = DOEsamp1
     method = srs samprate = .33 rep=3 ;
strata gender;
run;

1 ACCEPTED SOLUTION

Linlin
Lapis Lazuli | Level 10

borrowed Robby's data:

data temp;
  set strat_people;
  n=ranuni(88);   /* random sort key */
run;

proc sort data=temp;
  by gender n;
run;

/* deal the sorted rows round-robin into three datasets */
data data1 data2 data3;
  set temp;
  if mod(_n_,3)=0 then output data1;
  else if mod(_n_,3)=1 then output data2;
  else output data3;
run;
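
For anyone who wants to check the split, here is a minimal sketch of the same round-robin idea. It writes a sample variable to one dataset instead of creating three datasets. The names subjects, dealt, and sample are made up for illustration, and the generated data is only a stand-in for strat_people (odd participants male, even participants female).

```sas
/* Hypothetical stand-in for strat_people: 16 males, 15 females */
data subjects;
    length gender $1;
    do participant = 1 to 31;
        gender = ifc(mod(participant, 2) = 1, 'M', 'F');
        output;
    end;
run;

/* Attach a random sort key, as in the posted solution */
data temp;
    set subjects;
    n = ranuni(88);
run;

proc sort data=temp;
    by gender n;
run;

/* mod(_n_,3) = 0, 1, 2 correspond to data1, data2, data3 */
data dealt;
    set temp;
    sample = mod(_n_, 3);
    drop n;
run;

/* Each subject sits in exactly one row, so no one is repeated or lost */
proc freq data=dealt;
    tables gender * sample / norow nocol nopercent;
run;
```

Because the 15 females sort ahead of the 16 males, samples 0 and 2 each get 5 females and 5 males, and sample 1 gets 5 females and 6 males.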

Linlin

Message was edited by: Linlin


18 REPLIES
PaigeMiller
Diamond | Level 26

I hope you realize that the sample with 11 subjects cannot be exactly 50% male.

This isn't too hard to do using data steps and PROC SORT. Assign random numbers to everyone. Sort the males by the random number. Sort the females by the random number. Assign the first 5 males and first 5 females to sample 1. Continue. Done.
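
A minimal sketch of those steps, under some assumptions: the dataset names (subjects, shuffled, assigned) and the variables u, pos, and sample are invented for illustration, the generated data is a stand-in for the real 31 subjects, and one sort by gender and random number takes the place of separate sorts for males and females.

```sas
/* Hypothetical stand-in for the real data: 16 males, 15 females */
data subjects;
    length gender $1;
    do participant = 1 to 31;
        gender = ifc(mod(participant, 2) = 1, 'M', 'F');
        output;
    end;
run;

/* Step 1: assign a uniform random number to everyone */
data shuffled;
    set subjects;
    if _n_ = 1 then call streaminit(88);  /* arbitrary seed for repeatability */
    u = rand('uniform');
run;

/* Step 2: sorting by gender and u puts each gender in random order */
proc sort data=shuffled;
    by gender u;
run;

/* Step 3: within each gender, positions 1-5 go to sample 1,
   positions 6-10 to sample 2, and the remainder to sample 3 */
data assigned;
    set shuffled;
    by gender;
    if first.gender then pos = 0;
    pos + 1;
    sample = min(ceil(pos / 5), 3);
    drop u pos;
run;

/* Tabulate the result to confirm the gender balance */
proc freq data=assigned;
    tables gender * sample / norow nocol nopercent;
run;
```

With 16 males and 15 females this gives samples of 10, 10, and 11, where the third sample holds the extra male.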

--
Paige Miller
sasuser3
Calcite | Level 5

Yes, I know 11 is an odd number.

Thanks for your solution. It works, but I wanted to know if there is a proc that creates this kind of experimental design.

Thank you.

robby_beum
Quartz | Level 8

Hi sasuser3,

please forgive my ignorance - are you saying that you have a working set of code but you're just pinging the group to see if there's a proc that will do this for you?

sasuser3
Calcite | Level 5

At first I didn't have a solution, as I was trying to use PROC SURVEYSELECT (the code is in my question). I was looking for a proc (with appropriate options) that can generate an experimental design, something like a randomized block with proportions or a stratified design with proportions...

Later I followed PaigeMiller's solution and made it work; but I am still looking for a proc if it exists for this situation.

I have not yet tried Robby_Beum's solution, but thanks to him.

My working code is:

** generate random numbers for each ID;
data DOESet;
if _n_=1 then do;
   **----urand will be your random integer----**;
   urand=0;
   call ranuni(urand,dummy); **get a starting seed;
   put "original seed = " urand; **"save" starting seed to log;
   retain urand ;
end;
set DIABLIB.DiabDOESet;
call ranuni(urand,dummy);
drop dummy;
run;

proc sort data = DOESet;
   by gender;
run;

Dorota_Jarosz
Obsidian | Level 7

Try METHOD=PPS in the procedure statement options, as described in the SAS documentation:

http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_surveyselect...

This is what comes first when searching by keywords "surveyselect without replacement."

sasuser3
Calcite | Level 5

Thanks, but it doesn't fit my situation. That option avoids replication within a sample, not among samples. I tried again, but I couldn't get it to work.
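
On the distinction between "within a sample" and "among samples": REP= draws each replicate independently from the full input data, so nothing stops a subject from being picked in more than one replicate or in none. A possible alternative, assuming a SAS/STAT release in which PROC SURVEYSELECT supports the GROUPS= option, is to have the procedure partition the subjects instead of sampling them. The sketch below is untested against the original data. The dataset names are made up, the generated data is a stand-in, and GroupID is the group variable the procedure is documented to write to the OUT= dataset.

```sas
/* Hypothetical stand-in for the real data: 16 males, 15 females */
data subjects;
    length gender $1;
    do participant = 1 to 31;
        gender = ifc(mod(participant, 2) = 1, 'M', 'F');
        output;
    end;
run;

/* STRATA processing expects the input sorted by the strata variable */
proc sort data=subjects;
    by gender;
run;

/* GROUPS=3 requests random assignment of every subject to one of
   three groups, done separately within each gender stratum */
proc surveyselect data=subjects groups=3 seed=88 out=grouped;
    strata gender;
run;

/* Inspect how many males and females landed in each group */
proc freq data=grouped;
    tables gender * GroupID / norow nocol nopercent;
run;
```

Every subject gets exactly one group, which rules out both overlap and omission. It is worth checking the PROC FREQ table to see where the odd subject went.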

Reeza
Super User

You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.

DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;

data strat_people;
    set strat_people;
    random=ranuni(88);
run;

proc sort data=strat_people; by gender random; run;

proc rank data=strat_people out=results groups=3;
    by gender;
    var random;
    ranks randomgroup;
run;

PaigeMiller
Diamond | Level 26

Fareeza Khurshed wrote:

You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.


You don't need two data steps to begin the program, this can be accomplished in a single data step.
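
A minimal sketch of the single-step version: the random number is assigned in the same data step that reads the rows. The data lines are cut down to 12 illustrative rows to keep it short, and the final PROC FREQ is an added check that was not in the quoted program.

```sas
/* Read the data and assign the random number in one data step */
data strat_people;
    input participant gender $;
    random = ranuni(88);
    datalines;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
;
run;

proc sort data=strat_people;
    by gender random;
run;

/* GROUPS=3 turns the random values into group numbers 0, 1, 2
   within each gender */
proc rank data=strat_people out=results groups=3;
    by gender;
    var random;
    ranks randomgroup;
run;

/* Check the gender balance of the three groups */
proc freq data=results;
    tables gender * randomgroup / norow nocol nopercent;
run;
```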

--
Paige Miller
antony_allianz
Fluorite | Level 6

Thank you for volunteering your suggestion. My knowledge is growing thanks to suggestions like yours. Please keep in touch...

robby_beum
Quartz | Level 8

I wrote it in EG 4.3. It's wordy, but it works...

DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;

/***************************/
/* Define the sample sizes */
/***************************/
%let k=10;
%let k2=20;

/***********************************************************/
/* There are 31 participants so we need to split out into  */
/* 3 datasets of 10, 10 and 11 with a 50% male and 50%     */
/* female into each table (except the last since it's odd) */
/***********************************************************/
%macro looptest;
%do %until (&count = 10 and &countb = 10);

   /* GENERATE A RANDOM VECTOR */
   data strat_people_1;
      SET strat_people;
      random=RANUNI(-1);
   run;

   /* SORT OBSERVATIONS BY THE RANDOM VECTOR */
   proc sort DATA=strat_people_1;
      BY random;
   run;

   /* FIRST K ROWS TO CONTROLA, NEXT K TO CONTROLB, THE REST TO CONTROLC */
   data controla controlb controlc;
      SET strat_people_1(drop=random);

      IF _N_ le &k then
         do;
            if gender='M' then male+1;
            else female+1;
            if male<=5 and female<=5 then output controla;
         end;
      ELSE IF _N_ gt &k and _N_ le &k2 then
         do;
            if gender='M' then male+1;
            else female+1;
            if male<=10 and female<=10 then output controlb;
         end;
      ELSE output controlc;
   run;

   /* COUNT THE ROWS KEPT; AN UNBALANCED DRAW LEAVES CONTROLA OR  */
   /* CONTROLB SHORT OF 10, SO THE LOOP SHUFFLES AND TRIES AGAIN  */
   proc sql noprint;
      select count(*) into :count from controla;
      select count(*) into :countb from controlb;
   quit;

   %put count=&count countb=&countb;
%end;
%mend looptest;

%looptest;

%macro print(value);
   proc export data=&value.(keep=participant gender)
      outfile="C:\directories\output\Randomize_People.xls"
      dbms=excelcs replace;
      sheet="&value";
      SERVER='rvwsascpt01';
      PORT=9621;
   run;
%mend print;

%print(controla);
%print(controlb);
%print(controlc);

PaigeMiller
Diamond | Level 26

I could envision a much simpler program that doesn't require any macros at all, just a data step, two PROC SORT steps, and another data step.

--
Paige Miller
Linlin
Lapis Lazuli | Level 10

borrowed Robby's data:

data temp;
  set strat_people;
  n=ranuni(88);   /* random sort key */
run;

proc sort data=temp;
  by gender n;
run;

/* deal the sorted rows round-robin into three datasets */
data data1 data2 data3;
  set temp;
  if mod(_n_,3)=0 then output data1;
  else if mod(_n_,3)=1 then output data2;
  else output data3;
run;

Linlin

Message was edited by: Linlin

PaigeMiller
Diamond | Level 26

Well, Linlin, nice job. Shame on me, I thought it would take two PROC SORTs ... duh ... but you use two datasteps after the SORT, I'm sure you could do it in one datastep

--
Paige Miller
robby_beum
Quartz | Level 8

HEY! I provided the data!

;o)

Nice job Linlin!
