Hi,
I want to divide a dataset of 31 subjects into 3 samples of 10, 10, and 11; each sample should be 50% male and 50% female (by the variable gender). No subject should appear in more than one sample. I used PROC SURVEYSELECT as below, but the same subject sometimes appears in more than one sample and some subjects don't appear at all, which I don't want to happen. How can I avoid this? Thanks.
proc surveyselect data = DIABLIB.DiabDOESet out = DOEsamp1
method = srs samprate = .33 rep=3 ;
strata gender;
run;
I hope you realize that the dataset which has 11 subjects cannot have 50% males.
This isn't too hard to do using data steps and PROC SORT. Assign random numbers to everyone. Sort the males by the random number. Sort the females by the random number. Assign the first 5 males and first 5 females to sample 1. Continue. Done.
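The steps described above can be sketched as follows. This is only a minimal sketch, not anyone's posted code: the dataset name `subjects`, the variables `random` and `seq`, and the output names `sample1` to `sample3` are made up for illustration, and the 31 subjects are generated with alternating genders (16 males, 15 females) as an assumption. It uses a single PROC SORT with `by gender random` in place of separate sorts for males and females.

```sas
/* Hypothetical data: 31 subjects, odd IDs male, even IDs female */
data subjects;
    length gender $1;
    do participant = 1 to 31;
        if mod(participant, 2) = 1 then gender = 'M';
        else gender = 'F';
        random = ranuni(88);   /* fixed seed so the split is reproducible */
        output;
    end;
run;

/* Random order within each gender */
proc sort data=subjects;
    by gender random;
run;

/* First 5 of each gender -> sample1, next 5 -> sample2, the rest -> sample3 */
data sample1 sample2 sample3;
    set subjects;
    by gender;
    if first.gender then seq = 0;
    seq + 1;                   /* position within gender, in random order */
    if seq <= 5 then output sample1;
    else if seq <= 10 then output sample2;
    else output sample3;
    drop random seq;
run;
```

Under these assumptions sample1 and sample2 each get 5 males and 5 females, and sample3 gets the remaining 6 males and 5 females, so every subject lands in exactly one sample.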
Yes, I know 11 is an odd number.
Thanks for your solution. It works, but I wanted to know if there is a proc that creates this kind of experimental design.
Thank you.
Hi sasuser3,
please forgive my ignorance - are you saying that you have a working set of code but you're just pinging the group to see if there's a proc that will do this for you?
At first I didn't have a solution, as I was trying to use PROC SURVEYSELECT (the code is in my question). I was looking for a proc (with appropriate options) that can generate a design of experiment, something like randomized blocks with proportions or stratified with proportions...
Later I followed PaigeMiller's solution and made it work, but I am still looking for a proc, if one exists, for this situation.
I have not yet tried Robby_Beum's solution, but thanks to him.
My working code is:
** generate random numbers for each ID;
data DOESet;
  if _n_=1 then do;
    **----urand will be your random integer----**;
    urand=0;
    call ranuni(urand,dummy); **get a starting seed;
    put "original seed = " urand; **"save" starting seed to log;
    retain urand ;
  end;
  set DIABLIB.DiabDOESet;
  call ranuni(urand,dummy);
  drop dummy;
run;
proc sort data = DOESet;
  by gender;
run;
Try METHOD=PPS in the procedure statement options, as described in SAS documentation:
This is what comes first when searching by keywords "surveyselect without replacement."
Thanks, but it doesn't fit my situation. Sampling without replacement applies within a sample, not across samples. I tried again, but I couldn't get it to work.
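On the question of whether a single proc can do the assignment: more recent SAS/STAT releases document a GROUPS= option on PROC SURVEYSELECT that randomly assigns every observation to one of a given number of groups, independently within each stratum. The sketch below assumes your release has that option (older releases do not) and uses a made-up dataset `subjects` with 16 males and 15 females; the names `subjects` and `assigned` and the seed are illustrative only.

```sas
/* Hypothetical data: 31 subjects, odd IDs male, even IDs female */
data subjects;
    length gender $1;
    do participant = 1 to 31;
        if mod(participant, 2) = 1 then gender = 'M';
        else gender = 'F';
        output;
    end;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=subjects;
    by gender;
run;

/* Assumes the GROUPS= option is available in your SAS/STAT release. */
/* Each subject is placed in exactly one of 3 groups within gender.  */
proc surveyselect data=subjects out=assigned groups=3 seed=88;
    strata gender;
run;

/* Check the gender balance; GroupID is the group variable the procedure adds */
proc freq data=assigned;
    tables gender*GroupID / norow nocol nopercent;
run;
```

Because every observation is assigned exactly once, no subject can show up in two groups or be left out, which is the problem with REP=3 in the original attempt.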
You can also use PROC RANK with a random variable, which I believe is a touch easier than a data step.
DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;
data strat_people;
  set strat_people;
  random=ranuni(88);
run;
proc sort data=strat_people;
  by gender random;
run;
proc rank data=strat_people out=results groups=3;
  by gender;
  var random;
  ranks randomgroup;
run;
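To get from the ranked output to three separate samples, the group variable can be used to route rows. A minimal sketch, assuming a stand-in dataset `people` built with alternating genders (the names `people`, `ranked`, and `sample1` to `sample3` are hypothetical); note that PROC RANK with GROUPS=3 numbers the groups 0, 1, and 2.

```sas
/* Hypothetical stand-in for strat_people: 31 subjects, odd IDs male */
data people;
    length gender $1;
    do participant = 1 to 31;
        if mod(participant, 2) = 1 then gender = 'M';
        else gender = 'F';
        random = ranuni(88);
        output;
    end;
run;

proc sort data=people;
    by gender;
run;

/* Cut the random variable into 3 groups within each gender */
proc rank data=people out=ranked groups=3;
    by gender;
    var random;
    ranks randomgroup;
run;

/* Route each subject to one dataset according to its group (0, 1, or 2) */
data sample1 sample2 sample3;
    set ranked;
    select (randomgroup);
        when (0) output sample1;
        when (1) output sample2;
        otherwise output sample3;
    end;
run;

/* Check the balance of gender across groups */
proc freq data=ranked;
    tables gender*randomgroup / norow nocol nopercent;
run;
```

With 15 females and 16 males, the group sizes within each gender should differ by at most one, so two samples end up with 10 subjects and one with 11.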
Fareeza Khurshed wrote:
You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.
You don't need two data steps to begin the program, this can be accomplished in a single data step.
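As a sketch of that suggestion, the random number can be assigned in the same data step that reads the data. Only the first six rows of the data are shown here to keep it short; the rest of the program is unchanged.

```sas
/* Read the data and attach the random number in one step */
data strat_people;
    input participant gender $;
    random = ranuni(88);
    datalines;
1 M
2 F
3 M
4 F
5 M
6 F
;
run;

proc sort data=strat_people;
    by gender;
run;

proc rank data=strat_people out=results groups=3;
    by gender;
    var random;
    ranks randomgroup;
run;
```

PROC RANK only needs the data grouped by gender, so sorting by gender alone is enough here.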
Thank you for volunteering your suggestion. I'm growing in my knowledge thanks to your precious suggestions. Please keep in touch...
I wrote it in EG 4.3 - it's wordy but it works...
DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;
/***************************/
/* Define the sample sizes */
/***************************/
%let k=10;
%let k2=20;
/***********************************************************/
/* There are 31 participants so we need to split out into */
/* 3 datasets of 10, 10 and 11 with a 50% male and 50% */
/* female into each table (except the last since it's odd) */
/***********************************************************/
%macro looptest;
  %local counta countb;
  /* Repeat until BOTH of the first two samples hold 10 subjects. */
  /* A sample only reaches 10 rows when it has exactly 5 males    */
  /* and 5 females, so checking controla alone is not enough.     */
  %do %until (&counta = 10 and &countb = 10);
    /* GENERATE A RANDOM VECTOR */
    data strat_people_1;
      SET strat_people;
      random=RANUNI(-1);
    run;
    /* SORT OBSERVATIONS BY THE RANDOM VECTOR */
    proc sort DATA=strat_people_1;
      BY random;
    run;
    /* SELECT THE FIRST K OBSERVATIONS */
    data controla controlb controlc;
      SET strat_people_1(drop=random);
      IF _N_ le &k then
        do;
          if gender='M' then male+1;
          else female+1;
          if male<=5 and female<=5 then output controla;
        end;
      ELSE IF _N_ gt &k and _N_ le &k2 then
        do;
          if gender='M' then male+1;
          else female+1;
          if male<=10 and female<=10 then output controlb;
        end;
      ELSE output controlc;
    run;
    /* COUNT THE ROWS KEPT IN THE FIRST TWO SAMPLES */
    proc sql noprint;
      select count(*) into :counta from controla;
      select count(*) into :countb from controlb;
    quit;
    %put counta=&counta countb=&countb;
  %end;
%mend looptest;
%looptest;
%macro print(value);
proc export data=&value.(keep=participant gender)
outfile="C:\directories\output\Randomize_People.xls"
dbms=excelcs replace;
sheet="&value";
SERVER='rvwsascpt01';
PORT=9621;
run;
%mend print;
%print(controla);
%print(controlb);
%print(controlc);
I could envision a much simpler program that doesn't require any macros at all, just a data step, two PROC SORT steps, and another data step.
borrowed Robby's data:
data temp;
  set strat_people;
  n=ranuni(88);
run;
proc sort data=temp;
  by gender n;
run;
data data1 data2 data3;
  set temp;
  if mod(_n_,3)=0 then output data1;
  else if mod(_n_,3)=1 then output data2;
  else output data3;
run;
Linlin
Well, Linlin, nice job. Shame on me, I thought it would take two PROC SORTs ... duh ... but you use two datasteps after the SORT, I'm sure you could do it in one datastep
HEY! I provided the data!
;o)
Nice job Linlin!