- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I want to divide a dataset of 31 subjects into 3 samples:10,10, &11; each sample should have 50% males and females (by variable gender). No subject should be replicated in any sample. I used surveyselect proc as below, but same subject sometimes appears in more than one sample and some subjects don't appear at all. which I don't want to happen. How to avoid this situation? Thanks.
proc surveyselect data = DIABLIB.DiabDOESet out = DOEsamp1
method = srs samprate = .33 rep=3 ;
strata gender;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
borrowed Robby's data:
data temp;
set strat_people;
n=ranuni(88);
proc sort;
by gender n;
run;
data data1 data2 data3;
set temp;
if mod(_n_,3)=0 then output data1;
else if mod(_n_,3)=1 then output data2;
else output data3;
run;
Linlin
Message was edited by: Linlin
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I hope you realize that the dataset which has 11 subjects cannot have 50% males
This isn't too hard to do using data steps and PROC SORT. Assign random numbers to everyone. Sort the males by the random number. Sort the females by the random number. Assign the first 5 males and first 5 females to sample 1. Continue. Done.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, I know 11 is an odd numnber.
Thanks for your solution. It works; but I wanted to know if there is any proc to create such design of experiment.
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi sasuser3,
please forgive my ignorance - are you saying that you have a working set of code but you're just pinging the group to see if there's a proc that will do this for you?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
At first I didn't have solution as I was trying to use proc surveyselect (the code is written in my question). I was looking for a proc (with appropriate options) that can generate a design of experiment, something randomized block with proportions or stratified with proportions...
Later I followed PaigeMiller's solution and made it work; but I am still looking for a proc if it exists for this situation.
I have not yet tried Robby_Beum's solution, but thanks to him.
My working code is:
** generate random numbers for each ID;
data DOESet;
if _n_=1 then do;
**----urand will be your random integer----**;
urand=0;
call ranuni(urand,dummy); **get a starting seed;
put "original seed = " urand; **"save" starting seed to log;
retain urand ;
end;
set DIABLIB.DiabDOESet;
call ranuni(urand,dummy);
drop dummy;
run;
proc sort data = DOESet;
by gender;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try METHOD=PPS in the procedure statement options, as described in SAS documentation:
This is what comes first when searching by keywords "surveyselect without replacement."
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, but it doesn't fit to my situation. Without replication is within a sample, not among samples. I again tried, but I couldn't get it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.
DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;
data strat_people;
set strat_people;
random=ranuni(88);
run;
proc sort data=strat_people; by gender random; run;
proc rank data=strat_people out=results groups=3;
by gender;
var random;
ranks randomgroup;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Fareeza Khurshed wrote:
You can also use PROC RANK with a random variable, a touch easier than a dataset I believe.
DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;
data strat_people;
set strat_people;
random=ranuni(88);
run;
proc sort data=strat_people; by gender random; run;
proc rank data=strat_people out=results groups=3;
by gender;
var random;
ranks randomgroup;
run;
You don't need two data steps to begin the program, this can be accomplished in a single data step.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your volunteering suggestion. I'm growing in my knowledge like your precious suggestions. pls keep in touch...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I wrtote it in EG 4.3. - it's wordy but it works...
DATA strat_people ;
INPUT participant gender $1. ;
CARDS ;
1 M
2 F
3 M
4 F
5 M
6 F
7 M
8 F
9 M
10 F
11 M
12 F
13 M
14 F
15 M
16 F
17 M
18 F
19 M
20 F
21 M
22 F
23 M
24 F
25 M
26 F
27 M
28 F
29 M
30 F
31 M
;
/***************************/
/* Define the sample sizes */
/***************************/
%let k=10;
%let k2=20;
/***********************************************************/
/* There are 31 participants so we need to split out into */
/* 3 datasets of 10, 10 and 11 with a 50% male and 50% */
/* female into each table (except the last since it's odd) */
/***********************************************************/
%macro looptest;
%do %until (&count = 10);
/* GENERATE A RANDOM VECTOR */
data strat_people_1;
SET strat_people;
random=RANUNI(-1);
count=1;
run;
/* SORT OBSERVATIONS BY THE RANDOM VECTOR */
proc sort DATA=strat_people_1;
BY random;
run;
/* SELECT THE FIRST K OBSERVATIONS */
data controla controlb controlc;
SET strat_people_1(drop=random);
rollup_var=1;
IF _N_ le &k then
do;
if gender='M' then male+1;
else female+1;
if male<=5 and female<=5 then output controla;
end;
ELSE IF _N_ gt &k and _N_ le &k2 then
do;
if gender='M' then male+1;
else female+1;
if male<=10 and female<=10 then output controlb;
end;
ELSE output controlc;
run;
proc sql;
create table controla_1 as
select rollup_var,
sum(count) as total_participants
from controla
group by rollup_var;
quit;
data _null_;
set controla_1;
call symput('count',put(total_participants,3.));
run;
%put "&count";
%end;
%mend looptest;
%looptest;
%macro print(value);
proc export data=&value.(keep=participant gender)
outfile="C:\directories\output\Randomize_People.xls"
dbms=excelcs replace;
sheet="&value";
SERVER='rvwsascpt01';
PORT=9621;
run;
%mend print;
%print(controla);
%print(controlb);
%print(controlc);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I could envision a much simpler program that doesn't require any macros at all, just a data step, two PROC SORT steps, and another data step.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
borrowed Robby's data:
data temp;
set strat_people;
n=ranuni(88);
proc sort;
by gender n;
run;
data data1 data2 data3;
set temp;
if mod(_n_,3)=0 then output data1;
else if mod(_n_,3)=1 then output data2;
else output data3;
run;
Linlin
Message was edited by: Linlin
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, Linlin, nice job. Shame on me, I thought it would take two PROC SORTs ... duh ... but you use two datasteps after the SORT, I'm sure you could do it in one datastep
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HEY! I provided the data!
;o)
Nice job Linlin!