Dear all,
I wanted to know how I can select ids randomly from a dataset, using this sample data as an example:
data test;
input pat_id 2. sex $2. age 3. var1 $2. var2 $2. var3 $2. var4 $2. year 5.;
datalines;
1 F 25 A B C D 2001
1 F 25 E F G D 2002
2 F 35 C M N D 2010
2 F 35 E F V W 2020
15 M 55 A B C D 2011
15 M 55 E F G D 2010
15 M 55 U B C D 2011
15 M 55 j F K D 2010
15 M 55 A B C D 2009
15 M 55 E Y G D 2008
15 M 55 F Y T D 2008
11 F 60 A B C D 2001
11 F 60 E F G D 2002
11 F 60 U B C D 2015
11 F 60 j F K D 2004
11 F 60 X B C D 2010
11 F 60 C M N D 2014
11 F 60 F Y T D 2008
11 F 60 S F G D 2003
11 F 60 V B G D 2012
11 F 60 K F K Q 2000
11 F 60 Z B M D 2011
11 F 60 U Y G S 2010
11 F 60 X Y T O 2009
;
run;
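I got this as a solution:
/* STRATA processing expects the input sorted by the strata variable */
proc sort data=test;
  by pat_id;
run;
/* Simple random sample of up to 5 rows per pat_id; */
/* SELECTALL keeps every row of an id that has fewer than 5 rows */
proc surveyselect data=test
  method=srs n=5 selectall /* outall */
  seed=2718 out=want(drop=SelectionProb SamplingWeight);
  strata pat_id;
run;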
I have another question: if a combination of var1 and var2 values has already been selected for one id (row), then it should not be selected again for a different id.
For example: var1=A and var2=B have already been selected for id 1, so this combination should not be selected again for id 15 (that row will not be selected).
pat_id  sex  age  var1  var2  var3  var4  year
1       F    25   A     B     C     D     2001
1       F    25   E     F     G     D     2002
2       F    35   C     M     N     D     2010
2       F    35   E     F     V     W     2020
11      F    60   E     F     G     D     2002
11      F    60   E     Y     G     D     2014
11      F    60   F     Y     T     D     2008
11      F    60   S     F     G     D     2003
11      F    60   U     Y     G     S     2010
15      M    55   A     B     C     D     2011
15      M    55   E     F     G     D     2010
15      M    55   U     B     C     D     2011
15      M    55   X     B     C     D     2009
15      M    55   F     Y     T     D     2008
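One possible way to express the rule described above is a DATA step that post-filters the sampled dataset with a hash object. This is only a sketch under some assumptions: it reads the `want` dataset produced by PROC SURVEYSELECT, it relies on `want` being ordered by pat_id (so the first id in sort order "owns" a var1/var2 pair), and the names `want_unique`, `seen` and `first_id` are made up for illustration. Because it filters after sampling rather than resampling, an id can end up with fewer than 5 rows.

```sas
/* Sketch: drop rows whose (var1, var2) pair was already kept for a different pat_id. */
/* Assumes WANT is the PROC SURVEYSELECT output, ordered by pat_id.                   */
data want_unique;
   length first_id 8;                 /* hypothetical helper: id that first used the pair */
   set want;
   if _n_ = 1 then do;
      declare hash seen();            /* lookup table keyed by the var1/var2 pair */
      seen.defineKey('var1', 'var2');
      seen.defineData('first_id');
      seen.defineDone();
   end;
   if seen.find() = 0 then do;
      /* pair already recorded: keep the row only if it belongs to the same id */
      if first_id ne pat_id then delete;
   end;
   else do;
      /* first time this pair appears: remember which id it belongs to */
      first_id = pat_id;
      seen.add();
   end;
   drop first_id;
run;
```

With the rows shown above, this would drop, for example, the id 15 row with var1=A and var2=B, because that pair is first recorded for id 1. If each id must still end up with 5 rows where possible, the exclusion would have to happen before or during sampling instead, which this sketch does not attempt.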