Hello all,

I would appreciate your help writing an iterative process. I need to split a sample into three groups using the following procedure:

1. Randomly select 3 seed observations from the main dataset and assign them to three sub datasets, "high", "middle", and "low", according to the values of variable A in the seed observations. Each sub dataset then starts with one observation.
2. Starting from the main dataset with the 3 seed observations excluded, take the next observation and compute the squared difference between its value of variable A and the median value of variable A in each sub dataset. Add the observation to the sub dataset with the smallest squared difference.
3. Repeat step 2 until all the observations in the main dataset have been examined.

I have figured out the first step and have the following sample data to start with:

data have;
input a;
cards;
-1.35
-1.10
-1.02
-0.72
-0.18
-0.11
0.31
0.58
0.67
;
run;
*randomly generate 3 seed observations*;
proc surveyselect data=have out=rand method=srs sampsize=3 seed=100 noprint; run;
*sort the seeds by a so that n=1 is the lowest and n=3 the highest*;
proc sort data=rand; by a; run;
data rand; set rand; n+1; run;
data t1(drop=n); set rand; if n=1; run; *low sub dataset*;
data t2(drop=n); set rand; if n=2; run; *middle sub dataset*;
data t3(drop=n); set rand; if n=3; run; *high sub dataset*;
*exclude the 3 seed observations from main dataset*;
*have and a are used here because w1 and x are not defined in the sample data*;
proc sql; create table data as select
a.*
from have a left join rand b
on a.a=b.a
where b.a is null;
quit;

After running the code above, I have 3 sub datasets, "t1", "t2", and "t3", and a main dataset, "data". How can I code steps 2 and 3 with these datasets? I am also open to coding step 1 more efficiently. Many thanks!
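
For reference, here is a rough sketch of how steps 2 and 3 might be done in a single data step. This is only one possible approach under several assumptions, not a tested solution: "rand" holds the 3 seeds and "data" holds the remaining observations, variable a is never missing, no group grows beyond 1000 members (an arbitrary bound), ties go to the lower group, and the names seeds, grouped, g1-g3, cnt, d, k, and group are made up for illustration. Temporary arrays keep their values across observations, so each group's median is recalculated from its current members before the next observation is assigned.

```sas
/* Order the seeds by a: smallest -> low, next -> middle, largest -> high */
proc sort data=rand out=seeds(keep=a); by a; run;

data grouped(keep=a group);
  /* the 3 seeds are read first, then the remaining observations */
  set seeds(in=is_seed) work.data(keep=a);

  /* current members of each group; unused slots stay missing   */
  /* 1000 is an assumed upper bound on the size of any group     */
  array g1[1000] _temporary_;   /* low    */
  array g2[1000] _temporary_;   /* middle */
  array g3[1000] _temporary_;   /* high   */
  array cnt[3] _temporary_ (0 0 0);   /* members per group */
  array d[3] _temporary_;             /* squared differences */
  length group $6;

  if is_seed then k = _n_;   /* seeds 1, 2, 3 start the three groups */
  else do;
    /* MEDIAN ignores missing values, so only current members count */
    d[1] = (a - median(of g1[*]))**2;
    d[2] = (a - median(of g2[*]))**2;
    d[3] = (a - median(of g3[*]))**2;
    /* index of the smallest squared difference; first match wins ties */
    k = whichn(min(of d[*]), of d[*]);
  end;

  /* add the observation to the chosen group */
  cnt[k] = cnt[k] + 1;
  select (k);
    when (1) g1[cnt[1]] = a;
    when (2) g2[cnt[2]] = a;
    otherwise g3[cnt[3]] = a;
  end;

  group = choosec(k, 'low', 'middle', 'high');
run;
```

The output "grouped" would have one row per observation with its assigned group, and t1, t2, and t3 could then be rebuilt by subsetting on group. The result depends on the order in which observations are read from "data", because each assignment changes the median used for the next one.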