Hello everyone, my question is similar to the one linked below, with one difference: the constraint affects the random sample itself. https://communities.sas.com/t5/SAS-Procedures/PROC-SURVEYSELECT-with-Constraints/td-p/51235
Example
Sel_1
3
2
4
1
2
So for Sel_2, I would like to randomly select from 1-4, with 2 chosen twice, such that Sel_1 NE Sel_2 in every observation.
Sel_1 Sel_2 (acceptable since none match)
3 2
2 4
4 2
1 3
2 1
Is there a way to do this? A WHERE statement doesn't make sense here, since it requires an actual variable in the data set and it can limit the size of the resulting sample. If not, I was considering some sort of iterative method, changing the seed number until the condition is met. Anything helps, and thank you so much for reading.
EDIT: The main result desired is that the new variable (Sel_2) satisfies two constraints:
1. Sel_2 does not match Sel_1 in the same observation.
2. The number of times each group value appears (in the previous example, the values 1-4) is controlled through a column. (Sel_1 should follow this as well.)
i.e., 1 shows up once, 2 shows up twice, and 3 and 4 each show up once in both Sel_1 and Sel_2, while still following the first constraint.
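The retry idea mentioned above can be sketched in a single DATA step. This is only a minimal sketch under assumptions: the Sel_1 values are hard-coded from the example, the array names, the seed, and the cap of 1000 attempts are arbitrary choices, and CALL RANPERM is used to shuffle the candidate Sel_2 values. Instead of changing the seed by hand, it keeps reshuffling until no observation matches.

```sas
data want (keep=Sel_1 Sel_2);
   /* Sel_1 values from the example (hard-coded for illustration) */
   array a[5] a1-a5 (3 2 4 1 2);
   /* Candidate Sel_2 values with the same counts: 2 appears twice */
   array b[5] b1-b5 (1 2 2 3 4);
   seed = 27371;                       /* arbitrary seed */
   do attempt = 1 to 1000 until (ok);  /* arbitrary cap on retries */
      call ranperm(seed, of b1-b5);    /* shuffle the candidate values */
      ok = 1;
      do i = 1 to dim(a);
         if a[i] = b[i] then ok = 0;   /* reject if any observation matches */
      end;
   end;
   if ok then do i = 1 to dim(a);
      Sel_1 = a[i];
      Sel_2 = b[i];
      output;
   end;
   else put 'NOTE: no valid arrangement found within the attempt limit.';
run;
```

Because every shuffle is a fresh random permutation and only whole shuffles are rejected, each valid arrangement should be equally likely. The counts per value are fixed by the initial values of the b array.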
That looks more like something for Proc Plan.
An excerpt from the documentation:
The PLAN procedure constructs designs and randomizes plans for factorial experiments, especially nested and crossed experiments and randomized block designs. PROC PLAN can also be used for generating lists of permutations and combinations of numbers. The PLAN procedure can construct the following types of experimental designs:
full factorial designs, with and without randomization
You might describe how you intend to use this result.
Thanks for the help! I intend to use it as follows: there is an ordered id, followed by three measurements that are randomized. The code below does something similar, but I would like to control the number of "drugs" and how many times each one can show up per level.
I wrote this code and got this result:
data dat;
   do id=101 to 108;
      output;
   end;
run;
proc plan seed=27371;
   factors id=8 ordered Drug=3;
   output data=dat out=plan;
run;
SAS Output
1 | 2 | 1 | 3 |
2 | 1 | 2 | 3 |
3 | 3 | 1 | 2 |
4 | 3 | 1 | 2 |
5 | 1 | 3 | 2 |
6 | 2 | 3 | 1 |
7 | 1 | 3 | 2 |
8 | 3 | 2 | 1 |
This is close to what I'd like, but I would like to know how to control the number of times each drug shows up per level, and how to increase the number of drugs without increasing the levels.
e.g., with 6 drugs, have 1, 2, 3, and 4 each show up once but 5 and 6 each show up twice per level.
P.S. I can't seem to open the out=plan data set.
Like Proc SQL and Proc Datasets, Proc Plan supports interactive processing, so it uses quit; to indicate that you are actually finished with the procedure.
You'll have to clarify "Which is close to what I'd like but I would like to know how to control for the number of times each drug shows up per level and how to increase the number of drugs without increasing the levels." Levels of what?
Is this closer?
proc plan seed=27371;
   factors id=8 ordered Drug=3 of 5;
run;
quit;
Notice that you were getting an error from your data set because there was no value for drug.
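If you also want the plan saved to a data set, something like the following may work. It is a minimal sketch, assuming that an OUTPUT statement with only OUT= (no DATA= input data set) is acceptable for your purposes; in that case the factor values in the output are just the level numbers, not your 101-108 ids. The data set name plan is carried over from your code.

```sas
proc plan seed=27371;
   factors id=8 ordered Drug=3 of 5;
   output out=plan;   /* no DATA= option, so no input data set is required */
run;
quit;                 /* ends the procedure so the output data set is released */

proc print data=plan;
run;
```

The quit; is the important part here: until the procedure has ended, the out= data set may not be available to open.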
That's definitely closer, thanks!
@ballardw wrote:
You'll have to clarify "Which is close to what I'd like but I would like to know how to control for the number of times each drug shows up per level and how to increase the number of drugs without increasing the levels." Levels of what?
I apologize for my poor wording. Let's return to the output that I had.
id Drug
1 | 2 | 1 | 3 |
2 | 1 | 2 | 3 |
3 | 3 | 1 | 2 |
4 | 3 | 1 | 2 |
5 | 1 | 3 | 2 |
6 | 2 | 3 | 1 |
7 | 1 | 3 | 2 |
8 | 3 | 2 | 1 |
In the second column, 1 shows up 3 times, 2 shows up 2 times, and 3 shows up 3 times.
These counts do not match those of the fourth column. (Coincidentally, they do match the third.)
I would like to be able to specify, for example, that 1 has to show up 3 times in each of the second, third, and fourth columns without appearing twice in the same observation (ditto for 2 and 3, apart from their counts).
Using PROC SURVEYSELECT as an example, I would like to control the size of the groups, as with the option groups=(num1, num2, ..., num n) in PROC SURVEYSELECT. Essentially, these two PROCs each do part of what I'd like to accomplish in a single operation: PROC PLAN gives a random ordering across multiple new columns without a number repeating within an observation, and PROC SURVEYSELECT controls the size of each group.
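To make the combined requirement concrete, here is a rough DATA step sketch rather than a PROC PLAN or PROC SURVEYSELECT solution. It assumes the earlier 6-drug example (drugs 1-4 once, drugs 5 and 6 twice, so 8 slots for ids 101-108); the variable names, the seed, and the attempt cap are made up for illustration. Each of the three columns is shuffled with CALL RANPERM, and the whole set is reshuffled until no drug repeats within an observation.

```sas
data want (keep=id Drug1-Drug3);
   /* Each column holds the same group sizes: 1-4 once, 5 and 6 twice */
   array p[8] p1-p8 (1 2 3 4 5 5 6 6);
   array q[8] q1-q8 (1 2 3 4 5 5 6 6);
   array r[8] r1-r8 (1 2 3 4 5 5 6 6);
   seed = 27371;                         /* arbitrary seed */
   do attempt = 1 to 100000 until (ok);  /* arbitrary cap on retries */
      call ranperm(seed, of p1-p8);
      call ranperm(seed, of q1-q8);
      call ranperm(seed, of r1-r8);
      ok = 1;
      do i = 1 to 8;
         /* reject if a drug repeats within the same observation */
         if p[i] = q[i] or p[i] = r[i] or q[i] = r[i] then ok = 0;
      end;
   end;
   if ok then do i = 1 to 8;
      id = 100 + i;
      Drug1 = p[i];
      Drug2 = q[i];
      Drug3 = r[i];
      output;
   end;
   else put 'NOTE: no valid arrangement found within the attempt limit.';
run;
```

Most shuffles get rejected, so this is only practical for small designs like this one. Changing the initial values of the three arrays changes how often each drug appears per column.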
I hope this explanation is much clearer.