Re: How to use PROC SURVEYSELECT with Sampling Constraint

Learning_S · Posted 01-12-2018 03:07 AM

Hello everyone, my question is similar to the one linked below, with one difference: here the constraint applies to the random sample itself. https://communities.sas.com/t5/SAS-Procedures/PROC-SURVEYSELECT-with-Constraints/td-p/51235

Example

Sel_1
3
2
4
1
2

So for Sel_2, I would like to randomly select from 1-4, with 2 chosen twice, such that Sel_1 NE Sel_2 in every row.

Sel_1 Sel_2 (acceptable since none match)
3 2
2 4
4 2
1 3
2 1

Is there a way to do this? A WHERE statement doesn't work here, because it can only filter on a variable that already exists in the data set, and filtering can shrink the resulting sample. If there is no direct way, I was considering a repeated approach: change the seed and resample until the condition is met. Anything helps, and thank you so much for reading.

EDIT: The main result I want is for the new variable (Sel_2) to follow two constraints:

1. Sel_2 does not match Sel_1 in the same observation.

2. The number of times each group value (1-4 in the previous example) appears is controlled through a column. Sel_1 should follow the same counts.

That is, 1 shows up once, 2 shows up twice, and 3 and 4 show up once each, in both Sel_1 and Sel_2, while the first constraint still holds.
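A minimal sketch of the "resample until the condition is met" idea, done inside a single DATA step instead of re-running a procedure with new seeds: shuffle a copy of Sel_1 and keep the shuffle only when no row keeps its own value. Because Sel_2 is a rearrangement of Sel_1, every value keeps exactly the same count, which covers the second constraint. The data set names (have, want), the seed, the try limit, and the array size are assumptions chosen for illustration.

```sas
/* Hypothetical input: the Sel_1 column from the example */
data have;
   input Sel_1 @@;
   datalines;
3 2 4 1 2
;

data want(keep=Sel_1 Sel_2);
   array s1[1000] _temporary_;   /* Sel_1 values; size assumed large enough */
   array s2[1000] _temporary_;   /* candidate Sel_2 values */
   call streaminit(27371);       /* arbitrary seed for reproducibility */

   /* Load every Sel_1 value into s1 */
   do until (eof);
      set have end=eof;
      n + 1;
      s1[n] = Sel_1;
   end;

   /* Reshuffle until no row keeps its Sel_1 value */
   do try = 1 to 10000;
      do i = 1 to n;
         s2[i] = s1[i];
      end;
      do i = n to 2 by -1;             /* Fisher-Yates shuffle */
         j = ceil(rand('uniform') * i);
         tmp = s2[i]; s2[i] = s2[j]; s2[j] = tmp;
      end;
      ok = 1;
      do i = 1 to n;
         if s2[i] = s1[i] then ok = 0;
      end;
      if ok then leave;
   end;

   if not ok then put 'ERROR: no valid Sel_2 found within the try limit.';
   else do i = 1 to n;
      Sel_1 = s1[i];
      Sel_2 = s2[i];
      output;
   end;
   stop;
run;
```

If a single value fills more than half of the rows, no valid Sel_2 exists at all, so the try limit is what stops the loop in that case.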

Learning_S · Posted 01-12-2018 03:11 AM

EDIT: Please ignore this post; the original question above has been edited.


ballardw · Posted 01-12-2018 11:22 AM

That looks more like something for Proc Plan.

Partial from the documentation:

The PLAN procedure constructs designs and randomizes plans for factorial experiments, especially nested and crossed experiments and randomized block designs. PROC PLAN can also be used for generating lists of permutations and combinations of numbers. The PLAN procedure can construct the following types of experimental designs:

full factorial designs, with and without randomization

You might describe how you intend to use this result.

Learning_S · Posted 01-12-2018 12:07 PM

Thanks for the help! I intend to use it as follows: an ordered id, followed by three measurements that are randomized. The code I wrote below does something similar, but I would like to control the number of "drugs" and how many times each one can show up per level.

I wrote this code and got this result:

data dat;
   do id=101 to 108;
   output;
   end;
run;

proc plan seed=27371;
   factors id=8 ordered Drug=3;
   output data=dat out=plan;
run;

SAS Output

id Drug

1	2	1	3
2	1	2	3
3	3	1	2
4	3	1	2
5	1	3	2
6	2	3	1
7	1	3	2
8	3	2	1

Which is close to what I'd like but I would like to know how to control for the number of times each drug shows up per level and how to increase the number of drugs without increasing the levels.

e.g., have 6 drugs, where drugs 1, 2, 3, and 4 show up once, but drugs 5 and 6 show up twice per level.
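One way to control how many times each drug appears in a single column is to keep the counts in a small table, expand each drug into that many rows, and sort the rows by a random key. The sketch below uses the counts from this example (drugs 1-4 once, 5 and 6 twice, giving 8 rows for ids 101-108); the data set names, variable names, and seed are assumptions. It produces only one randomized column and does not yet handle the no-repeat-within-an-id constraint across several columns.

```sas
/* Hypothetical count table: each drug and how often it appears per column */
data counts;
   input Drug Count;
   datalines;
1 1
2 1
3 1
4 1
5 2
6 2
;

/* Repeat each drug Count times and attach a uniform random sort key */
data expanded(drop=k Count);
   if _n_ = 1 then call streaminit(27371);
   set counts;
   do k = 1 to Count;
      u = rand('uniform');
      output;
   end;
run;

/* Sorting by the random key gives a random order with fixed counts */
proc sort data=expanded out=column1(drop=u);
   by u;
run;

/* Attach ids 101-108 in the new random order */
data column1;
   set column1;
   id = 100 + _n_;
run;
```

The counts live in data rather than in code, which is close to what GROUPS= does in PROC SURVEYSELECT: changing a count only means editing the counts table.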

PS: I can't seem to open the out=plan data set.

ballardw · Posted 01-12-2018 06:09 PM

Because PROC PLAN can run interactively, it behaves like PROC SQL and PROC DATASETS: it uses quit; to indicate that you are actually finished with the procedure.

You'll have to clarify "Which is close to what I'd like but I would like to know how to control for the number of times each drug shows up per level and how to increase the number of drugs without increasing the levels." Levels of what?

Is this closer?

proc plan seed=27371;
   factors id=8 ordered Drug=3 of 5;
run;
quit;

Notice that you were getting an error from your DATA= data set because it had no values for Drug.
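To get the plan into a data set you can open (the PS above), here is a sketch that leaves out DATA=, writes the plan with OUTPUT OUT=, and ends the step with QUIT. The data set name plan comes from the original post; the factors and seed are the ones from the reply above.

```sas
proc plan seed=27371;
   factors id=8 ordered Drug=3 of 5;
   output out=plan;   /* write the randomized plan to WORK.PLAN */
run;
quit;                 /* end the interactive PROC PLAN step */

/* One row per id and Drug: 8 ids x 3 drugs = 24 rows */
proc print data=plan;
run;
```

In this form the output data set is long (one row per id and drug position) rather than one row per id with three drug columns.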

Learning_S · Posted 01-12-2018 06:31 PM

That's definitely closer, thanks!

@ballardw wrote:

You'll have to clarify "Which is close to what I'd like but I would like to know how to control for the number of times each drug shows up per level and how to increase the number of drugs without increasing the levels." Levels of what?

I apologize for my poor wording. Let's return to the output that I had.

id Drug

1	2	1	3
2	1	2	3
3	3	1	2
4	3	1	2
5	1	3	2
6	2	3	1
7	1	3	2
8	3	2	1

In the second column, 1 shows up 3 times, 2 shows up 2 times, and 3 shows up 3 times.

These counts do not match the fourth column, where 1 shows up 2 times, 2 shows up 4 times, and 3 shows up 2 times. (Coincidentally, they do match the third column.)
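The counts above can be checked by typing the printed plan into a wide data set and running PROC FREQ on each column. The data set name printed and the variable names d1-d3 (for the second, third, and fourth columns of the output) are assumptions.

```sas
/* The printed plan from the post, one variable per drug position */
data printed;
   input id d1 d2 d3;
   datalines;
1 2 1 3
2 1 2 3
3 3 1 2
4 3 1 2
5 1 3 2
6 2 3 1
7 1 3 2
8 3 2 1
;

/* Frequency of each drug value within each column */
proc freq data=printed;
   tables d1 d2 d3 / nocum nopercent;
run;
```

For this plan, d1 and d2 both give counts of 3, 2, and 3 for drugs 1, 2, and 3, while d3 gives 2, 4, and 2.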

I would like to be able to specify, for example, that 1 has to show up 3 times in each of the second, third, and fourth columns, without appearing twice in the same observation (and likewise for 2 and 3, each with its own count).

Using PROC SURVEYSELECT as an example, I would like to control the size of each group, as with its option groups=(num1, num2, ..., num n). Essentially, each of these two procedures does part of what I'd like to accomplish in one operation: PROC PLAN randomizes the order and creates multiple columns without a number repeating within an observation, and PROC SURVEYSELECT controls the size of each group.

I hope this explanation is much clearer.
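A rough sketch of doing both in one DATA step, under stated assumptions: every column is a random rearrangement of the same multiset of drugs, so the per-column counts are fixed (the GROUPS= part), and a column is reshuffled whenever any id would repeat a drug from an earlier column (the PROC PLAN part). The counts (drugs 1-4 once, 5 and 6 twice), 8 ids, 3 columns, the seed, and both try limits are taken from the example in this thread or chosen for illustration. This is plain rejection sampling, not a procedure option, and it is not guaranteed to pick uniformly among all valid designs.

```sas
%let nid  = 8;   /* number of ids (rows) */
%let ncol = 3;   /* number of drug columns */

data design(keep=id d1-d&ncol);
   array cnt[6] _temporary_ (1 1 1 1 2 2); /* count of drugs 1-6 per column */
   array pool[&nid] _temporary_;           /* the multiset of drug values */
   array m[&nid, &ncol] _temporary_;       /* design under construction */
   array d[&ncol] d1-d&ncol;
   call streaminit(27371);

   if sum(of cnt[*]) ne &nid then do;
      put 'ERROR: the drug counts must add up to the number of ids.';
      stop;
   end;

   /* Fill the pool: drug v repeated cnt[v] times */
   do v = 1 to dim(cnt);
      do k = 1 to cnt[v];
         p + 1;
         pool[p] = v;
      end;
   end;

   do attempt = 1 to 1000;          /* restart if a column gets stuck */
      do c = 1 to &ncol;
         do try = 1 to 10000;
            do i = &nid to 2 by -1;  /* Fisher-Yates shuffle of the pool */
               j = ceil(rand('uniform') * i);
               tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
            end;
            /* Reject if any id repeats a drug from an earlier column */
            ok = 1;
            do i = 1 to &nid while (ok);
               do c0 = 1 to c - 1;
                  if pool[i] = m[i, c0] then ok = 0;
               end;
            end;
            if ok then leave;
         end;
         if not ok then leave;       /* column c failed: start over */
         do i = 1 to &nid;
            m[i, c] = pool[i];
         end;
      end;
      if ok then leave;
   end;

   if not ok then put 'ERROR: no valid design found within the limits.';
   else do i = 1 to &nid;
      id = 100 + i;                  /* ids 101-108 as in the thread */
      do c = 1 to &ncol;
         d[c] = m[i, c];
      end;
      output;
   end;
   stop;
run;
```

If a set of counts makes a valid design impossible or very rare, the step ends with the ERROR message instead of looping forever; loosening the counts or raising the try limits are the obvious adjustments.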
