maryslpa
Fluorite | Level 6

Dear Users,

I have to perform a permutation test without replacement. Let's say I have 100 patients, 50 with treatment A and 50 with treatment B. There are too many ways to choose 50 out of 100 to enumerate them all. Here is the code I wrote to show you the data I have.

data original_data (drop=i j);
    do i=1 to 50;
        id=i;
        trt='1';
        output;
    end;
    do j=1 to 50;
        id=j+50;
        trt='2';
        output;
    end;
run;

 

So what I would like to do is generate 1000 unique samples of the 100 patients with the same treatment distribution; in other words, no sample should be drawn more than once.

Do you have any idea how to do that? I am a bit lost: I could create one data set after another and check whether each new one differs from the earlier ones, but I think that would be very time-consuming.

 

Thank you for your help !

Mary

 

15 REPLIES
Reeza
Super User

@maryslpa wrote:

 

So what I would like to do is generate 1000 unique samples of the 100 patients with the same treatment distribution; in other words, no sample should be drawn more than once.

 




What do you mean by no sample was drawn more than once? 

You're selecting 50 people from the population of 100, without replacement (50 unique people).

You want 1000 samples.

 

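/* 1000 replicate samples of 50 patients, drawn without replacement, */
/* allocated across the TRT strata in proportion to their size       */
proc surveyselect data=original_data method=srs rep=1000 sampsize=50 out=want;
    strata trt / alloc=prop;
run;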
 

 

 

 

maryslpa
Fluorite | Level 6

I would like to have 1000 different random samples with 100 patients and the same distribution of trt. 

Reeza
Super User

Ok...but your first part says you have 100 patients. That's the entire data set, so I don't understand that part. 

 

Anyways, use proc surveyselect - please read the documentation on the procedure.

 

1. METHOD=SRS specifies simple random sampling without replacement.

2. The STRATA statement stratifies by TRT with proportional allocation, so each sample has the same distribution as the input data set (50/50 in your case).

3. SAMPSIZE= specifies the number of units to select in each sample; REP= sets how many samples to draw.

 

 

maryslpa
Fluorite | Level 6

Choosing 50 out of 100 gives 100! / (50! 50!) combinations, and this number is very, very big. In the end, what I want is 1000 unique samples of the 100 patients with the same distribution of treatment. But I am not sure that this is possible with SURVEYSELECT 😞

Reeza
Super User

You're really confusing me here. Is your total population 100 and your sample size 50? You keep changing this. 

 

I still think my Proc SurveySelect is correct, look at the dataset, and let me know what's wrong with it.

maryslpa
Fluorite | Level 6

I would just like to permute the treatment for the 100 patients, and I would like to do the permutation 1000 times. That means that in the end I will have 1000 different, unique data sets of 100 patients each.

maryslpa
Fluorite | Level 6

I ran this and I obtained the same sample 50 times 😞

data original_data (drop=i j);
    do i=1 to 5;
        id=i;
        trt='A';
        output;
    end;
    do j=1 to 5;
        id=j+5;
        trt='B';
        output;
    end;
run;


proc surveyselect data=original_data method=srs rep=50 sampsize=10 out=want;
strata trt/alloc=prop;
run;
Reeza
Super User

Well yes, your sample there is only 5 and you're asking for 50, so that won't work.

 

Do you want to randomize the treatment by patient, or are you pulling a sample from a data set?
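One possible reading of the repeated sample, offered as a sketch rather than a definitive diagnosis: with SAMPSIZE=10 and two strata of 5 patients each, proportional allocation takes all 5 patients from each stratum, so every replicate contains the whole data set. Requesting fewer units than the population should make the replicates differ. The values SAMPSIZE=4 and SEED=20240 below are arbitrary choices made for illustration.

```sas
/* Same toy layout as above: 5 patients on A, 5 on B (already sorted by trt) */
data original_data (drop=i);
    length trt $1;
    do i = 1 to 10;
        id = i;
        trt = ifc(i <= 5, 'A', 'B');
        output;
    end;
run;

/* 50 replicates of 4 patients each; ALLOC=PROP splits them 2 per stratum */
proc surveyselect data=original_data method=srs rep=50 sampsize=4
                  seed=20240 out=want noprint;
    strata trt / alloc=prop;
run;
```

The WANT data set then holds 50 replicates (variable Replicate) of 4 rows each, and different replicates will generally contain different patients.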

 

 

Reeza
Super User

PS: Look up the "Don't Be Loopy" paper by David Cassell for examples of simulation/bootstrap in SAS.

maryslpa
Fluorite | Level 6

I have already read it, but I didn't find any solution there 😞

Rick_SAS
SAS Super FREQ

There is no response variable in the data set, but I think the OP wants to do the following. Assume that each observation has a response variable, Y.
1. Sample 50 observations at random and assign them to Group=1.

2. Assign the other 50 observations to Group=2.

3. Compute the difference between the means (?? not clear) of Y in the two groups. Save this number.

4. Repeat 1-3 many times.

5. The distribution of the statistics that you accumulate is the null distribution under the hypothesis that there is no difference between the groups.  See where the observed difference lies in the null distribution. If it is near the extremes, then reject the null hypothesis.

 

For a complete explanation and SAS/IML program, see the article "Resampling and permutation tests in SAS." That article also has a link to a DATA step implementation.
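The five steps above can be sketched roughly as follows. This is a minimal illustration and not the program from the linked article: the data set HAVE, the simulated response Y, the seeds, and the use of the OUTALL option to flag both groups in each replicate are all assumptions made for the example.

```sas
/* Hypothetical data: 100 patients, 50 per treatment, with a made-up response Y */
data have;
    length trt $1;
    call streaminit(2024);
    do id = 1 to 100;
        trt = ifc(id <= 50, 'A', 'B');
        y = rand('normal', 0, 1);
        output;
    end;
run;

/* Steps 1, 2 and 4: in each of 1000 replicates, 50 random patients get */
/* Selected=1; OUTALL keeps the other 50 with Selected=0                */
proc surveyselect data=have out=perm method=srs sampsize=50
                  reps=1000 seed=12345 outall noprint;
run;

/* Step 3: mean of Y in each pseudo-group within each replicate */
proc means data=perm noprint nway;
    class replicate selected;
    var y;
    output out=grpmeans mean=mean_y;
run;

proc transpose data=grpmeans out=wide prefix=grp;
    by replicate;
    id selected;
    var mean_y;
run;

/* One difference in means per replicate: the permutation null distribution */
data nulldist;
    set wide;
    diff = grp1 - grp0;
run;

/* Step 5: locate the observed difference in the null distribution */
proc sql noprint;
    select mean(case when trt='A' then y end) - mean(case when trt='B' then y end)
        format=best16. into :obs_diff trimmed
        from have;
    select mean(abs(diff) >= abs(&obs_diff)) into :pval trimmed
        from nulldist;
quit;
%put Two-sided permutation p-value: &pval;
```

A small p-value would indicate that the observed difference lies in the tails of the null distribution.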

maryslpa
Fluorite | Level 6

Thank you for your answer Rick.
It is true that there is no response variable in my example, because in my case I only wanted to permute the treatment.
I have already done that:

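/* Replicate every patient 1000 times, attaching a random sort key to each copy */
data permutations;
    set original_data;
    do permutation=1 to 1000;
        ranorder=ranuni(0);
        output;
    end;
run;

/* Shuffle the patients within each permutation */
proc sort data=permutations;
    by permutation ranorder;
run;

/* The first 50 shuffled patients get 'A', the rest 'B'.          */
/* &out and &groupvar are macro variables assumed to be defined   */
/* elsewhere (for example, as parameters of an enclosing macro).  */
data &out;
    set permutations;
    by permutation;
    if first.permutation then counter=1;
    else counter+1;
    if counter <=50 then &groupvar='A';
    else &groupvar='B';
run;

If I split my &out data set by permutation I will have 1000 data sets, but some of them may be identical (if we keep only id and trt), so this is effectively sampling with replacement. I would like to generate 1000 different data sets, each with a different combination of trt and id.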

Thanks for your help

Reeza
Super User
You have 1000 permutations. Yes, there's a chance that one duplicates another, but it's unlikely, and even so, it's a random distribution. If you want to guarantee that you don't have duplicates, don't do the sort randomly; shuffle it through and you'll get exactly 1000 different combinations.

Again, this is assuming I understand your question, which I don't seem to be doing in your case.
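One way to guarantee distinct relabelings is sketched below under stated assumptions. It is not necessarily what "shuffle it through" refers to. The idea is to draw a few more candidates than needed, reduce each candidate to a 100-character signature, drop repeated signatures, and keep the first 1000 survivors. The data set names (PERMS, SIG, UNIQ, DUPS, FIRST1000, UNIQUE_PERMS), the count of 1010 candidates, and the seed are all hypothetical.

```sas
data original_data;
    length trt $1;
    do id = 1 to 100;
        trt = ifc(id <= 50, 'A', 'B');
        output;
    end;
run;

/* 1010 candidate relabelings; Selected=1 plays the role of treatment A */
proc surveyselect data=original_data out=perms method=srs sampsize=50
                  reps=1010 seed=2718 outall noprint;
run;

proc sort data=perms;
    by replicate id;
run;

/* Collapse each replicate to one string of 0/1 flags ordered by id */
data sig (keep=replicate signature);
    length signature $100;
    retain signature;
    set perms;
    by replicate;
    if first.replicate then signature = ' ';
    signature = cats(signature, selected);
    if last.replicate then output;
run;

/* Identical signatures mean identical assignments; DUPS collects any repeats */
proc sort data=sig out=uniq nodupkey dupout=dups;
    by signature;
run;

/* Keep the first 1000 distinct replicates */
proc sort data=uniq;
    by replicate;
run;

data first1000;
    set uniq (obs=1000);
run;

proc sql;
    create table unique_perms as
    select p.*
    from perms as p inner join first1000 as k
        on p.replicate = k.replicate
    order by p.replicate, p.id;
quit;
```

With about 1e29 possible assignments, DUPS is expected to be empty, so the deduplication step mostly serves as a check.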

maryslpa
Fluorite | Level 6

I think you got it. 🙂
In reality there are 1.0089134e29 combinations, so 1.0089134e29 possible data sets. The idea is to select each data set only once; that's what "without replacement" means.
I totally agree with you that the chance of a data set being drawn twice is very, very small...
Thank you for your help 🙂 
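To put a rough number on "very, very small": if m assignments are drawn independently and uniformly from N equally likely ones, the chance that any two coincide is at most m(m-1)/(2N) by the union bound. The sketch below evaluates this for m=1000 and N=C(100,50). The variable names are made up for the example, and the uniform, independent draws are an assumption.

```sas
data _null_;
    n_sets = comb(100, 50);               /* number of 50/50 assignments  */
    m = 1000;                             /* number of permutations drawn */
    p_dup_bound = m * (m - 1) / 2 / n_sets;
    put n_sets= e12. p_dup_bound= e12.;
run;
```

The bound comes out on the order of 5e-24, so duplicates among 1000 random permutations are practically impossible.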
