Permutation test

maryslpa · Posted 01-28-2016 11:30 AM

Dear Users,

I have to perform a permutation test without replacement. Let's say I have 100 patients, 50 with treatment A and 50 with treatment B. There are too many ways of choosing 50 out of 100 to enumerate them all. Here is the code I wrote to show you the data I have.

data original_data (drop=i j);
    do i=1 to 50;
        id=i;
        trt='1';
        output;
    end;
    do j=1 to 50;
        id=j+50;
        trt='2';
        output;
    end;
run;

What I would like to do is generate 1000 unique samples of the 100 patients with the same treatment distribution; in other words, no sample should be drawn more than once.

Do you have any idea how to do that? I am a bit lost. I could create one data set after another and check whether each new one differs from all the earlier ones, but I think that would be very time-consuming.

Thank you for your help !

Mary

Reeza · Posted 01-28-2016 11:47 AM

@maryslpa wrote:

What I would like to do is generate 1000 unique samples of the 100 patients with the same treatment distribution; in other words, no sample should be drawn more than once.

What do you mean by no sample was drawn more than once?

You're selecting 50 people from the population of 100, without replacement (50 unique people).

You want 1000 samples:

proc surveyselect data=original_data method=srs rep=1000 sampsize=50 out=want;
strata trt/alloc=prop;
run;

maryslpa · Posted 01-28-2016 11:58 AM

I would like to have 1000 different random samples with 100 patients and the same distribution of trt.

Reeza · Posted 01-28-2016 12:01 PM

Ok...but your first part says you have 100 patients. That's the entire data set, so I don't understand that part.

Anyways, use proc surveyselect - please read the documentation on the procedure.

1. method=SRS specifies simple random sampling without replacement.

2. The strata statement stratifies by TRT with proportional allocation, so each sample has the same distribution as the input data set: 50/50.

3. sampsize= specifies the number of units to select in each sample; rep= specifies how many samples to draw.
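
For illustration, here is the same call with each option annotated, a minimal sketch assuming the 100-patient original_data from the first post (rebuilt here so the example is self-contained). The seed= value, the noprint option, and the PROC FREQ check are additions for reproducibility and inspection, not part of the original suggestion. Note that with alloc=prop the total of 50 is split 25/25 between the two treatments, so each replicate is a subsample of the patients rather than a relabeling of all 100.

```sas
/* Same layout as the original post: ids 1-50 get trt '1', ids 51-100 get trt '2' */
data original_data;
    do id=1 to 100;
        if id<=50 then trt='1';
        else trt='2';
        output;
    end;
run;

proc surveyselect data=original_data
        method=srs      /* simple random sampling, without replacement     */
        sampsize=50     /* total units per sample, allocated across strata */
        rep=1000        /* number of replicate samples                     */
        seed=12345      /* assumed seed, added only for reproducibility    */
        out=want noprint;
    strata trt / alloc=prop;   /* proportional allocation keeps the 50/50 split */
run;

/* Inspect the first three replicates: 25 patients per treatment in each */
proc freq data=want;
    where Replicate <= 3;
    tables Replicate*trt / norow nocol nopercent;
run;
```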

maryslpa · Posted 01-28-2016 12:13 PM

Choosing 50 out of 100 gives 100! / (50! 50!) combinations, and this number is very, very big. In the end what I want is 1000 unique samples with 100 patients and the same distribution of treatment. But I am not sure that it is possible with surveyselect 😞

Reeza · Posted 01-28-2016 12:18 PM

You're really confusing me here. Is your total population 100 and your sample size 50? You keep changing this.

I still think my Proc SurveySelect is correct. Look at the output data set and let me know what's wrong with it.

maryslpa · Posted 01-28-2016 12:24 PM

I would just like to permute the treatment for the 100 patients, and I would like to do the permutation 1000 times. That means in the end I will have 1000 different, unique datasets with 100 patients each.

maryslpa · Posted 01-28-2016 12:21 PM

I ran this and I got the same sample 50 times 😞

data original_data (drop=i j);
    do i=1 to 5;
        id=i;
        trt='A';
        output;
    end;
    do j=1 to 5;
        id=j+5;
        trt='B';
        output;
    end;
run;


proc surveyselect data=original_data method=srs rep=50 sampsize=10 out=want;
strata trt/alloc=prop;
run;

Reeza · Posted 01-28-2016 12:30 PM

Well yes, your sample there is only 5 and you're asking for 50, so that won't work.

Do you want to randomize the treatment across patients, or are you pulling a sample from a dataset?
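
In the small example above, each stratum holds 5 patients and sampsize=10 is allocated as 5 and 5, so every replicate selects all 10 patients and the samples are necessarily identical. If the goal is to randomize treatment rather than to draw a subsample, one possible sketch is to drop the strata statement, select half of the patients, and use the OUTALL option so that unselected patients stay in the output. The names perm and new_trt and the seed are illustrative assumptions.

```sas
/* Toy data from the post: 10 patients, 5 per treatment */
data original_data;
    do id=1 to 10;
        if id<=5 then trt='A';
        else trt='B';
        output;
    end;
run;

/* Select 5 of the 10 patients at random, 50 times.
   OUTALL keeps all 10 rows in every replicate and adds
   Selected = 1 (chosen) or 0 (not chosen). */
proc surveyselect data=original_data method=srs sampsize=5 rep=50
        seed=2016 outall out=perm noprint;
run;

/* Relabel: chosen patients get 'A', the others 'B' */
data perm;
    set perm;
    if Selected=1 then new_trt='A';
    else new_trt='B';
run;
```

With only comb(10,5) = 252 possible assignments, repeats among 50 random draws are to be expected in this toy example; the replicates are drawn independently, so nothing here forces them to be distinct.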

Reeza · Posted 01-28-2016 11:53 AM

PS Look up the "Don't Be Loopy" paper by David Cassell for examples of simulation/bootstrap in SAS.

maryslpa · Posted 01-28-2016 12:00 PM

I have already read it, but I didn't find any solution there 😞

Rick_SAS · Posted 01-28-2016 01:09 PM

There is no response variable in the data set, but I think the OP wants to do the following. Assume that each observation has a response variable, Y.

1. Sample 50 observations at random and assign them to Group=1.

2. Assign the other 50 observations to Group=2.

3. Compute the difference between the means of Y in the two groups (the post does not make clear which statistic is wanted). Save this number.

4. Repeat 1-3 many times.

5. The distribution of the statistics that you accumulate is the null distribution under the hypothesis that there is no difference between the groups. See where the observed difference lies in the null distribution. If it is near the extremes, then reject the null hypothesis.

For a complete explanation and SAS/IML program, see the article "Resampling and permutation tests in SAS." That article also has a link to a DATA step implementation.
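
The steps above can also be sketched with PROC SURVEYSELECT and PROC SQL. This is a minimal sketch under assumptions, not the program from the article: the response y is simulated here because the thread has no response variable, and the data set names have, perm, null_dist and obs, the seeds, and the choice of a two-sided p-value are all illustrative.

```sas
/* Hypothetical data: 100 patients, original groups in trt, simulated response y */
data have;
    call streaminit(2016);
    length trt $1;
    do id=1 to 100;
        if id<=50 then trt='1';
        else trt='2';
        y=rand('normal', 0, 1);
        output;
    end;
run;

/* Steps 1-2, repeated 1000 times: Selected=1 is group 1, Selected=0 is group 2 */
proc surveyselect data=have method=srs sampsize=50 rep=1000
        seed=12345 outall out=perm noprint;
run;

proc sql;
    /* Step 3: difference in group means for each permuted replicate */
    create table null_dist as
    select Replicate,
           mean(case when Selected=1 then y else . end)
         - mean(case when Selected=0 then y else . end) as diff
    from perm
    group by Replicate;

    /* Observed difference under the original treatment labels */
    create table obs as
    select mean(case when trt='1' then y else . end)
         - mean(case when trt='2' then y else . end) as obs_diff
    from have;

    /* Step 5: share of permuted differences at least as extreme as the observed one */
    select mean(abs(n.diff) >= abs(o.obs_diff)) as p_value
    from null_dist as n, obs as o;
quit;
```

The last query joins every replicate to the single observed row, so the log will note a Cartesian product; that is intended here.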

maryslpa · Posted 01-28-2016 01:49 PM

Thank you for your answer Rick.
It is true that there is no response variable in my example, because in my case I only wanted to permute the treatment.
I have already done that:

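/* Copy every patient 1000 times, attaching a random sort key to each copy */
data permutations;
    set original_data;
    do permutation=1 to 1000;
        ranorder=ranuni(0);
        output;
    end;
run;

/* Within each permutation, put the patients in random order */
proc sort data=permutations;
    by permutation ranorder;
run;

/* The first 50 patients of each shuffled copy get 'A', the rest 'B' */
data &out;
    set permutations;
    by permutation;
    if first.permutation then counter=1;
    else counter+1;
    if counter <=50 then &groupvar='A';
    else &groupvar='B';
run;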

If I split my &out dataset by permutation I will have 1000 datasets, but some of them might be identical (if we only keep id and trt), so this is sampling with replacement. I would like to generate 1000 different datasets, each with a different combination of trt and id.

Thanks for your help
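
One way to check the concern above without comparing datasets pairwise is to reduce each permutation to a single string and let PROC SORT find repeats. This is a sketch under assumptions: the data set names relabeled, signatures, unique_sigs and repeats, the seed, and the use of rand('uniform') in place of ranuni(0) are illustrative, and the fixed names stand in for the &out and &groupvar macro variables.

```sas
/* 1000 random orderings of 100 patients (same idea as the code above) */
data permutations;
    call streaminit(2016);              /* assumed seed */
    do permutation=1 to 1000;
        do id=1 to 100;
            ranorder=rand('uniform');
            output;
        end;
    end;
run;

proc sort data=permutations;
    by permutation ranorder;
run;

/* Assign 'A' to the first 50 patients of each ordering, 'B' to the rest,
   and record the assignment as a 100-character signature:
   position i of the string holds the treatment given to patient i */
data relabeled(drop=signature)
     signatures(keep=permutation signature);
    length signature $100;
    retain signature;
    set permutations;
    by permutation;
    if first.permutation then do;
        counter=0;
        signature=' ';
    end;
    counter+1;
    if counter<=50 then trt='A';
    else trt='B';
    substr(signature, id, 1)=trt;
    output relabeled;
    if last.permutation then output signatures;
run;

/* Any signature that occurs more than once lands in REPEATS */
proc sort data=signatures nodupkey out=unique_sigs dupout=repeats;
    by signature;
run;
```

If repeats has zero observations, all 1000 assignments are distinct; otherwise the listed permutations could be regenerated and the check rerun.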

Reeza · Posted 01-28-2016 02:56 PM

You have 1000 permutations. Yes, there's a chance that one overlaps with another, but it's unlikely, and even so, it's a random distribution. If you want to guarantee that you don't have duplicates, don't do the sort randomly; step through the assignments in a fixed order instead and you'll get exactly 1000 different combinations.

Again...this is assuming I understand your question, which I don't seem to be doing in your case.

maryslpa · Posted 01-28-2016 03:32 PM

I think you got it. 🙂
In reality there are 1.0089134e29 combinations, so 1.0089134e29 possible datasets. The idea is to select each dataset only once; that is what I mean by without replacement.
I totally agree with you that the chance of a dataset being drawn twice is very, very small...
Thank you for your help 🙂
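
For a rough sense of the numbers in this post, the count of combinations and the chance of any repeat among 1000 independent draws can be computed in a DATA step. This is a sketch that assumes each draw is uniform over all combinations and uses the standard birthday-problem approximation k(k-1)/(2N); the variable names are illustrative.

```sas
data _null_;
    n_comb=comb(100, 50);        /* ways to choose 50 patients out of 100 */
    n_draws=1000;                /* number of random permutations drawn   */
    /* approximate probability that at least two draws coincide */
    p_dup=n_draws*(n_draws-1) / (2*n_comb);
    put n_comb= e14. p_dup= e14.;
run;
```

The first value matches the 1.0089134e29 quoted above, and the approximate probability of a repeat is on the order of 5e-24.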
