Programming the statistical procedures from SAS

Permutation test

Occasional Contributor
Posts: 8

Permutation test

Dear Users,

I have to perform a permutation test without replacement. Let's say I have 100 patients, 50 on treatment A and 50 on treatment B. There are too many ways to choose 50 out of 100 to enumerate them all. Here is the code I wrote to show you the data I have.

data original_data (drop=i j);
    do i=1 to 50;
        id=i;
        trt='1';
        output;
    end;
    do j=1 to 50;
        id=j+50;
        trt='2';
        output;
    end;
run;

 

So what I would like to do is generate 1000 unique samples of the 100 patients with the same distribution of treatment; in other words, no sample is drawn more than once.

Do you have any idea how to do that? I am a bit lost: I could create one data set after another and check whether each new one differs from all the previous ones, but I think that would be very time-consuming.

 

Thank you for your help !

Mary

 

Grand Advisor
Posts: 16,880

Re: Permutation test


maryslpa wrote:

 

So what I would like to do is generate 1000 unique samples of the 100 patients with the same distribution of treatment; in other words, no sample is drawn more than once.

 




What do you mean by no sample was drawn more than once? 

You're selecting 50 people from a population of 100, without replacement (50 unique people).

You want 1000 samples.

 

proc surveyselect data=original_data method=srs rep=1000 sampsize=50 out=want;
strata trt/alloc=prop;
run;
 

 

 

 

Occasional Contributor
Posts: 8

Re: Permutation test

I would like to have 1000 different random samples with 100 patients and the same distribution of trt. 

Grand Advisor
Posts: 16,880

Re: Permutation test

Ok...but your first part says you have 100 patients. That's the entire data set, so I don't understand that part. 

 

Anyways, use proc surveyselect - please read the documentation on the procedure.

 

1. METHOD=SRS specifies simple random sampling without replacement.

2. The STRATA statement stratifies by TRT, and ALLOC=PROP requests proportional allocation, so each sample has the same treatment distribution as the input data set (50/50 in your case).

3. SAMPSIZE= specifies the number of units in each sample; REP= specifies how many samples to draw.
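
As an illustration of how these options interact, here is a minimal sketch of one way PROC SURVEYSELECT could be used to relabel all 100 patients at random rather than to draw a subsample. It is not code from any poster in this thread. It assumes the original_data set from the first post; the output name perm_labels, the variable new_trt, and the seed value are hypothetical. The OUTALL option keeps the unselected units in the output, so each replicate contains all 100 patients with a Selected flag.

```sas
/* Sketch only: relabel patients at random instead of subsampling.      */
/* Assumes WORK.ORIGINAL_DATA from the first post; PERM_LABELS, NEW_TRT */
/* and the seed value are hypothetical names/choices.                   */
proc surveyselect data=original_data method=srs
                  sampsize=50      /* units selected in each replicate */
                  rep=1000         /* number of replicates             */
                  seed=20240101    /* fixed seed for reproducibility   */
                  outall           /* keep unselected units as well    */
                  out=perm_labels;
run;

data perm_labels;
    set perm_labels;
    length new_trt $1;
    /* Selected = 1 for the 50 sampled patients, 0 for the other 50 */
    new_trt = ifc(Selected = 1, 'A', 'B');
run;
```

Each value of the Replicate variable then identifies one random 50/50 assignment of the 100 patients. No STRATA statement is used here, because the aim is to reshuffle the labels, not to preserve each patient's original treatment.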

 

 

Occasional Contributor
Posts: 8

Re: Permutation test

The number of ways to choose 50 out of 100 is 100! / (50! 50!), and this number is very, very big. In the end, what I want is 1000 unique samples with 100 patients and the same distribution of treatment. But I am not sure that this is possible with SURVEYSELECT :(
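
For a sense of scale, the count can be checked with the COMB function. This is only a quick sketch; the variable name n_ways is arbitrary.

```sas
/* Number of ways to choose 50 patients out of 100 */
data _null_;
    n_ways = comb(100, 50);   /* 100! / (50! * 50!) */
    put n_ways= e12.;         /* written to the log in scientific notation */
run;
```

The value is roughly 1.0089e29, which is why enumerating every possible assignment is not practical.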

Grand Advisor
Posts: 16,880

Re: Permutation test

You're really confusing me here. Is your total population 100 and your sample size 50? You keep changing this. 

 

I still think my Proc SurveySelect is correct, look at the dataset, and let me know what's wrong with it.

Occasional Contributor
Posts: 8

Re: Permutation test

I would just like to permute the treatment for the 100 patients, and I would like to do the permutation 1000 times. That means that in the end I will have 1000 different, unique data sets with 100 patients each.

Occasional Contributor
Posts: 8

Re: Permutation test

I ran this and I obtained the same sample 50 times :(

data original_data (drop=i j);
    do i=1 to 5;
        id=i;
        trt='A';
        output;
    end;
    do j=1 to 5;
        id=j+5;
        trt='B';
        output;
    end;
run;


proc surveyselect data=original_data method=srs rep=50 sampsize=10 out=want;
strata trt/alloc=prop;
run;

Grand Advisor
Posts: 16,880

Re: Permutation test

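Well yes, your data set there has only 5 patients per treatment (10 in total) and you're asking for samples of size 10, so every one of the 50 replicates is the whole data set. That won't work.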

 

Do you want to randomize the treatment across patients, or are you pulling a sample from a data set?

 

 

Grand Advisor
Posts: 16,880

Re: Permutation test

P.S. Look up the "Don't Be Loopy" paper by David Cassell for examples of simulation and bootstrapping in SAS.

Occasional Contributor
Posts: 8

Re: Permutation test

I have already read it, but I didn't find a solution there :(

SAS Super FREQ
Posts: 3,306

Re: Permutation test

There is no response variable in the data set, but I think the OP wants to do the following. Assume that each observation has a response variable, Y.
1. Sample 50 observations at random and assign them to Group=1.

2. Assign the other 50 observations to Group=2.

3. Compute the difference between the means of Y in the two groups (it is not clear from the question which statistic is wanted). Save this number.

4. Repeat 1-3 many times.

5. The distribution of the statistics that you accumulate is the null distribution under the hypothesis that there is no difference between the groups.  See where the observed difference lies in the null distribution. If it is near the extremes, then reject the null hypothesis.

 

For a complete explanation and SAS/IML program, see the article "Resampling and permutation tests in SAS." That article also has a link to a DATA step implementation.
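
The five steps above can be sketched with DATA steps and a few procedures. This is a rough illustration only and is not the program from the article mentioned above. It assumes a hypothetical data set have with variables id, trt (values '1' and '2', as in the first post) and a numeric response y; the response variable, the seed, and all other data set and variable names are assumptions.

```sas
%let nperm = 1000;   /* number of random relabelings (assumed) */

/* Steps 1-2: copy every patient once per permutation with a random key */
data perm;
    call streaminit(12345);          /* only the first call takes effect */
    set have;
    do permutation = 1 to &nperm;
        u = rand('uniform');
        output;
    end;
run;

proc sort data=perm;
    by permutation u;
run;

/* The first 50 rows in random order form group 1, the rest group 2 */
data perm;
    set perm;
    by permutation;
    if first.permutation then counter = 0;
    counter + 1;
    group = ifn(counter <= 50, 1, 2);
run;

/* Step 3: mean of y per permutation and group, then the difference */
proc means data=perm noprint nway;
    class permutation group;
    var y;
    output out=means mean=mean_y;
run;

proc transpose data=means out=wide prefix=g;
    by permutation;
    id group;
    var mean_y;
run;

data null_dist;                      /* steps 4-5: the null distribution */
    set wide;
    diff = g1 - g2;
run;

/* Observed difference in the original data */
proc sql noprint;
    select mean(case when trt = '1' then y end)
         - mean(case when trt = '2' then y end) format=best32.
        into :obs_diff trimmed
        from have;
quit;

/* Two-sided p-value: share of permuted differences at least as extreme */
proc sql;
    select mean(abs(diff) >= abs(&obs_diff)) as p_value
        from null_dist;
quit;
```

The difference in means is used here only because it is the statistic suggested in step 3; any other statistic could be computed per permutation in the same way.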

Occasional Contributor
Posts: 8

Re: Permutation test

Thank you for your answer Rick.
In fact, it is true that there is no response variable in my example, because in my case I only wanted to permute the treatment.
I have already done that:

data permutations;
    set original_data;
    /* one copy of each patient per permutation, with a random sort key */
    do permutation=1 to 1000;
        ranorder=ranuni(0);
        output;
    end;
run;

proc sort data=permutations;
    by permutation ranorder;
run;

/* &out and &groupvar are macro variables that must be defined before this step */
data &out;
    set permutations;
    by permutation;
    if first.permutation then counter=1;
    else counter+1;
    /* the first 50 patients in random order get A, the rest get B */
    if counter <=50 then &groupvar='A';
    else &groupvar='B';
run;

If I split my &out data set by permutation, I will have 1000 data sets, but some of them may be identical (if we keep only id and trt), so this is sampling with replacement. I would like to generate 1000 different data sets, each with a different combination of trt and id.

Thanks for your help

Grand Advisor
Posts: 16,880

Re: Permutation test

You have 1000 permutations. Yes, there's a chance that one coincides with another, but it's unlikely, and even so, it's a random distribution. If you want to guarantee that you don't have duplicates, don't sort randomly; shuffle systematically instead and you'll get exactly 1000 different combinations.

Again...this is assuming I understand your question, which I don't seem to in your case.
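
If a check for repeated permutations is wanted anyway, one possible approach is to reduce each permutation to a 100-character signature and look for duplicate signatures. The sketch below assumes a hypothetical data set permuted with one row per patient per permutation and the variables permutation, id, and a one-character new_trt ('A' or 'B'); none of these names come from the code posted in this thread.

```sas
/* Put every permutation in the same patient order */
proc sort data=permuted out=by_id;
    by permutation id;
run;

/* Build one string per permutation, e.g. 'ABBA...' across ids 1 to 100 */
data signatures (keep=permutation signature);
    length signature $100;
    retain signature;
    set by_id;
    by permutation;
    if first.permutation then signature = '';
    signature = cats(signature, new_trt);
    if last.permutation then output;
run;

/* Repeated signatures, if any, are written to DUPES */
proc sort data=signatures out=unique_sigs nodupkey dupout=dupes;
    by signature;
run;
```

If dupes is empty, all permutations are distinct. With about 1.0089e29 equally likely assignments, the chance of any repeat among 1000 independent draws is roughly 1000*999/2 divided by that number, on the order of 5e-24, so in practice the check should never find anything.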

Occasional Contributor
Posts: 8

Re: Permutation test

I think you got it. :)
In reality there are 1.0089134e29 combinations, so 1.0089134e29 possible data sets. The idea is to select each data set only once; that is what I mean by "without replacement".
I totally agree with you that the chance of drawing the same data set twice is very, very small.
Thank you for your help :)
