Permutation test
Posted 01-28-2016 11:30 AM
(3658 views)

Dear Users,

I have to perform a permutation test without replacement. Let's say I have 100 patients, 50 with treatment A and 50 with treatment B. There are far too many combinations of 50 patients out of 100 to enumerate them all. Here is the code I wrote to show you the data I have.

```
data original_data (drop=i j);
do i=1 to 50;
id=i;
trt='1';
output;
end;
do j=1 to 50;
id=j+50;
trt='2';
output;
end;
run;
```

So what I would like to do is generate 1000 unique samples of the 100 patients with the same treatment distribution; in other words, no sample is drawn more than once.

Do you have any idea how to do that? I am a bit lost: I could create one data set after another and check whether each new one differs from all the previous ones, but I think that would be very time-consuming.

Thank you for your help !

Mary


@maryslpa wrote:

So what I would like to do is generate 1000 unique samples of the 100 patients with the same treatment distribution; in other words, no sample is drawn more than once.

What do you mean by no sample was drawn more than once?

You're selecting 50 people from a population of 100 without replacement (50 unique people).

You want 1000 samples:

```
proc surveyselect data=original_data method=srs rep=1000 sampsize=50 out=want;
strata trt/alloc=prop;
run;
```


**100** patients and the same distribution of trt.


Ok...but your first part says you have 100 patients. That's the entire data set, so I don't understand that part.

Anyway, use PROC SURVEYSELECT, and please read the documentation for the procedure.

1. METHOD=SRS specifies simple random sampling without replacement.

2. The STRATA statement stratifies by TRT, and ALLOC=PROP requests proportional allocation, so each sample has the same treatment distribution as the input data set (50/50).

3. SAMPSIZE= specifies the number of units to select in each sample; the REPS= option (rep= in the code above) sets how many samples are drawn.
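If the goal is a permutation test, every patient should stay in each replicate and only the treatment labels should change. A minimal sketch under that assumption: with the OUTALL option, PROC SURVEYSELECT keeps all 100 patients in every replicate and sets the variable Selected to 1 for the 50 patients that were drawn, so Selected can serve as the permuted treatment. The output name relabeled, the variable perm_trt, and the seed are hypothetical, and the sketch assumes your SAS release accepts REPS= together with OUTALL (check the PROC SURVEYSELECT documentation).

```
/* Sketch: 1000 random relabelings of all 100 patients.            */
/* OUTALL keeps every patient in each replicate; SURVEYSELECT sets */
/* Selected=1 for the 50 drawn patients and Selected=0 otherwise.  */
proc surveyselect data=original_data out=relabeled
                  method=srs sampsize=50 reps=1000 seed=12345 outall;
run;

/* Turn the selection flag into a permuted treatment label */
data relabeled;
   set relabeled;
   length perm_trt $1;              /* hypothetical variable name */
   if Selected=1 then perm_trt='1';
   else perm_trt='2';
run;
```

Each replicate is then one random split of the 100 patients into two groups of 50, which is what the permutation test needs.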


You're really confusing me here. Is your total population 100 and your sample size 50? You keep changing this.

I still think my PROC SURVEYSELECT code is correct. Look at the output data set and let me know what's wrong with it.


I ran this and got the same sample 50 times 😞

```
data original_data (drop=i j);
do i=1 to 5;
id=i;
trt='A';
output;
end;
do j=1 to 5;
id=j+5;
trt='B';
output;
end;
run;
proc surveyselect data=original_data method=srs rep=50 sampsize=10 out=want;
strata trt/alloc=prop;
run;
```


Well yes, your sample there is only 5 and you're asking for 50, so that won't work.

Do you want to randomize the treatment across patients, or are you pulling a sample from a data set?


PS: Look up the "Don't Be Loopy" paper by David Cassell for examples of simulation and bootstrapping in SAS.


I have already read it, but I didn't find any solution there 😞


There is no response variable in the data set, but I think the OP wants to do the following. Assume that each observation has a response variable, Y.

1. Sample 50 observations at random and assign them to Group=1.

2. Assign the other 50 observations to Group=2.

3. Compute the difference between the means (?? not clear) of Y in the two groups. Save this number.

4. Repeat 1-3 many times.

5. The distribution of the statistics that you accumulate is the null distribution under the hypothesis that there is no difference between the groups. See where the observed difference lies in the null distribution. If it is near the extremes, then reject the null hypothesis.

For a complete explanation and SAS/IML program, see the article "Resampling and permutation tests in SAS." That article also has a link to a DATA step implementation.
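Steps 1 to 5 can be sketched with DATA steps and PROC MEANS instead of SAS/IML. This is a rough sketch, not the program from the article: the response y is hypothetical and simulated only so the code runs, the statistic is assumed to be the difference in means, and the data set names, seeds, and two-sided p-value rule are illustrative choices.

```
/* Hypothetical response y, simulated only so the sketch runs */
data have;
   set original_data;
   if _n_=1 then call streaminit(2016);
   y=rand('normal', 10, 2);
run;

/* Steps 1, 2 and 4: 1000 random splits into two groups of 50 */
data perms;
   set have;
   if _n_=1 then call streaminit(123);
   do perm=1 to 1000;
      u=rand('uniform');
      output;
   end;
run;

proc sort data=perms;
   by perm u;
run;

data perms;
   set perms;
   by perm;
   if first.perm then k=0;
   k+1;
   group=1+(k>50);            /* first 50 -> Group 1, the rest -> Group 2 */
run;

/* Step 3: difference in the means of y for every permutation */
proc means data=perms noprint nway;
   class perm group;
   var y;
   output out=grpmeans mean=mean_y;
run;

data nulldist(keep=perm diff);
   set grpmeans;              /* sorted by perm, then group */
   by perm;
   retain m1;
   if group=1 then m1=mean_y;
   else do;
      diff=m1-mean_y;
      output;
   end;
run;

/* Step 5: observed difference (trt '1' minus trt '2') */
proc means data=have noprint nway;
   class trt;
   var y;
   output out=obsmeans mean=mean_y;
run;

data _null_;
   set obsmeans end=last;
   retain m1;
   if trt='1' then m1=mean_y;
   if last then call symputx('obsdiff', m1-mean_y);
run;

/* Two-sided p-value: share of permuted |diff| at least as large */
data _null_;
   set nulldist end=last;
   n_extreme+(abs(diff)>=abs(&obsdiff));
   if last then do;
      p_value=n_extreme/_n_;
      put p_value=;
   end;
run;
```

The data set nulldist holds the null distribution from step 5; a histogram of diff with the observed value marked shows where the observed difference falls.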


Thank you for your answer, Rick.

It is true that my example has no response variable, because in my case I only wanted to permute the treatment.

I have already done this:

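```
%let out=perm_out;      /* output data set name (value assumed) */
%let groupvar=trt;      /* variable that gets the permuted label (assumed) */

/* Give every patient 1000 random sort keys, one per permutation */
data permutations;
   set original_data;
   if _n_=1 then call streaminit(12345);   /* fixed seed, reproducible */
   do permutation=1 to 1000;
      ranorder=rand('uniform');
      output;
   end;
run;

/* Within each permutation, put the patients in random order */
proc sort data=permutations;
   by permutation ranorder;
run;

/* First 50 patients of each permutation get 'A', the other 50 'B' */
data &out(drop=counter ranorder);
   set permutations;
   by permutation;
   if first.permutation then counter=1;
   else counter+1;
   if counter<=50 then &groupvar='A';
   else &groupvar='B';
run;
```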

If I split my &out data set by permutation, I will have 1000 data sets, but some of them may be identical (if we keep only id and trt), which means the sampling is with replacement. I would like to generate 1000 different data sets, each with a different combination of trt and id.

Thanks for your help
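One way to guarantee 1000 distinct splits, sketched under assumptions: draw a few more random splits than needed, encode each split as a 100-character signature (character i is the label of the patient with id=i, so it assumes ids run from 1 to 100), drop repeated signatures with PROC SORT NODUPKEY, and keep the first 1000 that remain. The names perms, signatures, distinct, want, newtrt, and the 100 spare draws are hypothetical. Comparing short signatures is much cheaper than comparing whole data sets one by one.

```
/* Draw 1100 random splits: 100 spares in case any repeat */
data perms;
   set original_data;                 /* assumes id runs 1 to 100 */
   if _n_=1 then call streaminit(4321);
   do perm=1 to 1100;
      u=rand('uniform');
      output;
   end;
run;

proc sort data=perms;
   by perm u;
run;

/* First 50 patients of each split get 'A', the other 50 'B' */
data perms;
   set perms;
   by perm;
   length newtrt $1;
   if first.perm then k=0;
   k+1;
   newtrt=ifc(k<=50, 'A', 'B');
run;

/* One signature per split: position id holds that patient's label */
data signatures(keep=perm sig);
   set perms;
   by perm;
   length sig $100;
   retain sig;
   if first.perm then sig=' ';
   substr(sig, id, 1)=newtrt;
   if last.perm then output;
run;

/* Drop repeated signatures; with the default EQUALS option the
   lowest-numbered copy of each split is the one kept */
proc sort data=signatures nodupkey out=distinct;
   by sig;
run;

proc sort data=distinct;
   by perm;
run;

/* Keep the first 1000 distinct splits and attach their patients */
data want;
   merge perms distinct(in=chosen obs=1000 keep=perm);
   by perm;
   if chosen;
run;
```

If fewer than 1000 distinct signatures survived, want would hold fewer splits, so increasing the number of spare draws is the safe direction.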


You have 1000 permutations. Yes, there's a chance that one duplicates another, but it's unlikely, and even so, it's a random distribution. If you want to guarantee that you don't have duplicates, don't sort randomly; shuffle it through and you'll get exactly 1000 different combinations.

Again... this is assuming I understand your question, which I don't seem to in your case.


In reality there are C(100, 50) ≈ 1.0089134e29 combinations, so 1.0089134e29 possible data sets. The idea is to select each data set at most once; that's what is called sampling without replacement.

I totally agree with you that the chance of drawing the same data set twice is very, very small...

Thank you for your help 🙂
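The size of that number, and the resulting chance of a repeat, can be checked directly. A small sketch using the COMB function: with N = C(100, 50) equally likely splits and 1000 independent random draws, the union bound over all pairs of draws gives P(some repeat) ≤ C(1000, 2)/N. The variable names are illustrative.

```
/* Number of possible splits, and an upper bound on the chance that
   1000 independent random splits contain at least one repeat */
data _null_;
   n_splits=comb(100, 50);          /* about 1.0089E29 possible data sets */
   pairs=comb(1000, 2);             /* 499,500 pairs among 1000 draws */
   p_repeat_max=pairs/n_splits;     /* union bound, about 4.95E-24 */
   put n_splits= e12. pairs= comma9. p_repeat_max= e12.;
run;
```

A bound of roughly 5E-24 is why drawing random splits independently is, in practice, indistinguishable from drawing them without replacement.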
