AlexeyS
Pyrite | Level 9

Hello, I would like to ask you about sample selection.

I want to select all distinct samples from my data, without replacement.

I tried PROC SURVEYSELECT with METHOD=SRS, but I saw that some samples were the same as previous ones, and my aim is to always select different samples.

For example, my data is: 1, 2, 3, 4, 5

I want to choose all distinct samples of size 4. The total number of distinct samples is 5!/(4!*(5-4)!) = 5.

The samples will be: 1,2,3,4 ; 1,2,3,5 ; 1,2,4,5 ; 1,3,4,5 ; 2,3,4,5

Thank you
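
As a quick check of that count, here is a minimal sketch (assuming only a working SAS session) that compares the built-in COMB function with the factorial formula from the example. The variable names are illustrative only.

data _null_;
  n = 5;                                        /* population size from the example */
  k = 4;                                        /* sample size from the example */
  ncomb = comb(n, k);                           /* built-in n choose k */
  by_hand = fact(n) / (fact(k) * fact(n - k));  /* n! / (k! * (n-k)!) */
  put ncomb= by_hand=;                          /* both values are 5 */
run;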

 

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hi @AlexeyS,

@AlexeyS wrote:

Right, but the problem is that the ALLCOMB function is not well suited to my data.

The good news is: This "problem" can be solved, as shown below.

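/* Test data: one ID per observation (a "column vector") */
data have;
do id=1 to 6;
  output;
end;
run;

/* Turn the column into a single row x1-x6 */
proc transpose data=have out=trans(drop=_:) prefix=x;
run;

%let k=4; /* sample size */

data want;
set trans;
array x x:;
ncomb=comb(dim(x), &k); /* number of distinct samples */
do sample=1 to ncomb;
  rc=allcomb(sample, &k, of x[*]); /* x[1]-x[&k] now hold the next combination */
  do i=1 to &k;
    id=x[i];
    output; /* one observation per sample member */
  end;
end;
keep sample id;
run;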


12 REPLIES
Reeza
Super User

You're not drawing a sample here; you're generating all possible combinations.

Take a look at the ALLCOMB function and the CALL ALLCOMB routine.

AlexeyS
Pyrite | Level 9

Right, but the problem is that the ALLCOMB function is not well suited to my data.

My data looks like this: my variable is a column vector, not a row. From this column vector I want to create all distinct samples, one below the other.

My data:

id
1
2
3
4
5

 

AlexeyS
Pyrite | Level 9

Thank you for your answers.

But now I have another problem: sometimes I have more than 33 variables, and the ALLCOMB function cannot work.

As I understand it, the solution is the CALL ALLCOMB routine, but how can I use it?

My code with the ALLCOMB function:


data have;
do id=1 to 6;
  output;
end;
run;

proc transpose data=have out=trans(drop=_:) prefix=x;
run;

%let k=4; /* sample size */

data want;
set trans;
array x x:;
ncomb=comb(dim(x), &k);
do sample=1 to ncomb;
  rc=allcomb(sample, &k, of x[*]);
  do i=1 to &k;
    id=x[i];
    output;
  end;
end;
keep sample id;
run;

 

FreelanceReinh
Jade | Level 19

Hi @AlexeyS,

 

You don't need CALL ALLCOMB, but CALL ALLCOMBI.

 

Example:

data have;
do id=1 to 34;
  output;
end;
run;

proc transpose data=have out=trans(drop=_:) prefix=x;
run;

%let k=4; /* sample size */

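data want;
set trans;
array x x:;
array i[&k];   /* indices of the items in the current combination */
i[1]=0;        /* first index set to 0 so that the first call initializes the indices */
n=dim(x);
ncomb=comb(n, &k);
do sample=1 to ncomb;
  call allcombi(n, &k, of i[*]);   /* generates indices only, so x itself is left unchanged */
  do j=1 to &k;
    id=x[i[j]];   /* look up the item selected by the j-th index */
    output;
  end;
end;
keep sample id;
run;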
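
One way to sanity-check the result, sketched here on the assumption that the data set WANT from the CALL ALLCOMBI example above exists with variables SAMPLE and ID, is to count samples and rows. With 34 items and k=4, the number of samples should equal COMB(34, 4) = 46376, with 4 rows per sample. The column aliases are illustrative only.

proc sql;
  select count(distinct sample) as n_samples,   /* expected: comb(34, 4) = 46376 */
         count(*)               as n_rows       /* expected: 4 * 46376 = 185504 */
  from want;
quit;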
Astounding
PROC Star

I assume that you have already exhausted the possibilities of PROC SURVEYSELECT, and it won't do what you need.  In that case, here's an approach that produces one large data set with all the samples in it.  There is a variable SAMPLE that distinguishes the contents of each sample.

 

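data want;
do sample=1 to _nobs_;   /* sample = number of the record that is left out */
   do recno=1 to _nobs_;
      if recno ne sample then do;
         set have point=recno nobs=_nobs_;   /* read every record except the one left out */
         output;
      end;
   end;
end;
stop;   /* needed with POINT=, otherwise the DATA step never ends */
run;

Of course, the problem becomes more difficult if you are looking for samples of size 3 instead of samples of size "all but one".  For the "all but two" case, you would have to add one more loop and check that the record number differs from both left-out record numbers.  Note that the record is read with point=recno rather than point=sample: a POINT= variable is not written to the output data set, and SAMPLE has to be kept.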
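
The "all but two" variation can be sketched as follows. This is only an illustration under assumptions: the input data set is called HAVE as elsewhere in the thread, and the names WANT2, EXCL1 and EXCL2 are hypothetical. The exclusion test is written as two NE comparisons, and a sum statement numbers the samples.

data want2;
do excl1=1 to _nobs_-1;                 /* first record left out */
   do excl2=excl1+1 to _nobs_;          /* second record left out */
      sample+1;                         /* running sample number, starts at 0 */
      do recno=1 to _nobs_;
         if recno ne excl1 and recno ne excl2 then do;
            set have point=recno nobs=_nobs_;
            output;
         end;
      end;
   end;
end;
stop;                                   /* needed with POINT= so the step ends */
drop excl1 excl2;
run;

With 6 records in HAVE this would give COMB(6, 2) = 15 samples of 4 records each. The number of nested loops grows with the number of left-out records, which is why the ALLCOMB-based approaches in this thread scale better.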

data_null__
Jade | Level 19

 

%let n=5;
%let k=4;
%let ncomb=%sysfunc(comb(&n,&k));
proc plan ordered;
   factors sample=&ncomb id=&k of &n comb;
   output out=C&k.of&n;
   run;
   quit;

[Attached screenshot: Capture.PNG]

Ksharp
Super User
Why not use ALLCOMB()?

data _null_;
array x[5] (1 2 3 4 5);
n=dim(x);
k=4;
ncomb=comb(n,k);
do j=1 to ncomb;
rc=allcomb(j, k, of x[*]);
put j 5. +3 x1-x4 +3 rc=;
end;
run;


data_null__
Jade | Level 19

@Ksharp wrote:
Why not use ALLCOMB()?

I reckon you didn't read the post from @FreelanceReinh

Ksharp
Super User
Oh. John King, that would be easy using a macro variable or an array to hold the data.

data have;
do id=1 to 6;
  output;
end;
run;
proc sql;
select count(*) into : n from have;
select id into : list separated by ' ' from have;
quit;
data _null_;
array x[&n] (&list);
n=dim(x);
k=4;
ncomb=comb(n,k);
do j=1 to ncomb;
rc=allcomb(j, k, of x[*]);
put j 5. +3 x1-x4 +3 rc=;
end;
run;
data_null__
Jade | Level 19

 

My point is you are just repeating what was already shown earlier in the thread.

@Ksharp wrote:
Oh. John King, that would be easy using a macro variable or an array to hold the data.
 
Ksharp
Super User
John King, never mind. I'm just leaving one more option for the OP to choose from.
