Hello, I would like to ask about sample selection.
I want to select all distinct samples from my data, without replacement.
I tried PROC SURVEYSELECT with METHOD=SRS, but I saw that some samples are the same as previous ones, and my aim is to always get different samples.
For example, my data is: 1, 2, 3, 4, 5
I want to choose all distinct samples of size 4. The total number of distinct samples is 5!/(4!*(5-4)!) = 5.
The samples will be: 1,2,3,4 ; 1,2,3,5 ; 1,2,4,5 ; 1,3,4,5 ; 2,3,4,5
Thank you.
You're not drawing a sample here; you're generating all possible combinations.
Take a look at the ALLCOMB function and the CALL ALLCOMB routine.
Right, but the problem is that the ALLCOMB function is not well suited to my data.
My data looks like this: my variable is a column vector, not a row. From this column vector I want to create all distinct samples, one below the other.
My data:
id
1
2
3
4
5
6
Hi @AlexeyS,
@AlexeyS wrote:
right, but the problem with allcomb function that is not so suitable for my data.
The good news is: This "problem" can be solved, as shown below.
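/* Example data: one ID per observation (a column vector) */
data have;
  do id=1 to 6;
    output;
  end;
run;

/* Turn the column into a single row of variables x1-x6 */
proc transpose data=have out=trans(drop=_:) prefix=x;
run;

%let k=4; /* sample size */

data want;
  set trans;
  array x x:;
  ncomb=comb(dim(x), &k); /* number of distinct samples */
  do sample=1 to ncomb;
    rc=allcomb(sample, &k, of x[*]); /* puts the next combination into x[1]-x[&k] */
    do i=1 to &k;
      id=x[i];
      output; /* one observation per sampled ID */
    end;
  end;
  keep sample id;
run;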
Thank you for your answers.
But now I have another problem: sometimes I have more than 33 variables, and the ALLCOMB function cannot handle that.
As I understand it, the solution is the CALL ALLCOMB routine, but how can I use it?
My code with the ALLCOMB function:
data have;
  do id=1 to 6;
    output;
  end;
run;

proc transpose data=have out=trans(drop=_:) prefix=x;
run;

%let k=4; /* sample size */

data want;
  set trans;
  array x x:;
  ncomb=comb(dim(x), &k);
  do sample=1 to ncomb;
    rc=allcomb(sample, &k, of x[*]);
    do i=1 to &k;
      id=x[i];
      output;
    end;
  end;
  keep sample id;
run;
Hi @AlexeyS,
What you need is not CALL ALLCOMB, but CALL ALLCOMBI, which returns the indices of the selected items rather than the items themselves.
Example:
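data have;
  do id=1 to 34;
    output;
  end;
run;

proc transpose data=have out=trans(drop=_:) prefix=x;
run;

%let k=4; /* sample size */

data want;
  set trans;
  array x x:;
  array i[&k]; /* indices of the items in the current combination */
  i[1]=0; /* a zero (or missing) first index makes ALLCOMBI start from the first combination */
  n=dim(x);
  ncomb=comb(n, &k);
  do sample=1 to ncomb;
    call allcombi(n, &k, of i[*]);
    do j=1 to &k;
      id=x[i[j]]; /* look up the value that belongs to the index */
      output;
    end;
  end;
  keep sample id;
run;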
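As a quick sanity check on the result (only a sketch, assuming the WANT data set created above; the column aliases are made up for this example), you could count the samples and the rows per sample. With 34 IDs and k=4 there should be COMB(34,4)=46376 samples of 4 rows each:

```sas
/* Sketch of a sanity check on WANT (aliases n_samples and rows_per_sample are hypothetical) */
proc sql;
  select count(distinct sample) as n_samples,                      /* expected: 46376 */
         count(*) / count(distinct sample) as rows_per_sample      /* expected: 4 */
  from want;
quit;
```

This only confirms the overall counts; it does not by itself prove that no two samples contain the same IDs.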
I assume that you have already exhausted the possibilities of PROC SURVEYSELECT and that it won't do what you need. In that case, here's an approach that produces one large data set with all the samples in it. There is a variable SAMPLE that distinguishes the contents of each sample.
data want;
  do sample=1 to _nobs_; /* SAMPLE is also the number of the record left out */
    do recno=1 to _nobs_;
      if sample ne recno then do;
        set have point=recno nobs=_nobs_; /* read every record except number SAMPLE */
        output;
      end;
    end;
  end;
  stop; /* required with POINT=, otherwise the step never ends */
run;
Of course, the problem becomes more difficult if you are looking for samples of size 3 instead of samples of size "all but one". For the "all but two" case, you would have to add one more loop over a second left-out record and read a record only if it is neither of the two left-out records.
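For illustration, the "all but two" case could be sketched roughly as follows. This assumes the same HAVE data set; SKIP1, SKIP2 and WANT2 are names made up for this example, and the sample number is kept in a separate counter because two loops now select the left-out records:

```sas
/* Sketch: every sample leaves out two records (SKIP1 < SKIP2) */
data want2;
  do skip1=1 to _nobs_-1;
    do skip2=skip1+1 to _nobs_;
      sample+1; /* sum statement: running sample number, starts at 0 */
      do recno=1 to _nobs_;
        if recno ne skip1 and recno ne skip2 then do;
          set have point=recno nobs=_nobs_;
          output;
        end;
      end;
    end;
  end;
  stop; /* needed because POINT= never reaches end of file */
  drop skip1 skip2;
run;
```

Each further left-out record needs another nested loop, so for general sample sizes the ALLCOMB or ALLCOMBI approaches shown in this thread are probably easier to maintain.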
%let n=5;
%let k=4;
%let ncomb=%sysfunc(comb(&n,&k));

proc plan ordered;
  factors sample=&ncomb id=&k of &n comb;
  output out=C&k.of&n;
run;
quit;
data _null_;
  array x[5] (1 2 3 4 5);
  n=dim(x);
  k=4;
  ncomb=comb(n,k);
  do j=1 to ncomb;
    rc=allcomb(j, k, of x[*]);
    put j 5. +3 x1-x4 +3 rc=;
  end;
run;
@Ksharp wrote:
Why not use ALLCOMB()?
I reckon you didn't read the post from @FreelanceReinh.
data have;
  do id=1 to 6;
    output;
  end;
run;

proc sql;
  select count(*) into : n from have;
  select id into : list separated by ' ' from have;
quit;

data _null_;
  array x[&n] (&list);
  n=dim(x);
  k=4;
  ncomb=comb(n,k);
  do j=1 to ncomb;
    rc=allcomb(j, k, of x[*]);
    put j 5. +3 x1-x4 +3 rc=;
  end;
run;
My point is you are just repeating what was already shown earlier in the thread.
@Ksharp wrote:
Oh, John King, that would be easy using a macro variable or an array to hold the data.