Re: RANDOM SELECTION

deleted_user · Posted 04-17-2009 03:37 AM

Hi,

I am randomly picking samples (a) using PROC SURVEYSELECT, and now I want to pick a 10% control group from those samples. The control group samples (b) must not be mixed with the final data. Do I need to pick (b) with PROC SURVEYSELECT as well, merge (b) with (a) and take them out of the final data, or is there a way to do this in one or two simple steps and avoid heavy programming?
If anyone can, please help me.
Thank you in advance.
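The two-step approach described in the question can be sketched without any merge, assuming SAS/STAT's PROC SURVEYSELECT is available: its OUTALL option keeps every record of the first sample and adds a Selected flag (1 = drawn, 0 = not drawn), so one DATA step can separate the control group from the final data. The dataset names and seeds below are hypothetical.

```sas
/* Step 1: draw the main sample (a) of 200 records; names and seeds are hypothetical */
proc surveyselect data=sashelp.cars out=sample_a method=srs sampsize=200 seed=12345;
run;

/* Step 2: draw a 10% control group (b) from (a).                 */
/* OUTALL keeps all 200 records and adds the flag Selected (1/0). */
proc surveyselect data=sample_a out=sample_flagged method=srs samprate=0.1 seed=54321 outall;
run;

/* Step 3: one pass splits (a) into the control group and the final data */
data control_b final_data;
  set sample_flagged;
  if selected = 1 then output control_b;
  else output final_data;
run;
```

With 200 records and a 10% rate, control_b should hold 20 records and final_data the other 180.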

deleted_user · Posted 04-17-2009 01:46 PM

Hello Maithili,

Could this be a solution to your problem?

proc surveyselect sampsize=200 data=sashelp.cars out=T01_sample;
run;

* CG=control group and RS=rest of sample;
data T02_sample;
set T01_sample;
if ranuni(0) le 0.1
then group="CG";
else group="RS";
run;

* To check that you have approximately 10% of your sample in the control group;
proc freq data=T02_sample;
tables group;
run;
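Because ranuni(0) seeds itself from the system clock, each run of the code above flags a different control group. If the split has to be reproducible, one option is to fix the seeds. This is a sketch assuming SAS 9.1 or later (for CALL STREAMINIT and the RAND function); the dataset names T01_sample_seeded and T02_sample_seeded and the seed values are hypothetical.

```sas
/* Draw the 200-record sample with a fixed seed so it can be reproduced */
proc surveyselect data=sashelp.cars out=T01_sample_seeded method=srs sampsize=200 seed=2009;
run;

/* Flag roughly 10% as control group (CG), the rest as RS */
data T02_sample_seeded;
  if _N_ = 1 then call streaminit(2009);  /* seed the RAND stream once */
  set T01_sample_seeded;
  length group $2;
  if rand('uniform') lt 0.1 then group = 'CG';
  else group = 'RS';
run;

proc freq data=T02_sample_seeded;
  tables group;
run;
```

Each record still has a 10% chance of landing in CG, so the group size is approximate; the fixed seeds only make the same split come back on every run.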

Best regards,

Yoba

deleted_user · Posted 04-17-2009 10:23 PM

Thanks a lot, Yoba. I'll try this.

deleted_user · Posted 04-18-2009 01:18 AM

Hello Maithili,

There is a more "elegant" way to handle this kind of problem, I think. The trick is to "unsort" the original dataset, that is, to put its records in random order. Of course, I wouldn't recommend doing that if your original dataset is very large, because a sort is involved. What is nice is that you simply add a flag to your original population indicating where each record goes.

Here is an example. You'll have exactly 100 observations in the control group, 400 in the rest of the sample, and the remaining records won't be selected:

data T01_population;
do i=1 to 10000;
output;
end;
run;

proc sql;
create table T02_population_unsorted as
select *
from T01_population
order by ranuni(0);
quit;

data T03_groups;
set T02_population_unsorted;
select;
when(_N_ le 100) group='CG';
when(_N_ le 500) group='RS';
otherwise group='--';
end;
run;
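The same random reordering can also be done outside PROC SQL by attaching a random key to each record and running PROC SORT on it, which makes the sorting step explicit. A sketch, assuming SAS 9.1 or later for RAND and a fixed seed for reproducibility; the names rand_key, T02b_keyed, T02b_population_unsorted and T03b_groups are hypothetical.

```sas
/* Population of 10,000 records, as in the example above */
data T01_population;
  do i = 1 to 10000;
    output;
  end;
run;

/* Attach a uniform random key to every record */
data T02b_keyed;
  if _N_ = 1 then call streaminit(2009);  /* hypothetical seed */
  set T01_population;
  rand_key = rand('uniform');
run;

/* Sorting on the random key puts the records in random order */
proc sort data=T02b_keyed out=T02b_population_unsorted(drop=rand_key);
  by rand_key;
run;

/* First 100 records go to CG, the next 400 to RS, the other 9,500 are not selected */
data T03b_groups;
  set T02b_population_unsorted;
  length group $2;
  if _N_ le 100 then group = 'CG';
  else if _N_ le 500 then group = 'RS';
  else group = '--';
run;
```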

Another possibility: use a probability to decide where each record goes. This is faster because the dataset doesn't need to be sorted, but the group sizes are only approximate rather than exact.

data T03_groups;
set T01_population;
* Draw one random number per record so both WHEN conditions test the same value;
u=ranuni(0);
select;
when(u le 0.1) group='CG';
when(u le 0.2) group='RS';
otherwise group='--';
end;
drop u;
run;
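If you want exact group sizes without sorting, a common alternative is sequential (k/n) selection: each record joins a group with probability equal to the number of slots still open in that group divided by the number of records not yet processed. A sketch, assuming the same 100/400 sizes as the sorted example and SAS 9.1 or later for RAND; the variable names need_cg, need_rs and remaining, the dataset T04_groups and the seed are hypothetical.

```sas
/* Population of 10,000 records, as in the example above */
data T01_population;
  do i = 1 to 10000;
    output;
  end;
run;

/* One pass, no sort: each record joins a group with probability    */
/* (slots still open in that group) / (records not yet processed). */
data T04_groups;
  if _N_ = 1 then call streaminit(2009);  /* hypothetical seed */
  retain need_cg 100 need_rs 400;         /* open slots per group */
  set T01_population nobs=n_total;
  length group $2;
  remaining = n_total - _N_ + 1;          /* includes the current record */
  u = rand('uniform');
  if u lt need_cg / remaining then do;
    group = 'CG';
    need_cg = need_cg - 1;
  end;
  else if u lt (need_cg + need_rs) / remaining then do;
    group = 'RS';
    need_rs = need_rs - 1;
  end;
  else group = '--';
  drop need_cg need_rs remaining u;
run;
```

When the open slots equal the records left, the probability reaches 1, so every slot is filled by the last record and the counts come out at exactly 100 and 400.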

There are many other possibilities, for example if the population is very, very large or if you want to extend this to stratified sampling. Let me know if you need further help.
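For the stratified case, PROC SURVEYSELECT accepts a STRATA statement, and the input must be sorted by the stratification variables. A sketch, assuming sashelp.cars stratified by Origin with a 10% control group flagged within each stratum; the names cars_by_origin and cars_flagged and the seed are hypothetical.

```sas
/* STRATA requires the input to be sorted by the stratification variable */
proc sort data=sashelp.cars out=cars_by_origin;
  by origin;
run;

/* Draw 10% within each Origin; OUTALL keeps every record and adds Selected (1/0) */
proc surveyselect data=cars_by_origin out=cars_flagged method=srs samprate=0.1 seed=2009 outall;
  strata origin;
run;

/* Control group counts per stratum */
proc freq data=cars_flagged;
  tables origin*selected / nocol nopercent;
run;
```

Records with Selected=1 form the control group and the rest stay in the final data, as in the unstratified case.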

Best regards,

Yoba

deleted_user · Posted 04-20-2009 12:41 AM

Yoba,

Thank you very much. This helped me a lot.

Regards,
Maithili
