## How to work on GA if i have a population where the optimal sample size is to be calculated?

How do I use IML to run a GA that finds the optimal sample size from a population? Can anyone propose an example other than the four examples available in the SAS user's guide?


## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

In many situations, you obtain an optimal sample size by choosing a sequence of sample sizes and solving the problem for each size:

    do size = 45 to 90 by 5;
       /* solve problem with sample size = size */
       /* evaluate some "goodness statistic" */
    end;

You then choose the sample size that optimizes the criterion of interest. If this is not the case for your application, then I think more information is needed, such as example code and the criterion that you are optimizing.
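A minimal sketch of this loop-and-compare approach, written in Python rather than SAS so that it is self-contained. The `criterion` function and its minimum near 72 are purely hypothetical stand-ins for whatever goodness statistic your application computes.

```python
def criterion(size):
    """Hypothetical goodness statistic (smaller is better); replace with your own."""
    return (size - 72) ** 2

# Same grid as `do size = 45 to 90 by 5`
sizes = range(45, 95, 5)

# Evaluate the criterion once per candidate size
scores = {size: criterion(size) for size in sizes}

# Keep the size that optimizes (here: minimizes) the criterion
best_size = min(scores, key=scores.get)
print(best_size, scores[best_size])
```

If the criterion is to be maximized instead, swap `min` for `max`.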

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

| s.no | popul | samp | c | error | samprate |
|------|-------|------|---|----------|----------|
| ... | ... | ... | ... | ... | ... |
| 1118 | 500 | 468 | 4 | -0.94999 | 0.936 |
| 1119 | 500 | 469 | 4 | -0.94999 | 0.938 |
| 1120 | 500 | 470 | 4 | -0.95000 | 0.940 |
| 1121 | 500 | 471 | 4 | -0.95000 | 0.942 |
| 1122 | 500 | 472 | 4 | -0.95000 | 0.944 |
| 1123 | 500 | 473 | 4 | -0.95000 | 0.946 |
| 1124 | 500 | 474 | 4 | -0.95000 | 0.948 |
| 1125 | 500 | 475 | 4 | -0.95000 | 0.950 |
| 1126 | 500 | 476 | 4 | -0.95000 | 0.952 |
| 1127 | 500 | 477 | 4 | -0.95000 | 0.954 |
| ... | ... | ... | ... | ... | ... |

This is part of my data, obtained through a formula. I need to find the optimal sample size for the corresponding population using a genetic algorithm, with error as the fitness value (the least error is best). I need at least a similar GA program in IML to work from, one that does all the crossovers, mutations, etc. Thank you.

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

So what have you come up with so far? How have you formulated the problem? What is the population? You need to post the code for the objective function, tell us parameters for crossover and mutation, and so forth.

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

The system would automatically select chromosomes from the example data above. A chromosome here is the combination of popul, samp, and c.

For example, 500, 471, 4 can be a chromosome converted into binary by concatenating the binary forms of the three values (111110100, 111010111, 100): 111110100111010111100

500, 472, 4 can be encoded the same way: 111110100111011000100

How do I select suitable observations and put them into evolution, given that 'error' is the fitness value (the least error is wanted)?

Hope you get the explanation. Thank you.
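To make the encoding above concrete, here is a small Python sketch (not IML) that packs the three values into one bit string and unpacks them again. The field widths of 9, 9, and 3 bits are an assumption chosen to fit the values in this thread; fixed widths are what make the string decodable after crossover or mutation.

```python
# Hypothetical fixed field widths (in bits) for each gene
FIELDS = (("popul", 9), ("samp", 9), ("c", 3))

def encode(popul, samp, c):
    """Concatenate the fixed-width binary form of each value."""
    values = {"popul": popul, "samp": samp, "c": c}
    return "".join(format(values[name], f"0{width}b") for name, width in FIELDS)

def decode(bits):
    """Split the bit string back into its integer fields."""
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = int(bits[pos:pos + width], 2)
        pos += width
    return out

chromosome = encode(500, 471, 4)
print(chromosome)
print(decode(chromosome))
```

Note that every bit in the string, including the bits for popul, is exposed to crossover and mutation under this scheme.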

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

I don't see how this makes sense. If you encode the population in the high bits of the binary state vector, then the GA will alter the population size as part of its optimization. It seems like the population should be fixed, by definition.

I don't think I can help you based on what you've described. You need to provide the sample data, not just the size of the sample. You also need to provide a fitness function that takes the sample and computes the error. Maybe someone else can offer additional suggestions.

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

I'm sorry, yes, you are right; I forgot that the population size is fixed. The error was computed using the hypergeometric distribution. Could you try running the program below?

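    /* candidate sample sizes for each population */
    data gsas;
       do Population=50, 100, 200, 300, 500;
          do ss=1 to 1500; output; end;
       end;
    run;

    proc sort data=gsas;
       by Population;
    run;

    data hyper;
       set gsas;
       by Population;
       retain c;
       /* c = 0, 1, 2, 3, 4 for Population = 50, 100, 200, 300, 500 */
       if Population=50 then c=0;
       if first.Population then c+1;
       if _N_=1 then c=0;
       /* drop impossible rows before calling CDF, which rejects ss > population */
       if ss gt population then delete;
       /* squared distance of the acceptance probability from 0.95 */
       error=(cdf('HYPER',c,population,population*0.02,ss)-0.95)**2;
       samplerate=ss/population;
    run;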

I need the optimal sample size for each fixed population, that is, the sample size with the least error.
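With one free parameter and at most 500 candidates per population, an exhaustive search is probably enough here. The Python sketch below (not SAS) recomputes the same squared error with a hand-written hypergeometric CDF and keeps the minimizing sample size for each population. The helper names are hypothetical, and the argument order of `hyper_cdf` is assumed to mirror `cdf('HYPER', x, N, R, n)`.

```python
from math import comb

def hyper_cdf(x, N, R, n):
    """P(X <= x): n draws without replacement from N items, R of them defective."""
    hits = sum(comb(R, k) * comb(N - R, n - k) for k in range(min(x, R, n) + 1))
    return hits / comb(N, n)

# Acceptance number c for each population size, as set in the DATA step
ACCEPTANCE = {50: 0, 100: 1, 200: 2, 300: 3, 500: 4}

def best_sample_size(N, c, target=0.95, defect_rate=0.02):
    """Sample size in 1..N whose acceptance probability is closest to target."""
    R = round(defect_rate * N)  # number of defectives in the population

    def error(n):
        return (hyper_cdf(c, N, R, n) - target) ** 2

    return min(range(1, N + 1), key=error)

best = {N: best_sample_size(N, c) for N, c in ACCEPTANCE.items()}
for N, n in best.items():
    print(N, n, n / N)
```

In SAS, the equivalent would be to sort `hyper` by population and error and keep the first row of each group.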

Thank you for your time.

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

I'm sorry again. Using the GA for the above was a harebrained idea; please ignore the above program.

Instead, I have a double sampling plan in which, for example, I need two sample sizes n1 and n2 with acceptance numbers c1 and c2. The fitness value is the ASN [= n1 + (1 - P1) n2], computed for different combinations of c1 and c2. n1 and n2 can take any value.

I just need a 'basic idea' of how a GA is used in IML to initialize, select, and apply crossover, so that I can work on my concept, as the problems in the SAS support documentation are of a different kind. Thank you once again.
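As a language-neutral illustration of the initialize, select, crossover, and mutate cycle, here is a minimal GA sketch in Python. It is not the SAS/IML GA interface. The objective, population size, rates, and tournament selection are all hypothetical choices; the objective would be replaced by your own error or ASN.

```python
import random

N_BITS = 9           # chromosome length: encodes integers 0..511
POP_SIZE = 30
N_GENERATIONS = 40
P_CROSSOVER = 0.8
P_MUTATION = 0.02    # per-bit flip probability

def decode(bits):
    return int("".join(map(str, bits)), 2)

def fitness(bits):
    """Hypothetical objective to minimize; swap in your own error or ASN."""
    return (decode(bits) - 300) ** 2

def tournament(pop, rng, k=3):
    """Selection: the best of k randomly drawn members."""
    return min(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover, applied with probability P_CROSSOVER."""
    if rng.random() < P_CROSSOVER:
        cut = rng.randint(1, N_BITS - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(bits, rng):
    """Flip each bit independently with probability P_MUTATION."""
    return [1 - g if rng.random() < P_MUTATION else g for g in bits]

def run_ga(seed=1):
    rng = random.Random(seed)
    # Initialization: random bit strings
    pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
    history = []
    for _ in range(N_GENERATIONS):
        elite = min(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite[:]]  # elitism: carry the best member over unchanged
        while len(children) < POP_SIZE:
            c1, c2 = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = children[:POP_SIZE]
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return decode(best), history

best_x, history = run_ga()
print(best_x, history[-1])
```

Because the best member is copied into each new generation, the best fitness never gets worse from one generation to the next. For a plan with several parameters (n1, n2, c1, c2), the chromosome would hold one fixed-width field per parameter.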

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

If anyone wants to visualize the data, do this:

    proc sgplot data=hyper;
       series x=samplerate y=error / group=population;
    run;

I don't suppose that there is a "textbook problem" that is similar to yours and for which you already know the answer? If so, it might be worthwhile to program the GA for that problem to learn about the GA, then modify it to solve the problem that you are actually interested in. I'll step aside and let others offer suggestions. Good luck.

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

Thank you. I'll get back to you with another problem soon.

## Re: How to work on GA if i have a population where the optimal sample size is to be calculated?

Hello Mr. Rick,

Suppose I have the double sampling plan (the program below), with the parameters n1, n2, c1, c2, r making up the chromosome, the selection probability of the strings determined by the least error in the error variable, and one-point crossover used to minimize the error further. Will it work for different values of p?

Please let me know whether I'm clear, as I have come across similar work that was done in a C program.

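    data vivian.doubs;
       do c1=0,1,2;
          r=c1+2;
          do c2=1 to 6;
             do n1=1 to 200;
                do n2=1 to 200;
                   do p=0.01, 0.02, 0.03, 0.04, 0.05;
                      PA=probacc2(c1,r,c2,n1,n2,p);
                      output;   /* write the row after PA is computed */
                   end;
                end;
             end;
          end;
       end;
    run;

    data vivian.doubs1;
       set vivian.doubs;
       if PA>0.99;                /* keep plans with acceptance probability above 0.99 */
       if PA>1.00 then delete;    /* guard against invalid probabilities */
       error=PA-0.95;
    run;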

Thank you.
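For anyone who wants to check the quantities involved, here is a Python sketch (not SAS) of the acceptance probability and ASN of a double sampling plan. It assumes binomial defect counts and the usual rule: accept if the first-sample count d1 is at most c1, reject if d1 is at least r, otherwise draw the second sample and accept if d1 + d2 is at most c2. Whether this matches `probacc2` exactly is an assumption, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); zero when k is negative."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))

def double_plan(c1, r, c2, n1, n2, p):
    """Return (acceptance probability, average sample number)."""
    accept_first = binom_cdf(c1, n1, p)   # accepted on the first sample
    p_second = 0.0                        # probability a second sample is drawn
    accept_second = 0.0                   # accepted on the combined samples
    for d1 in range(c1 + 1, min(r - 1, n1) + 1):
        p_d1 = binom_pmf(d1, n1, p)
        p_second += p_d1
        accept_second += p_d1 * binom_cdf(c2 - d1, n2, p)
    pa = accept_first + accept_second
    asn = n1 + n2 * p_second              # ASN = n1 + (1 - P1) * n2
    return pa, asn

pa, asn = double_plan(c1=1, r=3, c2=4, n1=50, n2=100, p=0.02)
print(pa, asn)
```

A GA fitness function for this problem could combine `asn` with a penalty for plans whose `pa` misses the target at the chosen values of p.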
