How do I use IML to run a GA (genetic algorithm) that finds the optimal sample size from a population? Can anyone propose an example other than the four available in the SAS user's guide?
In many situations, you obtain an optimal sample size by choosing a sequence of sample sizes and solving the problem for each size:
do size = 45 to 90 by 5;
/* solve problem with sample size = size */
/* evaluate some "goodness statistic" */
end;
You then choose the sample size that optimizes the criterion of interest. If this is not the case for your application, then I think more information is needed, such as example code and the criterion that you are optimizing.
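As a concrete sketch of that loop, assume (purely for illustration) that the "goodness statistic" is the squared distance between a hypergeometric acceptance probability and a 0.95 target. The population size, number of defectives, acceptance number, and grid of sizes below are all made-up values, not taken from your problem:

```sas
proc iml;
popN   = 500;    /* population size (assumed) */
nDef   = 10;     /* defective items in the population (assumed: 2%) */
c      = 4;      /* acceptance number (assumed) */
target = 0.95;   /* desired acceptance probability (assumed) */

size = T(do(50, 450, 5));        /* candidate sample sizes */
err  = j(nrow(size), 1, .);      /* criterion value for each candidate */
do i = 1 to nrow(size);
   /* P(at most c defectives in a sample of size[i]) */
   pAccept = cdf("Hyper", c, popN, nDef, size[i]);
   err[i]  = (pAccept - target)##2;
end;

best = err[>:<];                 /* index of the smallest criterion value */
print (size[best])[label="BestSize"] (err[best])[label="Error"];
quit;
```

If the search space is this small, an exhaustive loop like this finds the optimum directly, and a GA is only worth the effort when the number of combinations becomes too large to enumerate.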
s.no  popul  samp  c  error     samprate
...
1118  500    468   4  -0.94999  0.936
1119  500    469   4  -0.94999  0.938
1120  500    470   4  -0.95000  0.940
1121  500    471   4  -0.95000  0.942
1122  500    472   4  -0.95000  0.944
1123  500    473   4  -0.95000  0.946
1124  500    474   4  -0.95000  0.948
1125  500    475   4  -0.95000  0.950
1126  500    476   4  -0.95000  0.952
1127  500    477   4  -0.95000  0.954
...
This is part of my data, obtained through a formula. I need to find the optimal sample size for the corresponding population, with the fitness value being error (I need the least error), using a genetic algorithm. I need at least a similar GA program in IML to work from (one that does all the crossovers, mutations, etc.). Thank you.
So what have you come up with so far? How have you formulated the problem? What is the population? You need to post the code for the objective function, tell us parameters for crossover and mutation, and so forth.
The system would automatically select chromosomes from the example population above. A chromosome here is the combination of pop, samp, and c.
For example, 500, 471, 4 can be a chromosome converted into binary, i.e. 5004714 - - - 111110100111010111100
500, 472, 4 can be encoded the same way - - - 111110100111011000
How do I select the suitable observations and put them into evolution, given that 'error' is the fitness value (least error needed)?
I hope you get the explanation. Thank you.
I don't see how this makes sense. If you encode the population in the high bits of the binary state vector, then the GA will alter the population size as part of its optimization. Seems like the population should be fixed, by definition.
I don't think I can help you based on what you've described. You need to provide the sample data, not just the size of the sample. You also need to provide a fitness function that takes the sample and computes the error. Maybe someone else can offer additional suggestions.
I'm sorry, yes, you are right. I forgot that the population size is fixed. The error was computed using the hypergeometric distribution. Could you take a look at the program below?
data gsas;
  do Population = 50, 100, 200, 300, 500;
    do ss = 1 to 1500;
      output;
    end;
  end;
run;

proc sort data=gsas;
  by Population;
run;

data hyper;
  set gsas;
  by Population;
  /* c is 0 for the first population and increases by 1 for each new one */
  if _N_ = 1 then c = 0;
  else if first.Population then c + 1;
  /* a sample cannot exceed the population, so drop those rows before calling CDF */
  if ss gt Population then delete;
  error = (cdf('HYPER', c, Population, Population*0.02, ss) - 0.95)**2;
  samplerate = ss / Population;
run;
I need the optimal sample size for each fixed population, meaning the sample size with the least error.
Thank you for your time
I'm sorry again, using the GA for the above concept was a harebrained idea. Please ignore the above program.
Instead, I have a double sampling plan where, for example, I need two sample sizes n1 and n2 with acceptance numbers c1 and c2, and the fitness value is computed by ASN [= n1 + (1-P1)n2] for different combinations of c1 and c2. n1 and n2 can be any value.
I just need a 'basic idea' of how a GA is used in IML to initialize, select, and apply crossover, so that I can work on my concept, since the problems on the SAS support site are built on a different concept. Thank you once again.
If anyone wants to visualize the data, do this:
proc sgplot data=hyper;
series x=samplerate y=error / group=population;
run;
I don't suppose that there is a "textbook problem" that is similar to your and for which you already know the answer? If so, it might be worthwhile to program the GA for that problem to learn about the GA, then modify it to solve the problem that you are actually interested in. I'll step aside and let others offer suggestions. Good luck.
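For a small practice problem of that kind, here is a minimal sketch of the usual SAS/IML GA call sequence (setup, objective, operators, initialize, regenerate, read the best member). Everything specific in it is an assumption for illustration: the fixed population of 500 with 10 defectives, the two-gene chromosome (sample size, acceptance number), the bounds, and all probabilities. The numeric operator codes are written from memory, so check them against the SAS/IML genetic algorithm documentation before relying on them.

```sas
proc iml;
popN = 500;    /* fixed population size (assumed) */
nDef = 10;     /* defectives in the population (assumed) */

/* P(accept) = P(at most c defectives in a sample of size ss) */
start AcceptProb(ss, c) global(popN, nDef);
   lo = max(0, nDef + ss - popN);   /* fewest defectives a sample can hold */
   hi = min(nDef, ss);              /* most defectives a sample can hold   */
   if c < lo  then return( 0 );
   if c >= hi then return( 1 );
   return( cdf("Hyper", c, popN, nDef, ss) );
finish;

/* Objective: the GA passes one chromosome x = {ss c} as a row vector */
start ObjFun(x);
   return( (AcceptProb(x[1], x[2]) - 0.95)##2 );
finish;

id = gasetup(2, 2, 12345);       /* integer fixed-length encoding, 2 genes, seed */
call gasetobj(id, 0, "ObjFun");  /* 0 = minimize a user-written module */
call gasetcro(id, 0.9, 1);       /* crossover probability 0.9, simple (one-point) */
call gasetmut(id, 0.2, 1);       /* mutation probability 0.2, uniform */
call gasetsel(id, 2, 1, 0.95);   /* keep 2 elite members, dual tournament */

bounds = {1   0,                 /* row 1: lower bounds for ss and c */
          500 10};               /* row 2: upper bounds for ss and c */
call gainit(id, 50, bounds);     /* random initial population of 50 chromosomes */

do gen = 1 to 30;
   call garegen(id);             /* selection, crossover, mutation, evaluation */
end;

call gagetmem(best, bestVal, id, 1);   /* best member and its objective value */
print best[colname={"ss" "c"}] bestVal;
call gaend(id);
quit;
```

Because this toy search space has only 500 x 11 combinations, you can verify the GA's answer with an exhaustive loop, which is the point of starting from a problem with a known solution. For a different problem, the objective module and the bounds are the main parts that would change.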
Thank you, I'll get back to you with another problem soon.
Hello Mr. Rick,
Suppose I have the double sampling plan (the program below) with the parameters n1, n2, c1, c2, r making up the chromosome, the selection probability of the strings determined by the least error from the error variable, and one-point crossover used to minimize the error further. Is it going to work for different values of p?
Please let me know whether I'm being clear, as I have come across similar work that was done in a C program.
data vivian.doubs;
  do c1 = 0, 1, 2;
    r = c1 + 2;
    do c2 = 1 to 6;
      do n1 = 1 to 200;
        do n2 = 1 to 200;
          do p = 0.01, 0.02, 0.03, 0.04, 0.05;
            PA = probacc2(c1, r, c2, n1, n2, p);
            output;  /* write the row after PA is computed */
          end;
        end;
      end;
    end;
  end;
run;

data vivian.doubs1;
  set vivian.doubs;
  if PA > 0.99;
  if PA > 1.00 then delete;
  error = PA - 0.95;
run;
Thank you.