Calcite | Level 5

How do I use IML to apply a genetic algorithm (GA) to find the optimal sample size from a population? Can anyone propose an example other than the four examples available in the SAS user's guide?


In many situations, you obtain an optimal sample size by choosing a sequence of sample sizes and solving the problem for each size:

do size = 45 to 90 by 5;
   /* solve problem with sample size = size */
   /* evaluate some "goodness statistic" */
end;


You then choose the sample size that optimizes the criterion of interest. If this is not the case for your application, then I think more information is needed, such as example code and the criterion that you are optimizing.
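As a minimal sketch of that loop in SAS/IML, the example below assumes a hypothetical criterion that is not from this thread: for a single sampling plan with acceptance number c, it picks the sample size whose binomial probability of acceptance at fraction defective p is closest to a target. The values of p, c, and target are illustrative only.

```sas
proc iml;
/* Assumed, illustrative settings (not from the original post) */
p      = 0.02;    /* fraction defective */
c      = 1;       /* acceptance number */
target = 0.95;    /* desired probability of acceptance */

sizes = do(45, 90, 5);                /* candidate sample sizes */
crit  = j(1, ncol(sizes), .);         /* "goodness statistic" for each size */
do i = 1 to ncol(sizes);
   Pa = cdf("Binomial", c, p, sizes[i]);   /* P(defectives <= c) */
   crit[i] = abs(Pa - target);
end;

best = sizes[ crit[>:<] ];            /* size with the smallest criterion */
print best;
quit;
```

Any other criterion can be substituted inside the loop; only the line that fills crit[i] changes.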

Calcite | Level 5

 Obs    popul    samp    c      error     samprate
1118      500     468    4    -0.94999      0.936
1119      500     469    4    -0.94999      0.938
1120      500     470    4    -0.95000      0.940
1121      500     471    4    -0.95000      0.942
1122      500     472    4    -0.95000      0.944
1123      500     473    4    -0.95000      0.946
1124      500     474    4    -0.95000      0.948
1125      500     475    4    -0.95000      0.950
1126      500     476    4    -0.95000      0.952
1127      500     477    4    -0.95000      0.954



This is part of my data, obtained through a formula. I need to find the optimal sample size for the corresponding population using a genetic algorithm, with error as the fitness value (I need the least error). I need at least a similar GA program in IML to work from (one that does all the crossovers, mutations, etc.). Thank you.


So what have you come up with so far? How have you formulated the problem? What is the population? You need to post the code for the objective function, tell us parameters for crossover and mutation, and so forth.

Calcite | Level 5

The system automatically selects chromosomes from the example data above. Here a chromosome is the combination of pop, samp, and c.

For example, 500, 471, 4 (5004714) can be a chromosome. Converting each value to binary and concatenating gives 111110100111010111100.

Likewise, 500, 472, 4 gives 111110100111011000100.

How do I select the suitable observations and put them into evolution, given that 'error' is the fitness value (the least error is needed)?

I hope the explanation is clear. Thank you.


I don't see how this makes sense. If you encode the population in the high bits of the binary state vector, then the GA will alter the population size as part of its optimization. It seems like the population should be fixed, by definition.

I don't think I can help you based on what you've described. You need to provide the sample data, not just the size of the sample. You also need to provide a fitness function that takes the sample and computes the error. Maybe someone else can offer additional suggestions.

Calcite | Level 5

I'm sorry, yes, you are right. I forgot that the population size is fixed. The error was computed using the hypergeometric distribution. Could you take a look at the program below?

data gsas;
do Population=50, 100, 200, 300, 500;
   do ss=1 to 1500;
      output;
   end;
end;
run;

proc sort data=gsas;
by Population;
run;

data hyper;
set gsas;
by Population;
if Population=50 then c=0;
if first.Population then c+1;
if _N_=1 then c=0;
retain c;

if ss gt population then delete;
run;


I need an optimal sample size for each fixed population, namely the sample size that gives the least error.

Thank you for your time.

Calcite | Level 5

I'm sorry again. Using the GA for the concept above was a harebrained idea, so please ignore the program above.

Instead, I have a double sampling plan. For example, I need two sample sizes n1 and n2 with acceptance numbers c1 and c2, and the fitness value is computed as ASN = n1 + (1 - P1)n2 for different combinations of c1 and c2. n1 and n2 can take any value.

I just need a basic idea of how the GA is used in IML to initialize, select, and apply crossover, so that I can work on my concept. The problems in the SAS support documentation are based on different concepts. Thank you once again.
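For reference, the ASN expression above can be written as a small SAS/IML module. This is only a sketch under stated assumptions that are not confirmed in the thread: defect counts follow a binomial distribution, and P1 is the probability that a decision is reached on the first sample (accept if d1 <= c1, reject if d1 > c2). The module name DoubleASN and the example arguments are hypothetical.

```sas
proc iml;
/* ASN of a double sampling plan, ASN = n1 + (1 - P1)*n2.
   Assumptions (not from the original post): binomial model, and
   P1 = P(d1 <= c1) + P(d1 > c2), with c1 < c2 <= n1. */
start DoubleASN(n1, n2, c1, c2, p);
   pAccept1 = cdf("Binomial", c1, p, n1);        /* accept on first sample */
   pReject1 = 1 - cdf("Binomial", c2, p, n1);    /* reject on first sample */
   P1 = pAccept1 + pReject1;
   return( n1 + (1 - P1) * n2 );
finish;

/* Illustrative call with made-up plan parameters */
a = DoubleASN(50, 100, 1, 3, 0.02);
print a;
quit;
```

A module of this shape is what a GA objective function would evaluate for each candidate plan, usually together with constraints on the probability of acceptance.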


If anyone wants to visualize the data, do this:

proc sgplot data=hyper;
   series x=samplerate y=error / group=population;
run;


I don't suppose that there is a "textbook problem" that is similar to yours and for which you already know the answer? If so, it might be worthwhile to program the GA for that problem to learn about the GA, then modify it to solve the problem that you are actually interested in. I'll step aside and let others offer suggestions. Good luck.
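A minimal sketch of that approach, using the SAS/IML GA routines on a toy problem with a known answer, might look like the following. The objective toyObj (minimum at x = (3, -1)), the bounds, the population size, and the operator probabilities are all assumptions chosen for illustration, not tuned recommendations.

```sas
proc iml;
/* Toy objective with a known minimum at x = (3, -1) (illustrative only) */
start toyObj(x);
   return( (x[1] - 3)##2 + (x[2] + 1)##2 );
finish;

id = gasetup(1, 2, 12345);        /* 1 = fixed-length real encoding, 2 genes, seed */
call gasetobj(id, 0, "toyObj");   /* 0 = minimize a user-defined module */
call gasetcro(id, 0.8, 1);        /* crossover probability 0.8, simple (one-point) */
call gasetmut(id, 0.2, 1);        /* mutation probability 0.2, uniform within bounds */
call gasetsel(id, 2, 0, 3);       /* keep 2 elite members, tournaments of size 3 */

bounds = {-10 -10,
           10  10};               /* row 1 = lower bounds, row 2 = upper bounds */
call gainit(id, 50, bounds);      /* random initial population of 50 members */

do gen = 1 to 100;
   call garegen(id);              /* selection, crossover, mutation: one generation */
end;

call gagetmem(best, fit, id, 1);  /* best member and its objective value */
print best fit;
call gaend(id);                   /* release GA resources */
quit;
```

The same call sequence (setup, objective, operators, initialize, regenerate, retrieve) should carry over to a sampling-plan problem by switching to the integer encoding in GASETUP and replacing toyObj with the real fitness module.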

Calcite | Level 5

Thank you. I'll get back to you with another problem soon.

Calcite | Level 5

Hello Mr. Rick,

Suppose I take the double sampling plan (the program below) and treat the parameters n1, n2, c1, c2, and r as the chromosome. The selection probability of each string is determined from the error variable (least error is best), and one-point crossover is used to minimize the error further. Will this work for different values of p?

Please let me know whether I'm being clear. I have come across similar work that was done in a C program.

data vivian.doubs;
do c1=0,1,2;
   do c2=1 to 6;
      do n1=1 to 200;
         do n2=1 to 200;
            do p=0.01, 0.02, 0.03, 0.04, 0.05;
               output;
            end;
         end;
      end;
   end;
end;
run;

data vivian.doubs1;
set vivian.doubs;
if PA>0.99;
if PA>1.00 then delete;
run;



Thank you.


