Posted 11-13-2013 01:48 AM
(1292 views)

How do I use IML to run a genetic algorithm (GA) that finds the optimal sample size from a population? Can anyone propose an example other than the four examples available in the SAS user's guide?

10 REPLIES

In many situations, you obtain an optimal sample size by choosing a sequence of sample sizes and solving the problem for each size:

do size = 45 to 90 by 5;
   /* solve problem with sample size = size */
   /* evaluate some "goodness statistic" */
end;

You then choose the sample size that optimizes the criterion of interest. If this is not the case for your application, then I think more information is needed, such as example code and the criterion that you are optimizing.
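
The loop-and-compare idea above can be sketched outside SAS as well. The following is a minimal Python sketch, not SAS/IML code; `toy_criterion` is a hypothetical stand-in for whatever "goodness statistic" your application computes, and smaller values are assumed to be better.

```python
def best_sample_size(sizes, criterion):
    """Return (size, score) for the candidate size with the smallest criterion value."""
    # Pair each score with its size; min() compares scores first,
    # and ties go to the smaller size.
    score, size = min((criterion(n), n) for n in sizes)
    return size, score


def toy_criterion(n):
    """Hypothetical criterion for illustration only: squared distance from 72."""
    return (n - 72) ** 2


# Same candidate sizes as the DO loop: 45, 50, ..., 90.
sizes = range(45, 91, 5)
print(best_sample_size(sizes, toy_criterion))
```

Replacing `toy_criterion` with the real statistic is the only change needed; the search itself stays the same.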

s.no  popul  samp  c  error     samprate
...
1118  500    468   4  -0.94999  0.936
1119  500    469   4  -0.94999  0.938
1120  500    470   4  -0.95000  0.940
1121  500    471   4  -0.95000  0.942
1122  500    472   4  -0.95000  0.944
1123  500    473   4  -0.95000  0.946
1124  500    474   4  -0.95000  0.948
1125  500    475   4  -0.95000  0.950
1126  500    476   4  -0.95000  0.952
1127  500    477   4  -0.95000  0.954
...

This is part of my data, obtained from a formula. For each population I need to find the optimal sample size with a genetic algorithm, using error as the fitness value (the least error is wanted). I need at least a similar GA program in IML to work from, one that does all the crossovers, mutations, and so on. Thank you.

The system would select chromosomes automatically from the example population above. Here a chromosome is the combination of pop, samp, and c.

For example, 500, 471, 4 can be a chromosome converted to binary field by field (9 bits for pop, 9 bits for samp, 3 bits for c): 111110100 111010111 100, that is, 111110100111010111100.

Likewise, 500, 472, 4 becomes 111110100111011000100.

How do I select suitable observations and put them into evolution, given that 'error' is the fitness value (the least error is wanted)?

I hope the explanation is clear. Thank you.
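
The field-by-field binary encoding described in this post can be written down explicitly. This is a minimal Python sketch rather than SAS/IML; the field widths in `FIELD_BITS` (9, 9, and 3 bits) are an assumption inferred from the 500, 471, 4 example, and the function names are made up for illustration.

```python
# Assumed bit widths for (population, sample size, c); 9 bits hold values up to 511.
FIELD_BITS = (9, 9, 3)


def encode(pop, samp, c):
    """Join the fixed-width binary form of each field into one bit string."""
    values = (pop, samp, c)
    for value, bits in zip(values, FIELD_BITS):
        if not 0 <= value < 2 ** bits:
            raise ValueError(f"{value} does not fit in {bits} bits")
    return "".join(format(v, f"0{b}b") for v, b in zip(values, FIELD_BITS))


def decode(chromosome):
    """Split a bit string back into (population, sample size, c)."""
    fields = []
    start = 0
    for bits in FIELD_BITS:
        fields.append(int(chromosome[start:start + bits], 2))
        start += bits
    return tuple(fields)


print(encode(500, 471, 4))
print(decode(encode(500, 472, 4)))
```

Fixed widths matter here: without zero padding, a decoded string could not be split back into its three fields unambiguously.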

I don't see how this makes sense. If you encode the population in the high bits of the binary state vector, then the GA will alter the population size as part of its optimization. It seems like the population should be fixed, by definition.

I don't think I can help you based on what you've described. You need to provide the sample data, not just the size of the sample. You also need to provide a fitness function that takes the sample and computes the error. Maybe someone else can offer additional suggestions.

I'm sorry, yes, you are right: I forgot that the population size is fixed. The error was computed from the hypergeometric distribution. Could you take a look at the program below?

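/* Candidate sample sizes ss for each fixed population size */
data gsas;
   do Population = 50, 100, 200, 300, 500;
      do ss = 1 to 1500;
         output;
      end;
   end;
run;

proc sort data=gsas;
   by Population;
run;

data hyper;
   set gsas;
   by Population;
   retain c;
   /* c is 0 for Population=50 and increases by 1 for each larger population */
   if Population = 50 then c = 0;
   if first.Population then c + 1;
   if _N_ = 1 then c = 0;
   /* Drop rows with ss > population before calling CDF: */
   /* a sample cannot be larger than the population      */
   if ss gt population then delete;
   error = (cdf('HYPER', c, population, population*0.02, ss) - 0.95)**2;
   samplerate = ss / population;
run;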

For each fixed population, I need the optimal sample size, that is, the sample size with the least error.

Thank you for your time
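
Because each population has at most 500 candidate sample sizes, the minimum of this error can be found by checking every candidate, with no GA required. The following is a minimal Python sketch of that exhaustive search, not a translation verified against SAS output. It assumes the same setup as the program above: 2% of the population is defective, the target probability is 0.95, and c runs 0, 1, 2, 3, 4 across the five populations. The function names are made up for illustration.

```python
from math import comb


def hyper_cdf(x, N, R, n):
    """P(X <= x) when n items are drawn without replacement from N items, R of them defective."""
    lo = max(0, R + n - N)   # fewest defectives a sample of size n can contain
    hi = min(R, n, x)        # cap at x and at the most defectives possible
    hits = sum(comb(R, k) * comb(N - R, n - k) for k in range(lo, hi + 1))
    return hits / comb(N, n)


def best_sample_size(N, c, defect_rate=0.02, target=0.95):
    """Return (n, error) minimizing (P(X <= c) - target)**2 over n = 1..N."""
    R = round(N * defect_rate)
    errors = {n: (hyper_cdf(c, N, R, n) - target) ** 2 for n in range(1, N + 1)}
    best_n = min(errors, key=errors.get)
    return best_n, errors[best_n]


# c = 0, 1, 2, 3, 4 for the five populations, as the DATA step appears to assign it.
for c, N in enumerate([50, 100, 200, 300, 500]):
    print(N, c, best_sample_size(N, c))
```

For Population=50 the sample sizes 2 and 3 are mathematically tied (probabilities 0.96 and 0.94), so which one is reported depends on floating-point rounding.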

I'm sorry again. Using the GA for the above was a harebrained idea, so please ignore the program above.

Instead, I have a double sampling plan. For example, I need the two sample sizes n1 and n2 with acceptance numbers c1 and c2, where the fitness value is ASN = n1 + (1 - P1) n2, computed for different combinations of c1 and c2. n1 and n2 can take any value.

I just need a 'basic idea' of how a GA is used in IML to initialize, select, and cross over, so that I can work on my own problem, because the examples in the SAS support documentation address different problems. Thank you once again.
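
As a language-neutral illustration of the initialize, select, crossover, and mutate cycle asked about here, the following is a minimal sketch in plain Python. It does not use or imitate the SAS/IML GA routines. Everything in it is an assumption made for illustration: the integer encoding, tournament selection, one-point crossover, uniform mutation, keeping the single best member each generation, and the `toy_fitness` function, which is hypothetical and is not the ASN formula.

```python
import random


def run_ga(fitness, bounds, pop_size=40, generations=60,
           crossover_prob=0.9, mutation_prob=0.1, seed=1):
    """Minimize fitness over integer vectors; bounds holds one (low, high) pair per gene."""
    rng = random.Random(seed)

    def random_member():
        return [rng.randint(lo, hi) for lo, hi in bounds]

    def tournament(pop):
        # Selection: pick two members at random and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    pop = [random_member() for _ in range(pop_size)]  # initialization
    history = []                                      # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]                        # elitism: carry the best forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = p1[:]
            # One-point crossover: head of p1 joined to tail of p2.
            if len(bounds) > 1 and rng.random() < crossover_prob:
                cut = rng.randint(1, len(bounds) - 1)
                child = p1[:cut] + p2[cut:]
            # Uniform mutation: redraw a gene within its bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_prob:
                    child[i] = rng.randint(lo, hi)
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, fitness(best), history


def toy_fitness(member):
    """Hypothetical objective for illustration only; replace with the real one."""
    n1, n2 = member
    return (n1 - 30) ** 2 + (n2 - 60) ** 2


best, fit, history = run_ga(toy_fitness, [(1, 200), (1, 200)])
print(best, fit)
```

Because the best member is always copied into the next generation, the recorded best fitness never gets worse. To adapt the sketch, put the plan parameters in the chromosome and have the fitness function return the quantity to minimize.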

If anyone wants to visualize the data, do this:

proc sgplot data=hyper;
   series x=samplerate y=error / group=population;
run;

I don't suppose that there is a "textbook problem" that is similar to yours and for which you already know the answer? If so, it might be worthwhile to program the GA for that problem to learn about the GA, then modify it to solve the problem that you are actually interested in. I'll step aside and let others offer suggestions. Good luck.

Thank you. I'll get back to you with another problem soon.

Hello Mr. Rick,

Suppose I have the double sampling plan in the program below, with the parameters n1, n2, c1, c2, and r making up the chromosome (so they determine its length). The selection probability of the strings is determined by the least error, taken from the error variable, and I use one-point crossover to minimize the error further. Will this work for different values of p?

Please let me know whether I'm being clear. I have come across similar work that was done in a C program.

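data vivian.doubs;
   do c1 = 0, 1, 2;
      r = c1 + 2;
      do c2 = 1 to 6;
         do n1 = 1 to 200;
            do n2 = 1 to 200;
               do p = 0.01, 0.02, 0.03, 0.04, 0.05;
                  /* compute PA before OUTPUT so each row carries its own value */
                  PA = probacc2(c1, r, c2, n1, n2, p);
                  output;
               end;
            end;
         end;
      end;
   end;
run;

data vivian.doubs1;
   set vivian.doubs;
   if PA > 0.99;               /* keep only plans with PA above 0.99 */
   if PA > 1.00 then delete;
   error = PA - 0.95;
run;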

Thank you.
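
To make the fitness calculation for such a plan concrete, here is a minimal Python sketch of the probability of acceptance and the average sample number (ASN) for a double sampling plan. It assumes the usual binomial model: accept on the first sample if d1 <= c1, reject if d1 >= r, otherwise draw the second sample and accept if d1 + d2 <= c2. It has not been checked against the output of probacc2, and the function names are made up for illustration.

```python
from math import comb


def binom_pmf(d, n, p):
    """P(exactly d defectives in n items), each defective with probability p."""
    return comb(n, d) * p ** d * (1 - p) ** (n - d)


def binom_cdf(d, n, p):
    """P(at most d defectives in n items); 0 when d is negative."""
    if d < 0:
        return 0.0
    return sum(binom_pmf(k, n, p) for k in range(min(d, n) + 1))


def double_plan(c1, r, c2, n1, n2, p):
    """Return (PA, ASN) for the double sampling plan under the binomial model."""
    pa = binom_cdf(c1, n1, p)      # accepted outright on the first sample
    p_second = 0.0                 # probability that a second sample is needed
    for d1 in range(c1 + 1, min(r, n1 + 1)):
        prob_d1 = binom_pmf(d1, n1, p)
        p_second += prob_d1
        pa += prob_d1 * binom_cdf(c2 - d1, n2, p)
    asn = n1 + n2 * p_second       # same as n1 + (1 - P1) * n2
    return pa, asn


for p in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(p, double_plan(1, 3, 4, 50, 50, p))
```

Since p is only an input to `double_plan`, a GA fitness function built on it can be evaluated at any p, or combined across several values of p.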
