Good Day!
I would like to ask: if I use the GA routines in IML, can I call other procedures inside the GASETOBJ objective module?
For example, in Example 21.3: Integer Programming Knapsack Problem in the IML book,
call gasetobj(id, 1, "knapsack");
the knapsack module is defined in the IML environment. Could I call other procedures from within this objective module?
Thank you very much!
"knapsack" is a function built with IML statements. You can't use PROC MEANS or PROC TTEST in it.
Anyway, welcome to the GA world. It is easy to learn and very powerful.
Sure. You can use the SUBMIT/ENDSUBMIT statements to call SAS procedures. You would need to communicate the data via SAS data sets. The objective function will execute fastest if you use SAS/IML functions (MEAN, TABULATE, etc), but if necessary you can call SAS procedures.
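As a minimal sketch of that idea (the data values and the names Temp, Stats, and MeanX are hypothetical, chosen only for illustration), an IML program can write a vector to a data set and then run a procedure on it between SUBMIT and ENDSUBMIT:

```sas
proc iml;
x = {1, 2, 3, 4, 5};              /* hypothetical data held in an IML vector */

/* Pass the data to the procedure through a SAS data set */
create Temp var {"x"};
append;
close Temp;

/* Everything between SUBMIT and ENDSUBMIT is run as ordinary SAS code */
submit;
   proc means data=Temp noprint;
      var x;
      output out=Stats mean=MeanX;   /* result is written to the Stats data set */
   run;
endsubmit;
quit;
```

The procedure sees only the Temp data set, not the IML matrix, which is why the CREATE/APPEND/CLOSE step comes first.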
Good day!
How do I use the SUBMIT and ENDSUBMIT statements to return the result value to the GA module?
For example, in my SAS program, I first import a data set. Second, I use IML to rearrange the data and create a SAS data set. Third, I call a procedure to analyze the data. Finally, I use IML to rearrange the results computed by the procedure.
I would like to return the final value to the GA.
I have searched the IML user's guide, but it only shows how to go from IML to a procedure. It doesn't explain how to return a value from the procedure.
Thank you.
If the result that you want is in a SAS data set (written by a procedure or by using ODS OUTPUT), you can use the USE and READ statements to read the results into IML vectors (or matrices). For an overview, see the links at "How to read data set variables into SAS/IML vectors."
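A small sketch of the round trip, assuming a hypothetical output data set named Stats with a variable MeanHeight (Sashelp.Class is used only as convenient sample data): the procedure writes its result to a data set, and USE and READ bring it back into an IML matrix.

```sas
proc iml;
submit;
   /* The procedure writes its result to the Stats data set */
   proc means data=Sashelp.Class noprint;
      var Height;
      output out=Stats mean=MeanHeight;
   run;
endsubmit;

/* Read the result back into an IML matrix */
use Stats;
read all var {"MeanHeight"} into m;
close Stats;

print m;      /* m can now be used or returned by an IML module */
quit;
```

The same USE/READ/CLOSE pattern applies to a data set created by an ODS OUTPUT statement. Inside an objective module, the value read this way is what the RETURN statement would hand back to the GA.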
Thanks a lot!
I have finished programming the GA module.
It works for 100 iterations, but I need to set the number of iterations to 500.
The log shows this problem:
“The SAS System stopped processing this step because of insufficient memory.”
I have released the memory of matrices with “free” and suppressed the output of PROC PLS with “ods select none;”.
I have also tried “proc iml symsize=100000000000000 worksize=100000000000000;”, but it didn’t work.
By the way, there are 1000 data sets, and each data set has dimension 140*351.
Those data sets are not large. All together they are only about 0.4GB (1000 x 140 x 351 x 8 bytes). On the other hand, we don't know what computations you are doing. Can you share your program?
If you are running out of memory, use the -MEMSIZE option when you launch SAS to obtain more RAM for the SAS process.
The SAS code is as follows:
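As a rough illustration, the current limit can be checked from inside a session with PROC OPTIONS. MEMSIZE itself can only be set when SAS starts (on the command line or in a configuration file), so the launch command in the comment below is an assumption about a typical Windows setup, not a verified path for any particular installation.

```sas
/* Report the current MEMSIZE value and how it was set */
proc options option=memsize value;
run;

/* MEMSIZE is a startup option, so it is changed when SAS is launched,  */
/* for example (hypothetical command line, adjust to your installation): */
/*    sas.exe -MEMSIZE 8G                                                */
```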
dm log 'clear' ;
dm output 'clear';
PROC IMPORT OUT= WORK.cal10_1
DATAFILE= "D:\nir\cal_dat(1).csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
run;
PROC IMPORT OUT= WORK.cal10_2
DATAFILE= "D:\nir\cal_dat(2).csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
run;
proc iml;
/*Section 2*/
/* "start" finish module, cross_uniform*//*============================================*/
start uniform_cross(child1, child2, parent1, parent2) global(i);
child1 = parent1;
child2 = parent2;
do ii = 1 to ncol(parent1);
r = uniform(1234);
if r<=0.5 then do;
child1[ii] = parent2[ii];
child2[ii] = parent1[ii];
end;
end;
finish;
/*Section 3*/
start knapsack( x ) global(i);
dsNames = {cal10_1 cal10_2};
use (dsNames[i]);
read all var _NUM_ into nir;
close (dsNames[i]);
print nir;
X_new=j (140, 351, 0); /*@@@140: number of observations*/
do n=1 to 351;
if x[, n] =1
then do;
a=n+1;
X_new[,n]=nir[,a ];
end;
end;
data_new=nir[,1]||X_new;
create data_new from data_new;
append from data_new;
close data_new;
free data_new nir X_NEW;
SUBMIT;
ods select none; /*==========Close the output results==========*/
proc pls data=data_new method=pls cv=one NOCENTER NOSCALE;
model COL1= COL2-COL352 / SOLUTION;
ods output ResidualSummary=RS;
ods output CVResults=CVResults;
output out=score XSCORE=XSCORE;
RUN;
ENDSUBMIT;
PRINT X i;
X_sum=X[,+]; /*==========calculate the variable size==========*/
PRINT X_sum;
EDIT RS var _all_;
READ all var _all_ into RS;
CLOSE RS;
EDIT CVResults var _all_;
read all var _all_ into CVResults;
CLOSE CVResults;
EDIT score var _all_;
READ all var _all_ into score;
CLOSE score;
/*calculating PRESS*/
RMPress=RS[, 2];
create RMPress from RMPress; /*P*/
append from RMPress;
close RMPress; /*P*/
/*factor numbers*/ /*x: extract only factor number from CVResults*/
f_no=CVResults[2,1];
create f_no from f_no; /*P*/
append from f_no;
close f_no; /*P*/
/*n*/
n=nrow(score);
/*RMSECV*/
/*factor numbers*/ /*x: extract only factor number from CVResults*/
minRMPress=RS[f_no+1, 2];
RMSECV=SQRT(((n-1)*(minRMPress##2))/n);
RMSECV_T=iii//RMSECV;
return(RMSECV);
free RS CVResults score RMPress f_no minRMPress RMSECV_T RMSECV;
finish;
/*Section 4*/
/*GA module*//*=============================================================*/
dsNames = {cal10_1 cal10_2};
k=1;
do i = 1 to ncol(dsNames);
print i;
id = gasetup(2, 351, );
call gasetobj(id, 0, "knapsack" ); /* minimize objective module */
call gasetcro(id, 0.95, 0,"uniform_cross"); /* user crossover module */
call gasetmut(id,
0.01, /* mutation probabilty */
1);
call gasetsel(id, 2, /* carry 3 over to next generation */
0, /* dual tournament */
2 /* best-player-wins probabilty */
);
call gainit(id, 30, repeat({0,1},1, 351));
niter = 500; /*@@@@ now is 5, change niter to 500*/
summary = j(niter,2);
mattrib summary [c = {"bestValue", "avgValue"}];
call gagetval(value, id);
summary[1,1] = value[1];
summary[1,2] = value[:];
do iii = 1 to niter;
print k iii;
call garegen(id);
call gagetval(value, id);
summary[iii,1] = value[1];
summary[iii,2] = value[:];
end;
call gagetmem(mem, value, id, 1);
print "best member " mem[f = 1.0 l = ""],
"best value " value[l = ""];
iteration = t(1:niter);
print iteration summary;
call gaend(id);
mem_T=mem_T // mem;
value_T=value_T || value;
summary_T=summary_T || summary;
/*-------------------------------------------------------------------------------------------------------------------*/
/*---------------------------------------------------------------------------------------------------------*/
print k;
k=k+1;
end;
/*---------------------------------------------------------------------------------------------------------*/
/*-------------------------------------------------------------------------------------------------------------------*/
print mem_T value_T summary_T;
create m_1_100 from mem_T; /*@@@change name*/
append from mem_T;
close m_1_100; /*@@@change name*/
create v_1_100 from value_T; /*@@@change name*/
append from value_T;
close v_1_100; /*@@@change name*/
create s_1_100 from summary_T; /*@@@change name*/
append from summary_T;
close s_1_100; /*@@@change name*/
free mem_T;
free value_T;
free summary_T;
PROC EXPORT DATA= WORK.m_1_100
OUTFILE= "D:\nir\Export\gamem.csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
PROC EXPORT DATA= WORK.v_1_100
OUTFILE= "D:\nir\Export\gavalue.csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
PROC EXPORT DATA= WORK.s_1_100
OUTFILE= "D:\nir\Export\gasummary.csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
run;
QUIT;
I'd like to see the full error message from the LOG, but I suspect the problem is that you are doing a lot of memory-intensive computations in the 'knapsack' module, so let's try to clean that up. Some ideas are:
1) You are doing a lot of printing. I'm surprised that your output window is not filling up. Reduce the unnecessary output, and especially do not print from the 'knapsack' module, which gets called hundreds of times.
2) There is no need to read the data from the 'knapsack' module. In your DO loop over the data sets, read the data into the NIR matrix and send in that matrix as the GLOBAL variable to 'knapsack.' That way it is read only once.
3) Reading all variables in the output SCORE data set uses a lot of memory and is unnecessary. The only part of SCORE that you are using is the number of rows, and this is always the same as the number of rows in the input data. In fact, your computation of RMSECV only requires reading the CVResults ODS table, because that table contains the minimum root mean PRESS value. You don't need the RS or SCORE data sets at all. (You also don't need the SOLUTION option on the MODEL statement.)
4) You can delete the code that writes the RMPress and f_no data sets for each iteration of the 'knapsack' function.
5) Put PLOTS=NONE on the PROC PLS statement. The PROC is creating graphics, but they are never shown because you have ODS SELECT NONE. It takes time and memory to produce those graphics.
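An untested sketch of how ideas 1, 2, and 5 might look when applied to the posted program. The module and data set names come from the original code. The way RS and CVResults are indexed is carried over unchanged from the original post and is an assumption about the layout of those tables. The GA calls are elided because they stay as posted.

```sas
proc iml;
/* Objective module: NIR is shared through the GLOBAL clause, not read here */
start knapsack(x) global(nir);
   p = ncol(nir) - 1;                   /* number of candidate predictors */
   X_new = j(nrow(nir), p, 0);
   idx = loc(x = 1);                    /* columns selected by this member */
   if ncol(idx) > 0 then X_new[, idx] = nir[, idx+1];
   data_new = nir[, 1] || X_new;        /* response || masked predictors */
   create data_new from data_new;
   append from data_new;
   close data_new;

   submit;
      ods select none;
      proc pls data=data_new method=pls cv=one nocenter noscale plots=none;
         model COL1 = COL2-COL352;      /* assumes 351 predictors, as posted */
         ods output ResidualSummary=RS CVResults=CVResults;
      run;
   endsubmit;

   /* Indexing below is copied from the original post (assumed layout) */
   use RS;         read all var _all_ into RS;         close RS;
   use CVResults;  read all var _all_ into CVResults;  close CVResults;
   f_no = CVResults[2, 1];
   minRMPress = RS[f_no+1, 2];
   n = nrow(nir);                       /* same as the rows of SCORE */
   return( sqrt( (n-1)*(minRMPress##2)/n ) );
finish;

dsNames = {cal10_1 cal10_2};
do i = 1 to ncol(dsNames);
   use (dsNames[i]);
   read all var _NUM_ into nir;         /* read once per data set */
   close (dsNames[i]);
   /* ... gasetup, gasetobj(id, 0, "knapsack"), gainit, garegen loop as posted ... */
end;
quit;
```

Because NIR is assigned at the main level and named in the GLOBAL clause, every call to the module sees the matrix for the current data set without opening the data set again.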
Hello!
Thank you for the suggestions!
I would like to ask about the target setting. Is it also suitable for SAS 9.3 on Windows 7?
I have applied this setting, but the program still runs out of memory.
After running the GA program, I used “proc options option=memsize value;” to check the memory size, and it reported memsize=2G.
However, when I check before running the program, or when I open another SAS window after seeing memsize=2G in the old SAS window, I see memsize=16G!
The second question is that I can’t use nir as a global matrix.
Attached are two programs: a refined version that takes less time, and a version in which I try to send nir as the GLOBAL variable to 'knapsack.'
Your last image (when SAS is using 16GB) shows that the option was set by the SAS Session Startup Command Line. This is presumably the SAS session that occurs when you double-click the SAS icon on your desktop.
In contrast, the second image (when SAS is using 2GB) shows that the option was set by the sasv9.cfg file located in
C:\Program Files\SASHome\SASFoundation\9.3\nls\en\sasv9.cfg
I discuss this in my article, so go back and read the WHOLE article and follow the steps outlined there. Then you will be able to run your program while using 16GB of memory.
By the way, you might want to use
OPTIONS NONOTES;
to suppress the hundreds of NOTES that you get by writing those data sets.
Thanks! I have successfully increased MEMSIZE to the RAM size, 16G, by following this article: Large matrices in SAS/IML 14.1. http://blogs.sas.com/content/iml/2015/07/31/large-matrices.html
However, I can only process four data sets at a time (out of 1000 in total). If I run more than four data sets, there is insufficient memory again.
I think there is a way to cut down the memory used by this program, because for each data set I only need to save three matrices, with dimensions mem=1x351, summary=500x2, and value=1x1.
Could you give us some suggestions?
If you need an updated version of the program or further information, please tell us.
So in your real program the loop
do i = 1 to ncol(dsNames);
goes through 1000 data sets? If so, the SUMMARY (and MEM) matrices might be getting very large.
Then I suggest you not concatenate all the results inside the DO loop. Currently, inside the loop you have:
mem_T=mem_T // mem;
value_T=value_T || value;
summary_T=summary_T || summary;
I suggest that you
1. CREATE the output data sets before the loop.
2. Within the loop, use the SETOUT statement and the APPEND statement to write the data for each iteration to a data set.
3. CLOSE the data sets outside the loop.
There is an example of this programming technique in the article "Writing data in chunks."
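The three steps above can be sketched as follows. The sizes and the placeholder values that stand in for the GA results are hypothetical, and the data set names GAMem and GASummary are chosen only for illustration.

```sas
proc iml;
nData = 3;  niter = 5;  nVar = 4;     /* small hypothetical sizes */

/* 1. CREATE the output data sets before the loop.            */
/*    The matrices only define the columns; no rows are written yet. */
mem     = j(1, nVar, 0);
summary = j(niter, 2, 0);
create GAMem from mem;
create GASummary from summary[colname={"bestValue" "avgValue"}];

do i = 1 to nData;
   /* Placeholders for the results of one GA run */
   mem     = j(1, nVar, i);
   summary = j(niter, 2, i);

   /* 2. SETOUT makes a data set the current output data set; */
   /*    APPEND then writes the rows for this iteration.       */
   setout GAMem;
   append from mem;
   setout GASummary;
   append from summary;
end;

/* 3. CLOSE the data sets outside the loop */
close GAMem;
close GASummary;
quit;
```

With this pattern the matrices held in memory never grow beyond the results of a single data set, because each iteration's rows go straight to disk.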
Good day!
Thanks for the suggestions so far!
There is a new problem we have never seen before.
It concerns the SAS system.
When we first ran the GA program, it was fast: about three hours per data set. Now it takes more than four hours per data set.
How can we get back to the original speed?
Thanks a lot!