- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm trying to reduce the computational time of a program which uses large datasets. The outline of the program is as follows:
1.create an empty base table A.
2. do the following for I=0 to 35;
3. create a table B_I;
4. append the table B_I to A;
5. end do;
Can someone suggest me some possible techniques to reduce the computational time of the code or if the code can be run parallel by breaking it into small steps?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Write the program in such a manner that table A is created in one step.
You could only parallelize the creation of the individual datasets, but the append/concatenation has to be done by one process, as only one process can have write access to table A at any given time.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thats all well and good, but you haven't really shown us anything. From what you post:
data a;
/* create empty table part */
length var1 $20 var2 8;
/* append data part */
do i=1 to 35;
/* insert tableb creation here */
end;
run;
By creatinig data separately and then appending you are creating 35 * read/writes plus headers for each of them.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code roughly looks like this.
%macro account ();
PROC SQL;
create table temp.base(
var1 length=8,
var2 length=8,
var3 length=8,
var4 length=8,
var5 length=8,
var6 length=8,
var7 length=8,
);
QUIT;
%do i = 0 %to &number.;
DATA _NULL_;
*--- determine year-month of loop in sas date;
YYYY_MM_SAS = INTNX("MONTH", "&MORT_S_DATE."D, &I.);
CALL SYMPUTX ('YYYY_MM_SAS', YYYY_MM_SAS);
*--- determine month of loop in YYYY_MM format;
CALL SYMPUTX ('YYYY_MM', tranwrd(put(YYYY_MM_SAS, yymmd8.),"-","_"));
*--- determine month of loop in YYYYMM format;
CALL SYMPUTX ('YYYYMM', put(YYYY_MM_SAS, yymmn6.));
RUN;
DATA add&YYYY_MM._PERF;
LENGTH
var1 length=8
var2 length=8
var3 length=8
var4 length=8
var5 length=8
var6 length=8
var7 length=8
;
SET arrears_credit (keep=var1 var2
where=(year_month=&YYYYMM.));
SET ACCOUNT_&YYYY_MM. (KEEP = var 1 var3 var4 var5 var6 var7
)
KEY = var1 / UNIQUE;
RUN;
PROC APPEND BASE = base DATA = add&YYYY_MM._PERF;
RUN;
PROC SQL;
DROP TABLE add&YYYY_MM._PERF;
QUIT;
%end;
%mend account;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, unfortunately a common issue this one, and all caused by the fact that date is a part of the dataset name. Think how much simpler the world would be if you had one dataset, with a fixed name and structure, and a column which contained the date, no looping or appending, just simple data. I don't know where this thinking came in but it does seem to account for more than half the questions on this forum.
Anyways (not tested):
%let mort_s_date=12JAN2014;
%let number=5;
data _null_;
call execute('data base;');
do i=1 to &number.;
tmp=intnx("month","&mort_s_date."d, &i.);
call execute('set arrears_credit (keep=var1 var2 where=(yearmonth='||put(tmp,yymm6.)||'));');
call execute('set account_'||put(year(tmp),z4.)||'_'||put(month(tmp),z2.)||' (keep=var1 var3 var4 var5 var6 var7);');
end;
run;