Hi all,

I need to cluster many time series. In the past (SAS 9.4) I used PROC TIMESERIES, which was constrained to a single process, so computing the distances between all the series took a long time. Now I'm on Viya 3.5, and I've tried both PROC TSMODEL with the TSD package and a rewrite of the same code using the runTimeCode action of the timeData action set. I'm still in the same situation, even though I'm running the code on a Viya 3.5 machine with 80 physical CPUs. I have over 8,000 series of daily data spanning January 2008 to April 2021.

Here is the code with PROC TSMODEL:

proc tsmodel data=casuser.gnc_vol_t_tra_impute_all outlog=casuser.outlog
   outobj=(of=casuser.outtsddist(replace=YES));
var _TR1_SI_30:;
id data interval=day;
require tsd;
submit;
   declare object f(DTW);
   declare object of(OUTTSD);
   rc=f.Initialize();
   rc=f.SetTarget(&si_remi_30);
   rc=f.SetOption("METRIC", "RSQRDEV", "NORMALIZE", "STD", "TRIM", "BOTH");
   rc=f.Run();
   if rc < 0 then
      stop;
   rc=of.Collect(f);
   if rc < 0 then
      stop;
endsubmit;
print outlog;
run;

And here is the same code rewritten with PROC CAS:

%macro cmpcode();
   declare object f(DTW);
   declare object of(OUTTSD);
   rc=f.Initialize();
   rc=f.SetTarget(&batch1_p);
   rc=f.SetOption('METRIC', 'RSQRDEV', 'NORMALIZE', 'STD', 'TRIM', 'BOTH');
   rc=f.Run();
   if rc < 0 then
      stop;
   rc=of.Collect(f);
   if rc < 0 then
      stop;
%mend;
proc cas;
* like proc contents or SQL on Dictionary libref ;
table.columnInfo result=allvars / table={name="gnc_vol_t_tra_impute_all"};
run;
saveresult allvars casout="myallvars";
* reading vars with a custom filter for interval vars;
table.fetch result=selectedVars / table={name='myallvars', where="
Column not like '_TR1_tot_%' and Column not in ('_TR1_period', '_TR1_residuo',
'_TR1_settimana', 'data', 'int_conf', '_NAME_', '_PGNC_')
"}, fetchvars={{name='Column'}} to=&limit maxrows=&limit;
run;
* array creation for runtimecode and DST object ;
varList=${};
oth_varlist=${};
do row over selectedVars.Fetch;
   singleVar=compress(row.Column);
   /* varList[row._Index_]= "{name="||quote(singleVar) || "}"; */
   varList[row._Index_]= singleVar;
end;
print varList;
cmpcode="%cmpcode()";
timeData.runTimeCode result=run /
   table={name="gnc_vol_t_tra_impute_all"}
   logControl={{keep=TRUE, sev="ERROR"}}
   require={{pkg="TSD"}}
   series=varList
   timeid="data"
   interval="day"
   objOut={
      {objRef="of", table={name='outtsddist' replace=TRUE}}
   }
   logout={name="TSMODEL_LOG" replace=True}
   code=cmpcode;
run;
quit;

Here are the elapsed times I measured:

Number of series    Elapsed time (s)
10                  3.60
20                  14.40
40                  57.60
80                  230.40
160                 921.60
320                 3,686.40

If I project these timings to all 8k+ series, the calculation would take over 40 days.

My question is: has SAS implemented any faster algorithm, such as MASS (Mueen's Algorithm for Similarity Search, https://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html)? If not, any suggestion is welcome!
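The timings grow quadratically. Each doubling of the number of series multiplies the elapsed time by 4, and every row matches t ≈ 0.036 · N² seconds. That would fit every series being compared with every target (N × N DTW evaluations at about 36 ms each), though this reading of the code is an assumption. The sketch below is in Python, not SAS. It fits that model to the measured rows and projects it. It also estimates how much a symmetric distance could save by computing only the N(N−1)/2 distinct pairs, assuming the DTW run does not already skip duplicate pairs. The function names are hypothetical.

```python
# Measured timings from the table: (number of series, seconds).
MEASURED = [(10, 3.60), (20, 14.40), (40, 57.60), (80, 230.40),
            (160, 921.60), (320, 3686.40)]

SECONDS_PER_DAY = 86_400


def fit_quadratic_coefficient(rows):
    """Least-squares fit of t = c * N**2 through the origin."""
    num = sum(t * n ** 2 for n, t in rows)
    den = sum(n ** 4 for n, _ in rows)
    return num / den


def projected_days(n_series, c, symmetric_only=False):
    """Projected runtime in days for n_series series.

    symmetric_only=True counts only the n*(n-1)/2 distinct pairs instead
    of all n*n (series, target) combinations. Whether the current DTW run
    computes all n*n combinations is an assumption.
    """
    if symmetric_only:
        pairs = n_series * (n_series - 1) / 2
    else:
        pairs = n_series ** 2
    return c * pairs / SECONDS_PER_DAY


c = fit_quadratic_coefficient(MEASURED)
print(f"c = {c:.4f} s per pair")
for n in (8_000, 10_000):
    full = projected_days(n, c)
    half = projected_days(n, c, symmetric_only=True)
    print(f"{n} series: {full:.1f} days (all pairs), {half:.1f} days (distinct pairs)")
```

With these rows the fit gives c ≈ 0.036 s. That projects to about 26.7 days for exactly 8,000 series and about 41.7 days for 10,000. Counting only distinct pairs roughly halves both figures, but the growth stays quadratic, so a faster per-pair distance matters more than constant-factor savings.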
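For reference, MASS computes the z-normalized Euclidean distance between a query of length m and every length-m window of a longer series of length n. It does this in O(n log n) by getting all sliding dot products from a single FFT convolution. This is a different measure from the DTW distance in the code above. MASS does not warp the time axis, so swapping it in would change the clustering result. The sketch below is a minimal NumPy implementation based on the published description, not a SAS feature. The function names are hypothetical, and it assumes that no window is constant (zero standard deviation).

```python
import numpy as np


def sliding_dot_products(q, t):
    """Dot product of q with every length-m window of t, via FFT convolution."""
    n, m = len(t), len(q)
    # FFT length >= n + m - 1 so the circular convolution does not wrap around.
    fft_len = 1 << (n + m - 2).bit_length()
    spectrum = np.fft.rfft(t, fft_len) * np.fft.rfft(q[::-1], fft_len)
    conv = np.fft.irfft(spectrum, fft_len)
    # Entries m-1 .. n-1 are dot(t[i:i+m], q) for i = 0 .. n-m.
    return conv[m - 1:n]


def moving_mean_std(t, m):
    """Mean and population std of every length-m window, using cumulative sums."""
    c1 = np.cumsum(np.concatenate(([0.0], t)))
    c2 = np.cumsum(np.concatenate(([0.0], t * t)))
    mu = (c1[m:] - c1[:-m]) / m
    var = (c2[m:] - c2[:-m]) / m - mu * mu
    return mu, np.sqrt(np.maximum(var, 0.0))


def mass(q, t):
    """Z-normalized Euclidean distance profile of query q against series t."""
    q = np.asarray(q, dtype=float)
    t = np.asarray(t, dtype=float)
    m = len(q)
    qt = sliding_dot_products(q, t)
    mu_q, sigma_q = q.mean(), q.std()
    mu_t, sigma_t = moving_mean_std(t, m)
    # Pearson correlation between q and each window of t.
    corr = (qt - m * mu_q * mu_t) / (m * sigma_q * sigma_t)
    # Clip tiny negative values caused by rounding before the square root.
    return np.sqrt(np.maximum(2 * m * (1 - corr), 0.0))
```

When the query is as long as the series, as in whole-series clustering, the profile has a single entry. That entry equals sqrt(2m(1 − ρ)), where ρ is the Pearson correlation of the two series.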