The documentation for the DECLARE statement only shows reading from a WORK library, not from a CAS library.
*create table with duplicate IDs, numeric and character variables;
data test;
do id=1 to 100000 ;
var_n=id*2;
var_c=cats(_N_,"_n_",var_n);
output;
var_n=1/id;
var_c=cats(_N_,"_f_",var_n);
output;
end;
run;
*use Hash for aggregation by ID;
data _null_;
if _n_=1 then do;
dcl hash h(ordered:"A");
h.defineKey("id");
h.defineData("id","var","var_c");
h.defineDone();
end;
set test end=lr;
if h.find()=0 then do;
var=sum(var,var_n);
h.replace();
end;
else do;
var=var_n;
h.add();
end;
if lr then h.output(dataset:'out');
run;
The log from the above program, run on a regular SAS (WORK) server, reports 100000 observations, which is exactly what I expect because the first DATA step creates two entries per id.
NOTE: The data set WORK.OUT has 100000 observations and 3 variables.
NOTE: There were 200000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
real time 0.16 seconds
user cpu time 0.12 seconds
system cpu time 0.04 seconds
memory 33531.40k
OS Memory 65156.00k
libname casuser cas;
data casuser.test;
do id=1 to 100000 ;
var_n=id*2;
var_c=cats(_N_,"_n_",var_n);
output;
var_n=1/id;
var_c=cats(_N_,"_f_",var_n);
output;
end;
run;
The above creates the expected 200k entries in the CAS library, so this step works as expected.
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step has no input data set and will run in a single thread.
NOTE: The table test in caslib CASUSER(xxxx) has 200000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.29 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
data _null_;
if _n_=1 then do;
dcl hash h(ordered:"A");
h.defineKey("id");
h.defineData("id","var","var_c");
h.defineDone();
end;
set casuser.test end=lr;
if h.find()=0 then do;
var=sum(var,var_n);
h.replace();
end;
else do;
var=var_n;
h.add();
end;
if lr then h.output(dataset:'out');
run;
Running the same hash step against casuser.test (the CAS library) produces 5 extra entries:
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 200000 observations read from the table TEST in caslib CASUSER(xxxx).
NOTE: The table out in caslib CASUSER(xxxx) has 100005 observations and 3 variables.
NOTE: DATA statement used (Total process time):
The above is a very basic sample; my real code uses iterators and multiple hashes, and it works in SAS processing on a single workstation. Only when I start using CAS are the row totals way off, and the data becomes meaningless.
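One thing that stands out in the CAS log is the note "The DATA step will run in multiple threads." A possible reading, stated here as an assumption rather than a confirmed diagnosis, is that each thread builds its own hash object over its own slice of the table, so an id whose two rows land on different threads is written out twice. A minimal sketch to test that idea is to force the same step into a single thread with the SINGLE=YES option on the DATA statement and check whether the count returns to 100000:

```sas
/* Sketch only: assumes the 5 extra rows come from per-thread hash objects. */
/* SINGLE=YES asks CAS to run this DATA step in one thread.                 */
data _null_ / single=yes;
  if _n_=1 then do;
    dcl hash h(ordered:"A");
    h.defineKey("id");
    h.defineData("id","var","var_c");
    h.defineDone();
  end;
  set casuser.test end=lr;
  if h.find()=0 then do;
    /* id already in the hash: accumulate and store back */
    var=sum(var,var_n);
    h.replace();
  end;
  else do;
    /* first row seen for this id */
    var=var_n;
    h.add();
  end;
  if lr then h.output(dataset:'out');
run;
```

If the single-threaded run gives the expected count, the trade-off is that the step no longer uses CAS parallelism. Row order within a CAS table is also not guaranteed, so the var_c value kept for each id (taken from the first row encountered) may differ from the WORK run.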