Dear all,
I have the following problem: I am running some analyses on a very large data set (about 80,000 observations). To reduce the run time, I drew a random sample from this data set, which cut the number of observations to about 12,000. Strangely, the run time did not get shorter; it became far longer. I have ruled out other sources of error, so the problem must be caused by the modification of the data set. I tried creating indexes and changing the formats of the variables, but this did not solve the problem.
The macro I used for the random subset looks like this:
%macro age_selection(start_custage, end_custage);
  /* &country. is not a parameter; it must be defined before the macro is called */
  %let agedif = %eval(&end_custage. - &start_custage.);

  /* Step 1: give every observation a uniform random number and a group number sel_age */
  data ncs.ncs2_cusbase_dev_&country.&start_custage._&end_custage.;
    set ncs.ncs2_cusbase_dev_&country.;
    random_variate = ranuni(1234);
    %do i = 1 %to (&agedif.+1);
      if ((random_variate > ((&i.-1)/(&agedif.+1))) and (random_variate <= &i./(&agedif.+1))) then sel_age = &i.;
    %end;
  run; /* explicit step boundary; previously the step was only closed by the next DATA statement */

  /* Step 2: write one subset per group and create an index on it via ODBC pass-through */
  %do i = &start_custage. %to &end_custage.;
    data ncs.ncs2_cusbase_dev_AGE&i._&country. (drop = random_variate sel_age);
      set ncs.ncs2_cusbase_dev_&country.&start_custage._&end_custage.;
      if sel_age = &i.;
    run;

    proc sql;
      connect to odbc (dsn=edw);
      execute(
        create index idx_&i. on ncs.ncs2_cusbase_dev_AGE&i._&country. (custid, first_order_date)
      ) by odbc;
    quit;
  %end;
%mend;

%age_selection(1, 4);
Any help would be highly appreciated. Thanks a lot in advance,
Holger