Hi,
I'm pulling my hair out at the moment with this problem. I've got two datasets around 10G each, both containing timestamped transactional records. I want to effectively merge the second into the first where DATE_PROCESSED is between DATE_START and DATE_APPLIED. Sounds simple enough... but when I use PROC SQL (using left join) it creates a huuuge temporary file and dies as there isn't enough workspace. We've got 65G free in the temporary directory which I should think is plenty for this task.
So I ended up giving up on that approach and tried using a nested SET statement in a data step:
data test (keep=CUST_NUMBER CUST_LOCATION TRANS_TYPE PERIOD_DATE
ELEMENT_SOURCE DATE_APPLIED DATE_CEASED DATE_START DATE_END
BALANCE INTEREST WOF);
set elements;
INTEREST = 0;
WOF = 0;
LastObs = 0;
do until(LastObs);
set interest_and_writeoffs
(where=(INT_CUST_NUMBER=CUST_NUMBER & INT_CUST_LOCATION=CUST_LOCATION
& PI_TRANS_TYPE=TRANS_TYPE & PI_PERIOD_DATE=PERIOD_DATE
& PI_DATE_PROCESSED >= DATE_START
& PI_DATE_PROCESSED <= dhms(datepart(DATE_APPLIED),0,0,0)));
if PI_CUST_NUMBER ne CUST_NUMBER or PI_CUST_LOCATION ne CUST_LOCATION
or PI_TRANS_TYPE ne TRANS_TYPE or PI_PERIOD_DATE ne PERIOD_DATE
or PI_DATE_PROCESSED > dhms(datepart(DATE_APPLIED),0,0,0)
then LastObs = 1;
else do;
INTEREST = INTEREST + PENALTIES_AND_INTEREST;
WOF = WOF + WRITEOFFS;
end;
end;
output;
run;
So basically the inner loop uses a WHERE= clause on the 'interest_and_writeoffs' dataset to get a subset matching the customer / period details in the main set statement. Except when I try and run it I get the error:
ERROR: Variable CUST_NUMBER is not on file INTEREST_AND_WRITEOFFS.
So I tried copying CUST_NUMBER to another variable first, but it still didn't like it. Upon consulting the manual, I see that the WHERE clause only takes constants, but WHERE= is supposed to be able to handle variables. Now i'm stumped and pretty close to doing nasty things to my workstation, and I think my colleagues are sick of hearing me curse, please help!
... View more