Hi,

I am trying to loop through a file and flag all accounts that meet specific criteria. I want to set the flag value to 1 on records that have the same customer numbers, account numbers, date and opposite amounts.

What I want to do is:
- Check the number of records that have the same customer number and date, and assign that count to k (I have this part working).
- For each cluster of customer number and date, flag accounts with opposite amount values.
- Once it has completed a cluster, it should move on to the next set of records.
- There is no point in comparing across clusters.

Data and result I am aiming for (flag is the new variable to create):

Customer Number  Account Number          Date           Amount     k  Flag  Note
C001             Acc001                  2015/06/03     550        1  0
C001             Acc001                  2015/06/08     -199       7  1     Clustering
C001             Acc002                  2015/06/08     199        7  1
C001             Acc001                  2015/06/08     3850       7  0
C001             Acc003                  2015/06/08     -3850      7  0
C001             Acc001                  2015/06/08     -15776.75  7  1
C001             Acc001                  2015/06/08     999.23     7  0
C001             Acc003                  2015/06/08     15776.75   7  1
C001             Acc001                  2015/06/26     -10000     1  0
C001             Acc001                  2015/06/29     4400       2  0     Clustering
C001             Acc001                  2015/06/29     -8500      2  0
C001             Acc001                  2015/06/30     -1500      2  1
C001             Acc002                  2015/06/30     1500       2  1
C002             Various Account Number  Various Dates  xxx        x  xx
C003             Various Account Number  Various Dates  xxx        x  xx

I have built the following code logic:

/* Macro to SCAN through DATALOG */
%MACRO SCANLOOP(SCANFILE,FIELD1,FIELD2, FIELD3, FIELD4, FIELD5);
/* First obtain the number of records in DATALOG */
DATA _NULL_;
IF 0 THEN SET &SCANFILE NOBS=X;
CALL SYMPUTX('RECCOUNT',X);
STOP;
RUN;
/* %PUT goes after RUN: macro statements resolve before the step executes */
%put &RECCOUNT.;
/* loop from one to number of records */
/*%DO I=1 %TO &RECCOUNT;*/
%DO I=1 %TO 100;
DATA _NULL_;
/* Advance to the Ith record */
SET &SCANFILE (FIRSTOBS=&I);
/* store the variables of interest in */
/* macro variables */
/* Symput is for characters*/
/* symputx is for numeric to character conversion with removal*/
/* of leading and trailing spaces*/
CALL SYMPUT('VAR1',&FIELD1);
CALL SYMPUTX('VAR2',&FIELD2);
CALL SYMPUTX('VAR3',&FIELD3);
CALL SYMPUTX('VAR4',&FIELD4);
CALL SYMPUTX('VAR5',&FIELD5);
CALL SYMPUTX('VAR6',k);
STOP;
RUN;
/* %PUT goes after RUN so that it prints the values from this iteration */
%put &VAR1. &VAR2. &VAR3. &VAR4. &VAR5. &VAR6.;
/* now perform the tasks that you */
/* wish repeated for each */
/* observation */
/*SETUP a blank file */
/*DATA RECON5;*/
/* if &i=0 then SET recon4 (OBS=0);*/
/*run;*/
DATA recon5;
SET recon5 recon4;
IF Narrative ^= 'DAILY BALANCE'
AND customer_number = symget('VAR2')
AND date = symget('VAR3')
AND account_number NOT = symget('VAR4')
AND amount = -(symget('VAR5')) THEN DO;
flag=1;
output recon5;
END;
ELSE flag=0;
/*input(original_variable, informat.)*/
%END;
%MEND SCANLOOP;
/* Call SCANLOOP */
/*%SCANLOOP(DATALOG,FILENM,DESC);*/
%SCANLOOP(recon4,narrative,customer_number,date,account_number,amount);
RUN;

The current issues with the code are:
- It does not write the results to a single consolidated file.
- It does not make any use of k. k counts the number of records belonging to a customer with the same date. Ideally, the logic would use k to jump across records and reduce the time needed to process them.
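For comparison, here is a minimal sketch of one set-based way to express the same matching rule in a single pass, so that all results land in one table. It mirrors the conditions in the IF statement of the macro above and is not a tested solution. Assumptions: customer_number, account_number and narrative are character variables, date and amount are numeric, recon4 does not already contain a flag variable, and amounts can safely be compared after rounding to two decimals. The output name recon5_flagged is hypothetical.

```sas
/* Sketch only: flags rows in recon4 that have a counterpart with the */
/* same customer and date, a different account and the opposite       */
/* amount. recon5_flagged is a hypothetical output name.              */
proc sql;
  create table recon5_flagged as
  select a.*,
         case
           when a.narrative ne 'DAILY BALANCE'
            and exists (select *
                        from recon4 as b
                        where b.customer_number = a.customer_number
                          and b.date = a.date
                          and b.account_number ne a.account_number
                          /* rounding to cents is an assumption, used */
                          /* to avoid floating-point mismatches       */
                          and round(b.amount, 0.01) = -round(a.amount, 0.01))
           then 1
           else 0
         end as flag
  from recon4 as a
  order by a.customer_number, a.date;
quit;
```

Because the subquery only looks at rows with the same customer_number and date, comparisons never cross clusters, so k is not needed to skip records. On a large table the correlated subquery may be slow; whether an index on customer_number and date helps would need to be checked on the real data.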