BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
atishp
Calcite | Level 5

Hi

 

I am trying to loop through a file and flag all accounts that meet a specific criteria.

I want to to set the flag value to 1 on records that have same customer numbers, account numbers, date and opposite amounts.

 

What I want to do is:

  1. Check the number of records that have the same customer number and date  and assign value k (I have this part working)
  2. For each cluster of customer number and date flag accounts with the opposite amount values
  3. Once it has completed a cluster it should move onto the next set of records
  4. There is no point in comparing across clusters

 

Data and result I am aiming for: (create new variable flag)

 

Customer NumberAccount NumberDateAmountkFlag 
C001Acc0012015/06/0355010 
C001Acc0012015/06/08-19971Clustering
C001Acc0022015/06/0819971
C001Acc0012015/06/08385070
C001Acc0032015/06/08-385070
C001Acc0012015/06/08-15776.7571
C001Acc0012015/06/08999.2370
C001Acc0032015/06/0815776.7571
C001Acc0012015/06/26-1000010 
C001Acc0012015/06/29440020Clustering
C001Acc0012015/06/29-850020
C001Acc0012015/06/30-150021
C001Acc0022015/06/30150021
C002Various Account NumberVarious Datesxxxxxx 
C003Various Account NumberVarious Datesxxxxxx 

 

I have built the following code logic:

 

/* Macro to SCAN through DATALOG */									
%MACRO SCANLOOP(SCANFILE,FIELD1,FIELD2, FIELD3, FIELD4, FIELD5);									
/* First obtain the number of records in DATALOG */									
									
DATA _NULL_;									
	IF 0 THEN SET &SCANFILE NOBS=X;								
	CALL SYMPUTX('RECCOUNT',X);								
	%put &RECCOUNT.;								
STOP;									
RUN;									
									
/* loop from one to number of records */									
/*%DO I=1 %TO &RECCOUNT;*/									
%DO I=1 %TO 100;									
									
DATA _NULL_;									
	/* Advance to the Ith record */								
	SET &SCANFILE (FIRSTOBS=&I);								
	/* store the variables of interest in */								
	/* macro variables */								
/*	Symput is for characters*/								
/*	symputx is for numeric to character conversion with removal*/								
/*	of leading and trailing spaces*/								
	CALL SYMPUT('VAR1',&FIELD1);								
	CALL SYMPUTX('VAR2',&FIELD2);								
	CALL SYMPUTX('VAR3',&FIELD3);								
	CALL SYMPUTX('VAR4',&FIELD4);								
	CALL SYMPUTX('VAR5',&FIELD5);								
	CALL SYMPUTX('VAR6',k);								
									
									
	%put &VAR1. &VAR2. &VAR3. &VAR4. &VAR5. &VAR6.;								
									
									
STOP;									
RUN;									
									
	/* now perform the tasks that */								
	/* wish repeated for each */								
	/* observation */								
									
/*SETUP a blank file */									
/*DATA RECON5;*/									
/*	if &i=0 then SET recon4 (OBS=0);*/								
/*run;*/									
									
									
DATA recon5;									
	SET recon5 recon4;								
									
	IF Narrative ^= 'DAILY BALANCE'								
		AND customer_number = symget('VAR2')							
			AND date = symget('VAR3')						
				AND account_number NOT = symget('VAR4')					
					AND amount = -(symget('VAR5')) THEN DO;				
						flag=1;			
						output recon5; 			
					END;				
						ELSE flag=0;			
									
									
									
/*input(original_variable, informat.)*/									
									
									
%END;									
%MEND SCANLOOP;									
/* Call SCANLOOP */									
/*%SCANLOOP(DATALOG,FILENM,DESC);*/									
%SCANLOOP(recon4,narrative,customer_number,date,account_number,amount);									
RUN;									
  1.  The current issues with the code is that it does not write the results to a single consolidated file
  2. It has no idea on how to use k
    • K is counts the number of records belonging to a customer with the same date
    • Ideally the logic should use this and help reduce time to process records, by jumping across records

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

That would be very complicated.

You also need to consider whether the positive number is greater than negative number .

 

data have;
infile cards expandtabs truncover;
input CustomerNumber $	AccountNumber $ Date : yymmdd10. Amount	k;
format date yymmdd10.;
cards;
C001	Acc001	2015/06/03	550	1	0	 	0
C001	Acc001	2015/06/08	-199	7	1	Payment to acc002	1
C001	Acc002	2015/06/08	199	7	0	McDonalds	1
C001	Acc002	2015/06/08	199	7	1	Receipt acc002	1
C001	Acc002	2015/06/08	50	7	0	Toll Charge	1
C001	Acc002	2015/06/08	50	7	0	Toll Charge	1
C001	Acc001	2015/06/08	3850	7	1	 	1
C001	Acc003	2015/06/08	-3850	7	1	 	1
C001	Acc001	2015/06/08	-15776.75	7	1	 	1
C001	Acc001	2015/06/08	999.23	7	0	 	0
C001	Acc003	2015/06/08	15776.75	7	1	 	1
C001	Acc001	2015/06/26	-10000	1	0	 	0
C001	Acc001	2015/06/29	4400	2	0	 	0
C001	Acc001	2015/06/29	-8500	2	0	 	0
C001	Acc001	2015/06/30	-1500	2	1	 	1
C001	Acc002	2015/06/30	1500	2	1
;
run;
data temp;
 set have;
 by k notsorted;
 group+first.k;
 abs=abs(amount);
run;
proc sort data=temp;by group abs;run;
data positive negative(rename=(CustomerNumber=_c
AccountNumber=_ac Date=_d Amount=_am k=_k));
 set temp;
 if amount gt 0 then output positive;
  else output negative;
run;
data x;
 ina=0;inb=0;
 merge positive(in=ina) negative(in=inb);
 by group abs;
 one=ina;two=inb;
 output;
 call missing(of _all_);
run;

data want;
 set x;
 if one and not two then do;flag=0;output;end;
  else if not one and two then do;
   CustomerNumber=_c;
   AccountNumber=_ac;
   Date=_d;
   Amount=_am; 
   k=_k;
   flag=0;output;
  end;
  else do;
   flag=1;output;
   CustomerNumber=_c;
   AccountNumber=_ac;
   Date=_d;
   Amount=_am; 
   k=_k;
   output;
  end;
drop  group abs one two _:;
run;

View solution in original post

8 REPLIES 8
Reeza
Super User

Shouldn't records 4/5 be flagged as well? 

 

Edit: How big is the data you're working with, number of rows? 

 

Second, have you fully explored the variables and data available. Typically when a record os reversed or a return is processed there some ID that's unique to those records to allow for matching, besides the amount. Sometimes it's a transaction ID or just an indicator on the field status. 

 

You could consider reversing the sign on the returns but adding a field that identifies returns, a return flag. Then you could leverage BY group logic to identify you returned purchases since the amounts are the same, but they'll have different return flags. 

atishp
Calcite | Level 5

Hi Reeza

 

Thanks for the feedback. Highly appreciate it.

 

You are correct;  records 4/5 should be flagged as well; indeed an error on my end.

 

The data is large ~ 500 million records spanning over two years.

 

I have also looked at the possibility of using other variables however  I could not identify any. I also chatted to our IT teams around how the data is being generated in the core systems and I have not been able to identify any possible work around using the existing variables.

 

Thanks

Ksharp
Super User

you need to watch out the decimal part, maybe consider to use function ROUND().

 

 

data have;
infile cards expandtabs truncover;
input CustomerNumber $	AccountNumber $ Date : yymmdd10. Amount	k;
format date yymmdd10.;
cards; 
C001	Acc001	2015/06/03	550	1	0	 
C001	Acc001	2015/06/08	-199	7	1	Clustering
C001	Acc002	2015/06/08	199	7	1
C001	Acc001	2015/06/08	3850	7	0
C001	Acc003	2015/06/08	-3850	7	0
C001	Acc001	2015/06/08	-15776.75	7	1
C001	Acc001	2015/06/08	999.23	7	0
C001	Acc003	2015/06/08	15776.75	7	1
C001	Acc001	2015/06/26	-10000	1	0	 
C001	Acc001	2015/06/29	4400	2	0	Clustering
C001	Acc001	2015/06/29	-8500	2	0
C001	Acc001	2015/06/30	-1500	2	1
C001	Acc002	2015/06/30	1500	2
;
run;
data temp;
 set have;
 by k notsorted;
 group+first.k;
 abs=abs(amount);
run;
proc sort data=temp;by group abs;run;
data want;
 set temp;
 by group abs notsorted;
 if first.abs and last.abs then flag=0;
  else flag=1;
drop group abs;
run;
atishp
Calcite | Level 5

Hi KSharp

 

Thanks for the nifty code. 

Works really great on the sample I provided Cat Very Happy

 

I have applied it against the full data set and noticed the following:

 

  1. On transcations that are occuring but not "transfers" they are being flagged as well

I have provided an updated record set to illustrate this specific issue. 

Do you have any suggestions?

 

Customer NumberAccount NumberDateAmountkFlag_I_WantNarrativeCurrent_Flag
C001Acc0012015/06/0355010 0
C001Acc0012015/06/08-19971Payment to acc0021
C001Acc0022015/06/0819970McDonalds1
C001Acc0022015/06/0819971Receipt acc0021
C001Acc0022015/06/085070Toll Charge1
C001Acc0022015/06/085070Toll Charge1
C001Acc0012015/06/08385071 1
C001Acc0032015/06/08-385071 1
C001Acc0012015/06/08-15776.7571 1
C001Acc0012015/06/08999.2370 0
C001Acc0032015/06/0815776.7571 1
C001Acc0012015/06/26-1000010 0
C001Acc0012015/06/29440020 0
C001Acc0012015/06/29-850020 0
C001Acc0012015/06/30-150021 1
C001Acc0022015/06/30150021 1
C002Various Account NumberVarious Datesxxxxxx  
C003Various Account NumberVarious Datesxxxxxx  

 

Thanks

atishp
Calcite | Level 5

Just to add another point the payment and receipt narratives are not generic; I have simplified it for illustration purposes.

atishp
Calcite | Level 5

I figured that if I just join the table to itself I can get the desired results as well.

Ksharp
Super User

That would be very complicated.

You also need to consider whether the positive number is greater than negative number .

 

data have;
infile cards expandtabs truncover;
input CustomerNumber $	AccountNumber $ Date : yymmdd10. Amount	k;
format date yymmdd10.;
cards;
C001	Acc001	2015/06/03	550	1	0	 	0
C001	Acc001	2015/06/08	-199	7	1	Payment to acc002	1
C001	Acc002	2015/06/08	199	7	0	McDonalds	1
C001	Acc002	2015/06/08	199	7	1	Receipt acc002	1
C001	Acc002	2015/06/08	50	7	0	Toll Charge	1
C001	Acc002	2015/06/08	50	7	0	Toll Charge	1
C001	Acc001	2015/06/08	3850	7	1	 	1
C001	Acc003	2015/06/08	-3850	7	1	 	1
C001	Acc001	2015/06/08	-15776.75	7	1	 	1
C001	Acc001	2015/06/08	999.23	7	0	 	0
C001	Acc003	2015/06/08	15776.75	7	1	 	1
C001	Acc001	2015/06/26	-10000	1	0	 	0
C001	Acc001	2015/06/29	4400	2	0	 	0
C001	Acc001	2015/06/29	-8500	2	0	 	0
C001	Acc001	2015/06/30	-1500	2	1	 	1
C001	Acc002	2015/06/30	1500	2	1
;
run;
data temp;
 set have;
 by k notsorted;
 group+first.k;
 abs=abs(amount);
run;
proc sort data=temp;by group abs;run;
data positive negative(rename=(CustomerNumber=_c
AccountNumber=_ac Date=_d Amount=_am k=_k));
 set temp;
 if amount gt 0 then output positive;
  else output negative;
run;
data x;
 ina=0;inb=0;
 merge positive(in=ina) negative(in=inb);
 by group abs;
 one=ina;two=inb;
 output;
 call missing(of _all_);
run;

data want;
 set x;
 if one and not two then do;flag=0;output;end;
  else if not one and two then do;
   CustomerNumber=_c;
   AccountNumber=_ac;
   Date=_d;
   Amount=_am; 
   k=_k;
   flag=0;output;
  end;
  else do;
   flag=1;output;
   CustomerNumber=_c;
   AccountNumber=_ac;
   Date=_d;
   Amount=_am; 
   k=_k;
   output;
  end;
drop  group abs one two _:;
run;
atishp
Calcite | Level 5

Thank again

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1911 views
  • 2 likes
  • 3 in conversation