Hi,
I have the following data:
User | Date | Transaction Amount |
A | 10-Aug-21 | 120 |
A | 11-Aug-21 | 110 |
A | 15-Aug-21 | 600 |
A | 20-Aug-21 | 585 |
A | 21-Aug-21 | 130 |
B | 7-Aug-21 | 300 |
B | 8-Aug-21 | 350 |
B | 8-Aug-21 | 1450 |
B | 9-Aug-21 | 10000 |
I want to flag those users who does a transaction with value greater than 5 times the average of all the last transactions and remove these anomalous transactions from the average calculation in the next step.
So my table would look like:
User | Date | Transaction Amount | Average | Flag |
A | 10-Aug-21 | 120 | 0 | 0 |
A | 11-Aug-21 | 110 | 120 | 0 |
A | 15-Aug-21 | 600 | 115 | 1 |
A | 20-Aug-21 | 585 | 115 | 1 |
A | 21-Aug-21 | 130 | 115 | 0 |
B | 7-Aug-21 | 300 | 0 | 0 |
B | 8-Aug-21 | 350 | 300 | 0 |
B | 8-Aug-21 | 1450 | 325 | 0 |
B | 9-Aug-21 | 11000 | 2100 | 1 |
So here, For each transaction of an user,we need to look at the average of the last transactions and check if the current transaction amount > 5 times the average amount. If yes, then flag variable will take the value 1. Now, When we see such a case we will not include that particular transaction for average calculation in the next step, so that we don't miss the next anomalous transaction due to the increase in the average of the last transactions.
Appreciate any suggestions!
This can be achieved by using FIRST. LAST. Processing
/* Create test input */ data have ; input user $ date date7. amount ; cards ; A 10Aug21 120 A 11Aug21 110 A 15Aug21 600 A 20Aug21 585 A 21Aug21 130 B 7Aug21 300 B 8Aug21 350 B 8Aug21 1450 B 9Aug21 10000 ; run ; /* Sort by user and date */ proc sort data=have out=srtd ; by user date ; run ; data want ; /* retain value of sum and count over iterations of the data step */ retain sum 0 count 0 ; /* read the sorted dataset by user/date */ set srtd ; by user date ; /* If it's the 1st occurrance of a user */ if first.user then do ; sum=amount ; count=1 ; flag=0 ; end ; /* If it is NOT the 1st occurrance of a user */ else do ; /* Calculate the average prior to this observation */ avg=sum/count ; /* If amount is less than 5x the average */ if amount<avg*5 then do ; /* increate the sum (total) and count of valid observations */ sum=sum+amount ; count=count+1 ; /* Set flag to 0 */ flag=0 ; end ; /* If amount is greater than 5x the average */ else do ; /* Set flag to 1 */ flag=1 ; end ; end ; /* Output observation */ output ; run ;
This can be achieved by using FIRST. LAST. Processing
/* Create test input */ data have ; input user $ date date7. amount ; cards ; A 10Aug21 120 A 11Aug21 110 A 15Aug21 600 A 20Aug21 585 A 21Aug21 130 B 7Aug21 300 B 8Aug21 350 B 8Aug21 1450 B 9Aug21 10000 ; run ; /* Sort by user and date */ proc sort data=have out=srtd ; by user date ; run ; data want ; /* retain value of sum and count over iterations of the data step */ retain sum 0 count 0 ; /* read the sorted dataset by user/date */ set srtd ; by user date ; /* If it's the 1st occurrance of a user */ if first.user then do ; sum=amount ; count=1 ; flag=0 ; end ; /* If it is NOT the 1st occurrance of a user */ else do ; /* Calculate the average prior to this observation */ avg=sum/count ; /* If amount is less than 5x the average */ if amount<avg*5 then do ; /* increate the sum (total) and count of valid observations */ sum=sum+amount ; count=count+1 ; /* Set flag to 0 */ flag=0 ; end ; /* If amount is greater than 5x the average */ else do ; /* Set flag to 1 */ flag=1 ; end ; end ; /* Output observation */ output ; run ;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.