Hi All-
I am, going to start with a little background. I do have a dataset like the following :
Cust_ID | TransactionID | POST_DATE | FEE |
123 | 321 | 2/28/2018 | |
123 | 322 | 2/10/2018 | |
123 | 323 | 2/2/2018 | |
123 | 324 | 2/1/2018 | |
123 | 325 | 11/27/2017 | |
456 | 326 | 2/1/2018 | |
987 | 327 | 3/5/2018 | |
987 | 328 | 2/5/2018 | |
987 | 329 | 1/5/2018 | |
987 | 330 | 12/5/2017 | |
987 | 331 | 11/6/2017 | |
654 | 332 | 11/13/2017 | |
654 | 333 | 2/28/2018 | |
654 | 334 | 2/28/2018 | |
654 | 335 | 2/26/2018 | |
654 | 336 | 2/26/2018 | |
654 | 337 | 2/23/2018 | |
654 | 338 | 2/21/2018 | |
654 | 339 | 2/21/2018 | |
654 | 340 | 2/20/2018 | |
654 | 341 | 2/20/2018 |
I would like to loop through this dataset and update the Flag column with a 1 depending on information obtained in the following condition by cust_id:
-No fee for up to three transactions per rolling 30-day period.
This would update the dataset to be the following :
Cust_ID | TransactionID | POST_DATE | FEE |
123 | 321 | 2/28/2018 | 1 |
123 | 322 | 2/10/2018 | 0 |
123 | 323 | 2/2/2018 | 0 |
123 | 324 | 2/1/2018 | 0 |
123 | 325 | 11/27/2017 | 0 |
456 | 326 | 2/1/2018 | 0 |
987 | 327 | 3/5/2018 | 0 |
987 | 328 | 2/5/2018 | 0 |
987 | 329 | 1/5/2018 | 0 |
987 | 330 | 12/5/2017 | 0 |
987 | 331 | 11/6/2017 | 0 |
654 | 332 | 11/13/2017 | 0 |
654 | 333 | 2/28/2018 | 1 |
654 | 334 | 2/28/2018 | 1 |
654 | 335 | 2/26/2018 | 1 |
654 | 336 | 2/26/2018 | 1 |
654 | 337 | 2/23/2018 | 1 |
654 | 338 | 2/21/2018 | 1 |
654 | 339 | 2/21/2018 | 0 |
654 | 340 | 2/20/2018 | 0 |
654 | 341 | 2/20/2018 | 0 |
I have been trying to retain the current record value but i have not been able to do it. I have got the flag only validation the current record with the next record. There's tons of documentation online that sort of applies, but nothing specific enough and try as I might, I can't seem to work it out. I've been tempted to export the data to a file, parse it using VB and then reimport, but that is horribly backwards and inefficient. Plus, I won't learn anything about SAS.
Any help would be greatly appreciated.
Thanks,
Diego
If, for any post_date, there are more that 3 transactions between post_date-29 and post_date (inclusive) then fee=1. BUT, while your data are grouped by cust_id, they are not sorted by date within cust_id. So, to avoid sorting, you can use a double dow. I.e. have 2 do loops reading each cust_id. The first loop increments a subset of a data array (every date from post_date through post_date+29). The second loop checks the result for any given post_date, looking for values over 3:
data have;
input Cust_ID TransactionID POST_DATE :mmddyy10.;
format post_date date9.;
*if cust_id=987;
datalines;
123 321 2/28/2018
123 322 2/10/2018
123 323 2/2/2018
123 324 2/1/2018
123 325 11/27/2017
456 326 2/1/2018
987 327 3/5/2018
987 328 2/5/2018
987 329 1/5/2018
987 330 12/5/2017
987 331 11/6/2017
654 332 11/13/2017
654 333 2/28/2018
654 334 2/28/2018
654 335 2/26/2018
654 336 2/26/2018
654 337 2/23/2018
654 338 2/21/2018
654 339 2/21/2018
654 340 2/20/2018
654 341 2/20/2018
run;
%let lower_bound=%sysfunc(inputn(01jan2017,date9.));
%let upper_bound=%sysfunc(inputn(31jan2019,date9.));
data want (drop=_:);
array _count{&lower_bound:&upper_bound} ;
do until (last.cust_id);
set have;
by cust_id notsorted;
do _d=post_date to post_date+29;
_count{_d}=sum(_count{_d},1);
end;
end;
do until (last.cust_id);
set have;
by cust_id notsorted;
fee=(_count{post_date}>3);
output;
end;
run;
Notice the _count array is index by date. And the lower bound of the index is set to the numerical equivalent of sas data value 01jan2017, and the upper bound to the equivalent of 31jan2019. Just use a lower bound date as the minimum expected date, and the upper bound at least 29 days after the highest expected date.
So you want FEE=0 as a starting point, but if three transactions happen for the same cust_id, you want the fourth to have FEE=1, correct?
Yes, The starting to point should be 0, but the fourth or the next ones should be 1 if the were posted per rolling 30-day period
If occurs to me that you expected transactions to be sorted in descending order within a given cust_id.
If so, then presumably transaction 332 (first trans for id 654) is in error. It's the only transaction that doesn't follow the descending date rule. If that was unintentional, then the solution is a very simple self-merge data step:
data want2;
merge have
have (firstobs=4 keep=cust_id post_date
rename=(cust_id=_id4 post_date=_pd4));
if cust_id=_id4 and (post_date< _pd4+29) then fee=1;
else fee=0;
run;
This is awesome!
I really appreciate it
Many thanks!
If, for any post_date, there are more that 3 transactions between post_date-29 and post_date (inclusive) then fee=1. BUT, while your data are grouped by cust_id, they are not sorted by date within cust_id. So, to avoid sorting, you can use a double dow. I.e. have 2 do loops reading each cust_id. The first loop increments a subset of a data array (every date from post_date through post_date+29). The second loop checks the result for any given post_date, looking for values over 3:
data have;
input Cust_ID TransactionID POST_DATE :mmddyy10.;
format post_date date9.;
*if cust_id=987;
datalines;
123 321 2/28/2018
123 322 2/10/2018
123 323 2/2/2018
123 324 2/1/2018
123 325 11/27/2017
456 326 2/1/2018
987 327 3/5/2018
987 328 2/5/2018
987 329 1/5/2018
987 330 12/5/2017
987 331 11/6/2017
654 332 11/13/2017
654 333 2/28/2018
654 334 2/28/2018
654 335 2/26/2018
654 336 2/26/2018
654 337 2/23/2018
654 338 2/21/2018
654 339 2/21/2018
654 340 2/20/2018
654 341 2/20/2018
run;
%let lower_bound=%sysfunc(inputn(01jan2017,date9.));
%let upper_bound=%sysfunc(inputn(31jan2019,date9.));
data want (drop=_:);
array _count{&lower_bound:&upper_bound} ;
do until (last.cust_id);
set have;
by cust_id notsorted;
do _d=post_date to post_date+29;
_count{_d}=sum(_count{_d},1);
end;
end;
do until (last.cust_id);
set have;
by cust_id notsorted;
fee=(_count{post_date}>3);
output;
end;
run;
Notice the _count array is index by date. And the lower bound of the index is set to the numerical equivalent of sas data value 01jan2017, and the upper bound to the equivalent of 31jan2019. Just use a lower bound date as the minimum expected date, and the upper bound at least 29 days after the highest expected date.
data have;
input Cust_ID TransactionID POST_DATE :mmddyy10.;
format post_date date9.;
datalines;
123 321 2/28/2018
123 322 2/10/2018
123 323 2/2/2018
123 324 2/1/2018
123 325 11/27/2017
456 326 2/1/2018
987 327 3/5/2018
987 328 2/5/2018
987 329 1/5/2018
987 330 12/5/2017
987 331 11/6/2017
654 332 11/13/2017
654 333 2/28/2018
654 334 2/28/2018
654 335 2/26/2018
654 336 2/26/2018
654 337 2/23/2018
654 338 2/21/2018
654 339 2/21/2018
654 340 2/20/2018
654 341 2/20/2018
run;
proc sql;
create table want as
select *, case when
(select count(distinct TransactionID) from have
where Cust_ID=a.Cust_ID and POST_DATE between a.POST_DATE-29 and a.POST_DATE) > 3 then 1 else 0 end as FEE
from have as a;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.