BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DiegoDiaz
Calcite | Level 5

Hi All-

 

I am, going to start with a little background.  I do have a dataset like the following :

 

Cust_IDTransactionIDPOST_DATE FEE 
1233212/28/2018 
1233222/10/2018 
1233232/2/2018 
1233242/1/2018 
12332511/27/2017 
4563262/1/2018 
9873273/5/2018 
9873282/5/2018 
9873291/5/2018 
98733012/5/2017 
98733111/6/2017 
65433211/13/2017 
6543332/28/2018 
6543342/28/2018 
6543352/26/2018 
6543362/26/2018 
6543372/23/2018 
6543382/21/2018 
6543392/21/2018 
6543402/20/2018 
6543412/20/2018 

 

 

I would like to loop through this dataset and update the Flag column with a 1 depending on information obtained in the following condition by cust_id:

 

-No fee for up to three transactions per rolling 30-day period.

 

This would update the dataset to be the following :

 

Cust_IDTransactionIDPOST_DATE FEE 
1233212/28/20181
1233222/10/20180
1233232/2/20180
1233242/1/20180
12332511/27/20170
4563262/1/20180
9873273/5/20180
9873282/5/20180
9873291/5/20180
98733012/5/20170
98733111/6/20170
65433211/13/20170
6543332/28/20181
6543342/28/20181
6543352/26/20181
6543362/26/20181
6543372/23/20181
6543382/21/20181
6543392/21/20180
6543402/20/20180
6543412/20/20180

 

 

I have been trying to retain the current record value but i have not been able to do it. I have got the flag only validation the current record with the next record. There's tons of documentation online that sort of applies, but nothing specific enough and try as I might, I can't seem to work it out.  I've been tempted to export the data to a file, parse it using VB and then reimport, but that is horribly backwards and inefficient. Plus, I won't learn anything about SAS.

 

Any help would be greatly appreciated.

 

Thanks,

 

Diego

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If, for any post_date, there are more that 3 transactions between post_date-29 and post_date (inclusive) then fee=1.  BUT, while your data are grouped by cust_id, they are not sorted by date within cust_id.  So, to avoid sorting, you can use a double dow.  I.e. have 2 do loops reading each cust_id.  The first loop increments a subset of a data array (every date from post_date through post_date+29).  The second loop checks the result for any given post_date, looking for values over 3:

 

data have;
  input Cust_ID TransactionID POST_DATE :mmddyy10.;
  format post_date date9.;
  *if cust_id=987;
datalines;
123 321 2/28/2018   
123 322 2/10/2018   
123 323 2/2/2018   
123 324 2/1/2018   
123 325 11/27/2017   
456 326 2/1/2018   
987 327 3/5/2018   
987 328 2/5/2018   
987 329 1/5/2018   
987 330 12/5/2017   
987 331 11/6/2017   
654 332 11/13/2017   
654 333 2/28/2018   
654 334 2/28/2018   
654 335 2/26/2018   
654 336 2/26/2018   
654 337 2/23/2018   
654 338 2/21/2018   
654 339 2/21/2018   
654 340 2/20/2018   
654 341 2/20/2018   
run;

%let lower_bound=%sysfunc(inputn(01jan2017,date9.));
%let upper_bound=%sysfunc(inputn(31jan2019,date9.));

data want (drop=_:);
  array _count{&lower_bound:&upper_bound} ;
  do until (last.cust_id);
    set have;
    by cust_id notsorted;
    do _d=post_date to post_date+29;
      _count{_d}=sum(_count{_d},1);
    end;
  end;
  do until (last.cust_id);
    set have;
    by cust_id notsorted;
    fee=(_count{post_date}>3);
    output;
  end;
run;

 

Notice the _count array is index by date.  And the lower bound of the index is set to the numerical equivalent of sas data value 01jan2017, and the upper bound to the equivalent of 31jan2019.  Just use a lower bound date as the minimum expected date, and the upper bound at least 29 days after the highest expected date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

So you want FEE=0 as a starting point, but if three transactions happen for the same cust_id, you want the fourth to have FEE=1, correct?

DiegoDiaz
Calcite | Level 5

Yes, The starting to point should be 0, but the fourth or the next ones should be 1 if the were posted per rolling 30-day period

mkeintz
PROC Star

If occurs to me that you expected transactions to be sorted in descending order within a given cust_id.

 

If so, then presumably transaction 332 (first trans for id 654) is in error.  It's the only transaction that doesn't follow the descending date rule.  If that was unintentional, then the solution is a very simple self-merge data step:

 

data want2;
  merge have
        have (firstobs=4 keep=cust_id post_date
              rename=(cust_id=_id4 post_date=_pd4));
  if cust_id=_id4 and (post_date< _pd4+29) then fee=1;
  else fee=0;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DiegoDiaz
Calcite | Level 5

This is awesome!

 

I really appreciate it

 

Many thanks!

mkeintz
PROC Star

If, for any post_date, there are more that 3 transactions between post_date-29 and post_date (inclusive) then fee=1.  BUT, while your data are grouped by cust_id, they are not sorted by date within cust_id.  So, to avoid sorting, you can use a double dow.  I.e. have 2 do loops reading each cust_id.  The first loop increments a subset of a data array (every date from post_date through post_date+29).  The second loop checks the result for any given post_date, looking for values over 3:

 

data have;
  input Cust_ID TransactionID POST_DATE :mmddyy10.;
  format post_date date9.;
  *if cust_id=987;
datalines;
123 321 2/28/2018   
123 322 2/10/2018   
123 323 2/2/2018   
123 324 2/1/2018   
123 325 11/27/2017   
456 326 2/1/2018   
987 327 3/5/2018   
987 328 2/5/2018   
987 329 1/5/2018   
987 330 12/5/2017   
987 331 11/6/2017   
654 332 11/13/2017   
654 333 2/28/2018   
654 334 2/28/2018   
654 335 2/26/2018   
654 336 2/26/2018   
654 337 2/23/2018   
654 338 2/21/2018   
654 339 2/21/2018   
654 340 2/20/2018   
654 341 2/20/2018   
run;

%let lower_bound=%sysfunc(inputn(01jan2017,date9.));
%let upper_bound=%sysfunc(inputn(31jan2019,date9.));

data want (drop=_:);
  array _count{&lower_bound:&upper_bound} ;
  do until (last.cust_id);
    set have;
    by cust_id notsorted;
    do _d=post_date to post_date+29;
      _count{_d}=sum(_count{_d},1);
    end;
  end;
  do until (last.cust_id);
    set have;
    by cust_id notsorted;
    fee=(_count{post_date}>3);
    output;
  end;
run;

 

Notice the _count array is index by date.  And the lower bound of the index is set to the numerical equivalent of sas data value 01jan2017, and the upper bound to the equivalent of 31jan2019.  Just use a lower bound date as the minimum expected date, and the upper bound at least 29 days after the highest expected date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
  input Cust_ID TransactionID POST_DATE :mmddyy10.;
  format post_date date9.;
datalines;
123 321 2/28/2018   
123 322 2/10/2018   
123 323 2/2/2018   
123 324 2/1/2018   
123 325 11/27/2017   
456 326 2/1/2018   
987 327 3/5/2018   
987 328 2/5/2018   
987 329 1/5/2018   
987 330 12/5/2017   
987 331 11/6/2017   
654 332 11/13/2017   
654 333 2/28/2018   
654 334 2/28/2018   
654 335 2/26/2018   
654 336 2/26/2018   
654 337 2/23/2018   
654 338 2/21/2018   
654 339 2/21/2018   
654 340 2/20/2018   
654 341 2/20/2018   
run;
proc sql;
create table want as
 select *, case when
(select count(distinct TransactionID) from have 
 where Cust_ID=a.Cust_ID and POST_DATE between a.POST_DATE-29 and a.POST_DATE) > 3 then 1 else 0 end as FEE
  from have as a;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 865 views
  • 0 likes
  • 4 in conversation