Hi there,
New to these boards, relatively new to BASE SAS. I'm trying to figure out how to create a formula that checks if the sum of multiple transactions for a particular user are above a certain threshold (in this case than the total for the previous month), and flags if so. The logic is something like this:
flag_variable = IF((SUM(transaction_value) BY user_id AND month) > (SUM(transaction_value) BY user_id AND month-1) THEN 1 ELSE 0
So essentially I'm trying to check if transactions for a particular user last month are greater than this month, and write the variable value as 1 if so.
My data looks something like this:
TRANSACTION_ID,TRANSACTION_VALUE,USER,MONTH,FLAG_VARIABLE
123,$30,001,5,0
124,$150,002,6,1
125,$25,001,6,0
126,$99,002,5,0
127,$55,001,5,0
128,$10,002,5,0
So you can see I have multiple users conducting multiple transactions over multiple months. I want to sum the amount transacted per user per month, compare it to last month, and flag if it has increased. In this case, the flag_variable value for transaction_ID 124 would be 1, because total transaction value for user 002 has increased from month 5 to month 6.
I'm sure there is a simple way to do this but I'm new at this and haven't found anything online that does this specifically. Thanks!
To help you I need an example of lines where the flags differ form expected, your expected flags on each line and
some explanation.
Its not exactly clear what you want as final output. Do you want that flag on each transaction even though you've summed them up?
1. You have to sum by month and user id -> Proc means/summary/univariate or Proc SQL
2. Sort by UserID - month descending
3. Use LAG() function to compare to previous months values and determine flag
4. Merge back in with original data if desired.
Hi Reeza,
Thanks for your reply. I want that flag on each transaction on the original dataset. So I don't need to keep those sums, I just need to calculate them to determine whether to flag or not. So the final result will not be a summary table or any sort of report, just an additional column on the original dataset with either 0 or 1 next to each transaction.
Few steps to achieve your target:
1) Import data into sas dataset. Are your transactions already in a sas table ?
if not - what kind of file is it ? CSV / excel / other ?
2) Sum transactions value per user, each month in different variable (sum01, sum02, ...sum12).
How many months are in the data file ?
Do you have transactions of different years in same data file ?
If yes, you need a year beyond the month.
Thus can be done either by SQL or by base sas;
Th help you we need your answers to the above questions.
Thanks, Shmuel. Replies to your questions:
1) Data comes from a csv file with many other fields.
2) Currently about 6 months, over 2 years (2015 and 2016)
I am indifferent to using base SAS or SQL, so whatever works best. Thanks!
/* step 1 - importing the data into sas */
filename trans CSV 'C:\ ...path to and file name... .CSV';
data trans;
infile trans dlm=',' truncover;
input TRANSACTION_ID
TRANSACTION_VALUE
USER
MONTH
YEAR
;
transaction_month = mdy(month, 01, year); /* create a date variable */
run;
/* step 2 - summarizing */
proc sql;
create table tr_sum as select
user, transaction_month
sum(transaction_value) as tr_sum
from trans
group by user, transaction_month
order by user, transaction_month descending;
quit;
/* step 3 - calculating the flag */
data tr_flag;
set tr_sum;
by user transaction_month descending;
if first.user then flag=0; else
if tr_sum > lag(tr_sum) then flag = 1;
else flag=0;
run;
/* step 4 - merging the flag in the original dataset */
proc sql;
create table trans_new as select
t.*, s.flag
from trans as t
left join tr_sum as s
on t.user = s.user and
t.transaction_month = s.transaction_month
order by TRANSACTION_ID, USER, year, MONTH; /* adapt it to your needs */
quit;
Thanks, this looks good, I think. Trying it out, and I'm getting some errors. Here is the log (note that the variable names are somewhat different, and that there is a lot of code before (including the data import) which I've left out). I think if I fix line 462 it may work. Note that there may be some inconsistencies there, I had to change a couple of variable names and such due to posting publicly.
Thanks!
===
data COMBINED_DATA;
455 set COMBINED_DATA;
456 temp_trans_date = mdy(trans_mo,01,trans_yr) ;
457 run;
NOTE: There were 43528 observations read from the data set WORK.COMBINED_DATA.
NOTE: The data set WORK.COMBINED_DATA has 43528 observations and 56 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.08 seconds
458
459 proc sql;
460 create table tr_sum as select
461 CH_ID, temp_trans_date
462 sum(debit_amt) as tr_sum
___
22
76
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
463 from combined_data
464 group by CH_ID, temp_trans_date
465 order by CH_ID, temp_trans_date descending;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
466 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
467
468 data tr_flag;
469 set tr_sum;
ERROR: File WORK.TR_SUM.DATA does not exist.
470 by CH_ID trans_mo descending;
_
22
ERROR 22-322: Syntax error, expecting one of the following: a name, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.
471 if first.CH_ID then flag1=0; else
472 if tr_sum > 1.5*(lag(tr_sum)) then flag1 = 1;
473 else flag1=0;
474 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TR_FLAG may be incomplete. When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.TR_FLAG was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
475
476 proc sql;
477 create table trans_new as select
478 t.*, s.flag1
479 from COMBINED_DATA as t
480 left join tr_sum as s
481 on t.CH_ID = s.CH_ID and
482 t.trans_mo = s.trans_mo
483 order by trans_ID, CH_ID, trans_yr, trans_mo;
ERROR: File WORK.TR_SUM.DATA does not exist.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
483 ! /* adapt it to your needs */
484 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Pay attemtion: variables list, while in SAS data step are seperated by space, while in SQL are seperated by comma.
Error on line 462 (in log) is becase you missed comma between "temp_trans_date" and sum(debit_amt) ...
On second thought, change sort type and drop the "descending" (on 2 lines); (The default is "ascending");
Most of other errors seem to be the result of the first one.
Try it again and check, do you get expected results.
Hello,
Still getting errors:
459 data COMBINED_DATA;
460 set COMBINED_DATA;
461 temp_trans_date = mdy(trans_mo,01,trans_yr) ;
462 run;
NOTE: There were 43528 observations read from the data set WORK.COMBINED_DATA.
NOTE: The data set WORK.COMBINED_DATA has 43528 observations and 56 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.07 seconds
463
464 proc sql;
465 create table tr_sum as select
466 CH_ID, temp_trans_date,
467 sum(debit_amt) as tr_sum
468 from combined_pcard_data
469 group by CH_ID, temp_trans_date
470 order by CH_ID, temp_trans_date ;
NOTE: Table WORK.TR_SUM created, with 3400 rows and 3 columns.
471 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.05 seconds
cpu time 0.11 seconds
472
473 data tr_flag;
474 set tr_sum;
475 by CH_ID trans_mo ;
476 if first.CH_ID then flag1=0; else
477 if tr_sum > 1.15*(lag(tr_sum)) then flag1 = 1;
478 else flag1=0;
479 run;
ERROR: BY variable trans_mo is not on input data set WORK.TR_SUM.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TR_FLAG may be incomplete. When this step was stopped there were 0 observations and 4 variables.
WARNING: Data set WORK.TR_FLAG was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
480
481 proc sql;
12 The SAS System 10:13 Tuesday, September 13, 2016
482 create table trans_new as select
483 t.*, s.flag1
484 from COMBINED_DATA as t
485 left join tr_sum as s
486 on t.CH_ID = s.CH_ID and
487 t.trans_mo = s.trans_mo
488 order by trans_ID, CH_ID, trans_yr, trans_mo;
ERROR: Column trans_mo could not be found in the table/view identified with the correlation name S.
ERROR: Column trans_mo could not be found in the table/view identified with the correlation name S.
ERROR: Column trans_mo could not be found in the table/view identified with the correlation name S.
ERROR: Column flag1 could not be found in the table/view identified with the correlation name S.
ERROR: Column flag1 could not be found in the table/view identified with the correlation name S.
ERROR: Expression using equals (=) has components that are of different data types.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
488 ! /* adapt it to your needs */
489 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
On 3rd step, after log line 479 you got a message:
ERROR: BY variable trans_mo is not on input data set WORK.TR_SUM.
The last dataset WORK.TR_SUM is created in the 2nd step - by SQL.
trans_mo is the month used to compute temp_trans_date together with the year.
This variable was not selected by SQL, but you can add it.
Do you realy want to ignore the year (trans_yr) in step 3? That can give you false results.
I think you should change (log line 475) either to:
BY CH_ID temp_trans_date - (mainly year and month as day=01 always)
or to:
BY CH_ID TRANS_YR TRANS_MO - (you will need to add trans_yr too to the SQL )
Check yourself before going on.
Hi,
This worked! Well, mostly. No more errors in the code, but two things:
Any modifications to the code to fix this? I appreciate all your help so far.
To help you I need an example of lines where the flags differ form expected, your expected flags on each line and
some explanation.
Hello,
I tweaked it some and reformulated it, and it worked. Thanks for your help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.