## Summing by 2 other variables as part of a formula

Hi there,

New to these boards, relatively new to BASE SAS. I'm trying to figure out how to create a formula that checks if the sum of multiple transactions for a particular user are above a certain threshold (in this case than the total for the previous month), and flags if so. The logic is something like this:

flag_variable = IF((SUM(transaction_value) BY user_id AND month) > (SUM(transaction_value) BY user_id AND month-1) THEN 1 ELSE 0

So essentially I'm trying to check if transactions for a particular user last month are greater than this month, and write the variable value as 1 if so.

My data looks something like this:

TRANSACTION_ID,TRANSACTION_VALUE,USER,MONTH,FLAG_VARIABLE

123,\$30,001,5,0

124,\$150,002,6,1

125,\$25,001,6,0

126,\$99,002,5,0

127,\$55,001,5,0

128,\$10,002,5,0

So you can see I have multiple users conducting multiple transactions over multiple months. I want to sum the amount transacted per user per month, compare it to last month, and flag if it has increased. In this case, the flag_variable value for transaction_ID 124 would be 1, because total transaction value for user 002 has increased from month 5 to month 6.

I'm sure there is a simple way to do this but I'm new at this and haven't found anything online that does this specifically. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Summing by 2 other variables as part of a formula

To help you I need an example of lines  where the flags differ form expected, your expected flags on each line and

some explanation.

12 REPLIES 12

## Re: Summing by 2 other variables as part of a formula

Its not exactly clear what you want as final output. Do you want that flag on each transaction even though you've summed them up?

1. You have to sum by month and user id -> Proc means/summary/univariate or Proc SQL

2. Sort by UserID - month descending

3. Use LAG() function to compare to previous months values and determine flag

4. Merge back in with original data if desired.

## Re: Summing by 2 other variables as part of a formula

Hi Reeza,

Thanks for your reply. I want that flag on each transaction on the original dataset. So I don't need to keep those sums, I just need to calculate them to determine whether to flag or not. So the final result will not be a summary table or any sort of report, just an additional column on the original dataset with either 0 or 1 next to each transaction.

## Re: Summing by 2 other variables as part of a formula

Few steps to achieve your target:

1) Import data into sas dataset. Are your transactions already in a sas table ?

if not - what kind of file is it ? CSV / excel / other ?

2) Sum transactions value per user, each month in different variable (sum01, sum02, ...sum12).

How many months are in the data file ?

Do you have transactions of different years in same data file ?

If yes, you need a year beyond the month.

Thus can be done either by SQL or by base sas;

## Re: Summing by 2 other variables as part of a formula

Thanks, Shmuel. Replies to your questions:

1) Data comes from a csv file with many other fields.

2) Currently about 6 months, over 2 years (2015 and 2016)

I am indifferent to using base SAS or SQL, so whatever works best. Thanks!

## Re: Summing by 2 other variables as part of a formula

/* step 1 - importing the data into sas */

filename  trans CSV 'C:\  ...path to and file name... .CSV';

data trans;

infile trans dlm=',' truncover;

input TRANSACTION_ID

TRANSACTION_VALUE

USER

MONTH

YEAR

;

transaction_month = mdy(month, 01, year);  /* create a date variable */

run;

/* step 2 - summarizing */

proc sql;

create table tr_sum as select

user, transaction_month

sum(transaction_value) as tr_sum

from trans

group by user, transaction_month

order by user, transaction_month descending;

quit;

/* step 3 - calculating the flag */

data tr_flag;

set tr_sum;

by user transaction_month descending;

if first.user then flag=0; else

if tr_sum > lag(tr_sum) then flag = 1;

else flag=0;

run;

/* step 4 - merging the flag in the original dataset */

proc sql;

create table trans_new as select

t.*, s.flag

from trans as t

left join tr_sum as s

on t.user = s.user and

t.transaction_month = s.transaction_month

order by TRANSACTION_ID, USER,  year, MONTH;    /* adapt it to your needs */

quit;

## Re: Summing by 2 other variables as part of a formula

Thanks, this looks good, I think. Trying it out, and I'm getting some errors. Here is the log (note that the variable names are somewhat different, and that there is a lot of code before (including the data import) which I've left out). I think if I fix line 462 it may work. Note that there may be some inconsistencies there, I had to change a couple of variable names and such due to posting publicly.

Thanks!

===

data COMBINED_DATA;

455 set COMBINED_DATA;

456 temp_trans_date = mdy(trans_mo,01,trans_yr) ;

457 run;

NOTE: There were 43528 observations read from the data set WORK.COMBINED_DATA.

NOTE: The data set WORK.COMBINED_DATA has 43528 observations and 56 variables.

NOTE: DATA statement used (Total process time):

real time 0.07 seconds

cpu time 0.08 seconds

458

459 proc sql;

460 create table tr_sum as select

461 CH_ID, temp_trans_date

462 sum(debit_amt) as tr_sum

___

22

76

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,

CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

463 from combined_data

464 group by CH_ID, temp_trans_date

465 order by CH_ID, temp_trans_date descending;

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

466 quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

467

468 data tr_flag;

469 set tr_sum;

ERROR: File WORK.TR_SUM.DATA does not exist.

470 by CH_ID trans_mo descending;

_

22

ERROR 22-322: Syntax error, expecting one of the following: a name, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.

471 if first.CH_ID then flag1=0; else

472 if tr_sum > 1.5*(lag(tr_sum)) then flag1 = 1;

473 else flag1=0;

474 run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.TR_FLAG may be incomplete. When this step was stopped there were 0 observations and 2 variables.

WARNING: Data set WORK.TR_FLAG was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

475

476 proc sql;

477 create table trans_new as select

478 t.*, s.flag1

479 from COMBINED_DATA as t

480 left join tr_sum as s

481 on t.CH_ID = s.CH_ID and

482 t.trans_mo = s.trans_mo

483 order by trans_ID, CH_ID, trans_yr, trans_mo;

ERROR: File WORK.TR_SUM.DATA does not exist.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

484 quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

## Re: Summing by 2 other variables as part of a formula

Pay attemtion: variables list, while in SAS data step are seperated by space, while in SQL are seperated by comma.

Error on line 462 (in log) is becase you missed comma between "temp_trans_date" and sum(debit_amt) ...

On second thought, change sort type and drop the "descending" (on 2 lines);  (The default is "ascending");

Most of other errors seem to be the result of the first one.

Try it again and check, do you get expected results.

## Re: Summing by 2 other variables as part of a formula

Hello,

Still getting errors:

459 data COMBINED_DATA;

460 set COMBINED_DATA;

461 temp_trans_date = mdy(trans_mo,01,trans_yr) ;

462 run;

NOTE: There were 43528 observations read from the data set WORK.COMBINED_DATA.

NOTE: The data set WORK.COMBINED_DATA has 43528 observations and 56 variables.

NOTE: DATA statement used (Total process time):

real time 0.07 seconds

cpu time 0.07 seconds

463

464 proc sql;

465 create table tr_sum as select

466 CH_ID, temp_trans_date,

467 sum(debit_amt) as tr_sum

468 from combined_pcard_data

469 group by CH_ID, temp_trans_date

470 order by CH_ID, temp_trans_date ;

NOTE: Table WORK.TR_SUM created, with 3400 rows and 3 columns.

471 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.05 seconds

cpu time 0.11 seconds

472

473 data tr_flag;

474 set tr_sum;

475 by CH_ID trans_mo ;

476 if first.CH_ID then flag1=0; else

477 if tr_sum > 1.15*(lag(tr_sum)) then flag1 = 1;

478 else flag1=0;

479 run;

ERROR: BY variable trans_mo is not on input data set WORK.TR_SUM.

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.TR_FLAG may be incomplete. When this step was stopped there were 0 observations and 4 variables.

WARNING: Data set WORK.TR_FLAG was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

480

481 proc sql;

12 The SAS System 10:13 Tuesday, September 13, 2016

482 create table trans_new as select

483 t.*, s.flag1

484 from COMBINED_DATA as t

485 left join tr_sum as s

486 on t.CH_ID = s.CH_ID and

487 t.trans_mo = s.trans_mo

488 order by trans_ID, CH_ID, trans_yr, trans_mo;

ERROR: Column trans_mo could not be found in the table/view identified with the correlation name S.

ERROR: Column trans_mo could not be found in the table/view identified with the correlation name S.

ERROR: Column trans_mo could not be found in the table/view identified with the correlation name S.

ERROR: Column flag1 could not be found in the table/view identified with the correlation name S.

ERROR: Column flag1 could not be found in the table/view identified with the correlation name S.

ERROR: Expression using equals (=) has components that are of different data types.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

489 quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

## Re: Summing by 2 other variables as part of a formula

On 3rd step, after log line 479 you got a message:

ERROR: BY variable trans_mo is not on input data set WORK.TR_SUM.

The last dataset WORK.TR_SUM is created in the 2nd step - by SQL.

trans_mo is the month used to compute temp_trans_date together with the year.
This variable was not selected by SQL, but you can add it.

Do you realy want to ignore the year (trans_yr) in step 3? That can give you false results.

I think you should change (log line 475) either to:

BY CH_ID temp_trans_date   -  (mainly year and month as day=01 always)

or to:

BY CH_ID TRANS_YR TRANS_MO   -  (you will need to add trans_yr too to the SQL )

Check yourself before going on.

## Re: Summing by 2 other variables as part of a formula

Hi,

This worked! Well, mostly. No more errors in the code, but two things:

1. SAS is only flagging the first transaction in the offending month, not all the transactions. So if there are 20 transactions in the month which should be flagged for a particular user, only one of those is being flagged.
2. When I join the flag column back to trans_new, it doesn’t seem to take. Instead of getting 0s and 1s, it gives me only *s, which is what was in the original data.

Any modifications to the code to fix this? I appreciate all your help so far.

## Re: Summing by 2 other variables as part of a formula

To help you I need an example of lines  where the flags differ form expected, your expected flags on each line and

some explanation.

## Re: Summing by 2 other variables as part of a formula

Hello,

I tweaked it some and reformulated it, and it worked. Thanks for your help!

Discussion stats
• 12 replies
• 1193 views
• 0 likes
• 3 in conversation