DATA Step, Macro, Functions and more

Grouping Transaction Data

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Grouping Transaction Data

Hello,

 

I would like to identify account transactions that have a blank transaction group (tran_group = ' ')
AND have a description (desc) the same as a description from a previous transaction that has been already
been classified as 'Income' (tran_group = Income).

 

The background is that a transaction would be classified as 'Income' if it meets certain criteria - ie regular and amount within 10% of last payment.  However some transactions will be missed such as bonuses and commissions. 


So I would like to identify these and classify them as tran_group = 'Income Non Base'.

 

To do this I need the sas code to look back through all the accounts transactions - which could be most likely upto 100 transactions prior to the current transaction and check if the same 'desc' has appeared before AND has also previously been flagged as 'Income' 
(tran_group = 'Income'), otherwise left blank.

 

Here is a temp dataset:

 

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;

 

When this match occurs I would like the current transaction to be classified as 'Income Non Base' (tran_group = 'Income Non Base')

So in the infile example above, the following transactions in bold would then be classified as 'Income Non Base'.

 

S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG, Income Non Base
S1, 14JUN2011, 52, From KPMG, Income Non Base

T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining, Income Non Base
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense

U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors, Income Non Base
U1, 16OCT2011, 32, From ATA LAWYERS XYX, Income Non Base
U1, 18DEC2011, 41, From ATA LAWYERS XYX, Income Non Base

 

thanks

Sally


Accepted Solutions
Solution
‎09-05-2017 12:14 AM
Super User
Posts: 10,580

Re: Grouping Transaction Data

The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

View solution in original post


All Replies
Valued Guide
Posts: 596

Re: Grouping Transaction Data

I think this does what you want except I believe acoount S1 tran_code 52 in your output is incorrect as it's earlier tran_group is "Living Expense:

 

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;



proc sql;
	create table income
	as select *
	from temp
	where  tran_group="Income";
quit;

proc sql;
	update temp a
	set tran_group = 
	(select "Income Non Base" as tran_group
	from income b
	where b.account=a.account
		and b.tran_code = a.tran_code
		and b.date < a.date)
	where a.tran_group=""
	;
quit;
	

 

Trusted Advisor
Posts: 1,848

Re: Grouping Transaction Data

Posted in reply to ChrisBrooks

Next code was tested and results as desired:

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;

data temp1 temp2;
 set temp;
     if tran_group = ' ' then output temp2;
     else output temp1;
run;

proc sql;
  create table temp3
  as select distinct a.account, a.date, a.tran_code, a.desc,
     'Income Non Base' as tran_group
  from temp2 as a 
  left join
       temp1 as b
     on a.account = b.account and
        a.date ge b.date and
        a.desc = b.desc;
quit;  

data want; set temp1 temp3; run;
proc sort data=want;
  by account date;
run;




 
 
 
 
 
   
Contributor
Posts: 22

Re: Grouping Transaction Data

Posted in reply to ChrisBrooks

Thanks This works on this temp datset, but when I try running it on the real data I get this error message?

 

'Salary' is a variable the same as 'desc' in the temp dataset.

 

thanks

Sally

 

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 proc sql;
63 update tran_group_sorted a
64 set tran_group =(select "Income Non Base" as tran_group
65 from income b
66 where b.account=a.account and b.salary = a.salary and b.date < a.date)
67 where a.tran_group = " "
68 ;
ERROR: Subquery evaluated to more than one row.
NOTE: Correlation values are: account='100050S7' salary='DIRECT CREDIT From: CTRLINK PENSION Ref' date=20761 .
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
69 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 1.32 seconds
cpu time 0.55 seconds
 
70
71 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
84
Solution
‎09-05-2017 12:14 AM
Super User
Posts: 10,580

Re: Grouping Transaction Data

The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 22

Re: PRXMATCH

Posted in reply to ChrisBrooks

 

Hello,

 

I want to identify credit card numbers  = 16 digits within a text string 'desc'.

 

I am using the following code which is identifying numbers equal to 16 digits, but it also idenitfing numbers greater than 16 digits.

 

prxmatch('/\d{16}/',desc) > 0

 

How do I identify numbers exactly 16 digits long?

 

thanks

Sally

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 207 views
  • 0 likes
  • 4 in conversation