Solved: Re: Grouping Transaction Data

Selli5 · Posted 08-18-2017 03:15 AM

Hello,

I would like to identify account transactions that have a blank transaction group (tran_group = ' ')
AND have a description (desc) the same as a description from a previous transaction that has been already
been classified as 'Income' (tran_group = Income).

The background is that a transaction would be classified as 'Income' if it meets certain criteria - ie regular and amount within 10% of last payment. However some transactions will be missed such as bonuses and commissions.

So I would like to identify these and classify them as tran_group = 'Income Non Base'.

To do this I need the sas code to look back through all the accounts transactions - which could be most likely upto 100 transactions prior to the current transaction and check if the same 'desc' has appeared before AND has also previously been flagged as 'Income'
(tran_group = 'Income'), otherwise left blank.

Here is a temp dataset:

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;

When this match occurs I would like the current transaction to be classified as 'Income Non Base' (tran_group = 'Income Non Base')

So in the infile example above, the following transactions in bold would then be classified as 'Income Non Base'.

S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG, Income Non Base
S1, 14JUN2011, 52, From KPMG, Income Non Base

T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining, Income Non Base
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense

U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors, Income Non Base
U1, 16OCT2011, 32, From ATA LAWYERS XYX, Income Non Base
U1, 18DEC2011, 41, From ATA LAWYERS XYX, Income Non Base

thanks

Sally

Kurt_Bremser · Posted 08-18-2017 06:40 AM

The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

ChrisBrooks · Posted 08-18-2017 04:10 AM

I think this does what you want except I believe acoount S1 tran_code 52 in your output is incorrect as it's earlier tran_group is "Living Expense:

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;



proc sql;
	create table income
	as select *
	from temp
	where  tran_group="Income";
quit;

proc sql;
	update temp a
	set tran_group = 
	(select "Income Non Base" as tran_group
	from income b
	where b.account=a.account
		and b.tran_code = a.tran_code
		and b.date < a.date)
	where a.tran_group=""
	;
quit;

Shmuel · Posted 08-18-2017 04:37 AM

Next code was tested and results as desired:

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;

data temp1 temp2;
 set temp;
     if tran_group = ' ' then output temp2;
     else output temp1;
run;

proc sql;
  create table temp3
  as select distinct a.account, a.date, a.tran_code, a.desc,
     'Income Non Base' as tran_group
  from temp2 as a 
  left join
       temp1 as b
     on a.account = b.account and
        a.date ge b.date and
        a.desc = b.desc;
quit;  

data want; set temp1 temp3; run;
proc sort data=want;
  by account date;
run;

Selli5 · Posted 08-18-2017 05:17 AM

Thanks This works on this temp datset, but when I try running it on the real data I get this error message?

'Salary' is a variable the same as 'desc' in the temp dataset.

thanks

Sally

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

61

62 proc sql;

63 update tran_group_sorted a

64 set tran_group =(select "Income Non Base" as tran_group

65 from income b

66 where b.account=a.account and b.salary = a.salary and b.date < a.date)

67 where a.tran_group = " "

68 ;

ERROR: Subquery evaluated to more than one row.

NOTE: Correlation values are: account='100050S7' salary='DIRECT CREDIT From: CTRLINK PENSION Ref' date=20761 .

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

69 quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

real time 1.32 seconds

cpu time 0.55 seconds

70

71 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

84

Kurt_Bremser · Posted 08-18-2017 06:40 AM

The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Selli5 · Posted 09-04-2017 10:01 PM

Hello,

I want to identify credit card numbers = 16 digits within a text string 'desc'.

I am using the following code which is identifying numbers equal to 16 digits, but it also idenitfing numbers greater than 16 digits.

prxmatch('/\d{16}/',desc) > 0

How do I identify numbers exactly 16 digits long?

thanks

Sally

Grouping Transaction Data

Re: Grouping Transaction Data

Re: Grouping Transaction Data

Re: Grouping Transaction Data

Re: Grouping Transaction Data

Re: Grouping Transaction Data

Re: PRXMATCH

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away