Hello,
I would like to identify account transactions that have a blank transaction group (tran_group = ' ')
AND have a description (desc) the same as a description from a previous transaction that has been already
been classified as 'Income' (tran_group = Income).
The background is that a transaction would be classified as 'Income' if it meets certain criteria - ie regular and amount within 10% of last payment. However some transactions will be missed such as bonuses and commissions.
So I would like to identify these and classify them as tran_group = 'Income Non Base'.
To do this I need the sas code to look back through all the accounts transactions - which could be most likely upto 100 transactions prior to the current transaction and check if the same 'desc' has appeared before AND has also previously been flagged as 'Income'
(tran_group = 'Income'), otherwise left blank.
Here is a temp dataset:
data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;
When this match occurs I would like the current transaction to be classified as 'Income Non Base' (tran_group = 'Income Non Base')
So in the infile example above, the following transactions in bold would then be classified as 'Income Non Base'.
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG, Income Non Base
S1, 14JUN2011, 52, From KPMG, Income Non Base
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining, Income Non Base
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors, Income Non Base
U1, 16OCT2011, 32, From ATA LAWYERS XYX, Income Non Base
U1, 18DEC2011, 41, From ATA LAWYERS XYX, Income Non Base
thanks
Sally
The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.
I think this does what you want except I believe acoount S1 tran_code 52 in your output is incorrect as it's earlier tran_group is "Living Expense:
data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;
proc sql;
create table income
as select *
from temp
where tran_group="Income";
quit;
proc sql;
update temp a
set tran_group =
(select "Income Non Base" as tran_group
from income b
where b.account=a.account
and b.tran_code = a.tran_code
and b.date < a.date)
where a.tran_group=""
;
quit;
Next code was tested and results as desired:
data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;
data temp1 temp2;
set temp;
if tran_group = ' ' then output temp2;
else output temp1;
run;
proc sql;
create table temp3
as select distinct a.account, a.date, a.tran_code, a.desc,
'Income Non Base' as tran_group
from temp2 as a
left join
temp1 as b
on a.account = b.account and
a.date ge b.date and
a.desc = b.desc;
quit;
data want; set temp1 temp3; run;
proc sort data=want;
by account date;
run;
Thanks This works on this temp datset, but when I try running it on the real data I get this error message?
'Salary' is a variable the same as 'desc' in the temp dataset.
thanks
Sally
The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.
Hello,
I want to identify credit card numbers = 16 digits within a text string 'desc'.
I am using the following code which is identifying numbers equal to 16 digits, but it also idenitfing numbers greater than 16 digits.
prxmatch('/\d{16}/',desc) > 0
How do I identify numbers exactly 16 digits long?
thanks
Sally
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.