BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Selli5
Fluorite | Level 6

Hello,

 

I would like to identify account transactions that have a blank transaction group (tran_group = ' ')
AND have a description (desc) the same as a description from a previous transaction that has been already
been classified as 'Income' (tran_group = Income).

 

The background is that a transaction would be classified as 'Income' if it meets certain criteria - ie regular and amount within 10% of last payment.  However some transactions will be missed such as bonuses and commissions. 


So I would like to identify these and classify them as tran_group = 'Income Non Base'.

 

To do this I need the sas code to look back through all the accounts transactions - which could be most likely upto 100 transactions prior to the current transaction and check if the same 'desc' has appeared before AND has also previously been flagged as 'Income' 
(tran_group = 'Income'), otherwise left blank.

 

Here is a temp dataset:

 

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;

 

When this match occurs I would like the current transaction to be classified as 'Income Non Base' (tran_group = 'Income Non Base')

So in the infile example above, the following transactions in bold would then be classified as 'Income Non Base'.

 

S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG, Income Non Base
S1, 14JUN2011, 52, From KPMG, Income Non Base

T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining, Income Non Base
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense

U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors, Income Non Base
U1, 16OCT2011, 32, From ATA LAWYERS XYX, Income Non Base
U1, 18DEC2011, 41, From ATA LAWYERS XYX, Income Non Base

 

thanks

Sally

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.

View solution in original post

5 REPLIES 5
ChrisBrooks
Ammonite | Level 13

I think this does what you want except I believe acoount S1 tran_code 52 in your output is incorrect as it's earlier tran_group is "Living Expense:

 

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;



proc sql;
	create table income
	as select *
	from temp
	where  tran_group="Income";
quit;

proc sql;
	update temp a
	set tran_group = 
	(select "Income Non Base" as tran_group
	from income b
	where b.account=a.account
		and b.tran_code = a.tran_code
		and b.date < a.date)
	where a.tran_group=""
	;
quit;
	

 

Shmuel
Garnet | Level 18

Next code was tested and results as desired:

data temp;
infile datalines dlm=',' truncover;
informat account $2. date DATE9. tran_code $2. desc $20. tran_group $20.;
input account date tran_code desc tran_group;
format date DATE9.;
datalines;
S1, 12MAY2010, 47, From KPMG, Income
S1, 10SEP2010, 52, Expense PPWC, Living Expense
S1, 16APR2011, 47, From KPMG,
S1, 14JUN2011, 52, From KPMG,
T1, 11MAR2014, 30, From Forts Mining, Income
T1, 11MAY2014, 30, From Forts Mining,
T1, 12SEP2017, 41, From ABA Accountants, Income
T1, 10JUN2018, 31, From ATO XYX, Tax Expense
U1, 14FEB2011, 32, From YYY Surveyors, Income
U1, 20MAY2011, 41, From ATA LAWYERS XYX, Income,
U1, 14JUN2011, 32, From YYY Surveyors,
U1, 16OCT2011, 32, From ATA LAWYERS XYX,
U1, 18DEC2011, 41, From ATA LAWYERS XYX,
;
run;

data temp1 temp2;
 set temp;
     if tran_group = ' ' then output temp2;
     else output temp1;
run;

proc sql;
  create table temp3
  as select distinct a.account, a.date, a.tran_code, a.desc,
     'Income Non Base' as tran_group
  from temp2 as a 
  left join
       temp1 as b
     on a.account = b.account and
        a.date ge b.date and
        a.desc = b.desc;
quit;  

data want; set temp1 temp3; run;
proc sort data=want;
  by account date;
run;




 
 
 
 
 
   
Selli5
Fluorite | Level 6

Thanks This works on this temp datset, but when I try running it on the real data I get this error message?

 

'Salary' is a variable the same as 'desc' in the temp dataset.

 

thanks

Sally

 

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 proc sql;
63 update tran_group_sorted a
64 set tran_group =(select "Income Non Base" as tran_group
65 from income b
66 where b.account=a.account and b.salary = a.salary and b.date < a.date)
67 where a.tran_group = " "
68 ;
ERROR: Subquery evaluated to more than one row.
NOTE: Correlation values are: account='100050S7' salary='DIRECT CREDIT From: CTRLINK PENSION Ref' date=20761 .
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
69 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 1.32 seconds
cpu time 0.55 seconds
 
70
71 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
84
Kurt_Bremser
Super User

The subquery produces more than one result for a given where condition. You probably need to refine the b.date < a.date part so that it yields only the maximum date within that range.

Selli5
Fluorite | Level 6

 

Hello,

 

I want to identify credit card numbers  = 16 digits within a text string 'desc'.

 

I am using the following code which is identifying numbers equal to 16 digits, but it also idenitfing numbers greater than 16 digits.

 

prxmatch('/\d{16}/',desc) > 0

 

How do I identify numbers exactly 16 digits long?

 

thanks

Sally

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1916 views
  • 0 likes
  • 4 in conversation