Hello,
What is the easiest way to compare 30 variables (lag_debit1 to lag_debit30) against another 30 variables (lag_credit1 to lag_credit30) already created per account.
I only want the accounts which have an exact match between one or more.
thanks
Sally
Whilst you could do it with arrays, that wouldn't be optimal, especially if you get more rows. What you want to do is to merge on any rows (distinct to get just one per value) which = debit, then flag those:
data test; informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ; input accountnum trandate debit credit count ; format trandate DATE9.; datalines; 100001 30APR2018 2000.00 . 1 100002 23MAY2018 3000.00 . 1 100002 26MAY2018 2500.50 . 2 100002 27MAY2018 . 2100.00 3 100002 28MAY2018 2100.00 . 4 100002 29MAY2018 2500.10 . 5 100003 23MAY2018 2450.10 . 1 100003 24MAY2018 . 2450.10 2 ; run; proc sql; create table want as select a.*, case when b.credit ne . then "Y" else "" end as flag from test a left join (select distinct credit from test) b on a.debit=b.credit order by accountnum,count; quit;
Maybe the only things required is a loop, but because you forgot to post input data, the expected result and an explanation of what "compare" means, it is nearly impossible to suggest something solving your issue.
Post test data in the form of a datastep, otherwise we are just guessing. If you have a list of variables with a numeric suffix then you will be talking about arrays or lists of values, you can define as:
data want; set have; array lag_debit(30); array lag_credit(30); do i=1 to 30; ... end; run; /* Or use a list directly as */ data want; set have; low=min(of lag_debit:); run;
You can also remodel the data so you don't have a wide table, that would likely be far easier to manage and work with.
Hi,
I need a bit more help. If I use the code suggested, it returns 100% match, however this is not true. What do I need to change?
thks
data want;
set rule7;
array lag_debit(30);
array lag_credit(30);
do i=1 to 30;
if lag_debit(30) = lag_credit(30) then match=1;
end;
run;
Your code compares the 30th variables 30 times. Using "i" instead of "30" in the if-statement should help.
do i=1 to 30;
if lag_debit(i) = lag_credit(i) then match=1;
end;
Thanks but it looks like its incorrectly matching missing lag_debits to missing lag_credits. How can i specify that I only want matches when there is a value present?
Add a check to the condition for missing. Compare with . (representation of a missing numeric value), or use the missing() function.
Again, seeing test data in a datastep really helps us understand your question, and showing what you want out.
data want; set rule7; array lag_debit{30}; array lag_credit{30};
array results{30}; do i=1 to 30; results{i}=ifn(lag_debit{i}=lag_credit{i},1,0); end; run;
You will note (use of code window, its the {i} above post area), instead of putting 30 in the if part, i use the inceremental variable i, so each element is checked rather than just element 30. Also, if you want a result for each pairing you need a result for each pairing, hence another array, otherwise you will get 1 if any of lag_debit{i} matches the same element in credit.
Just to add, I find using curly braces around array's makes it easier to spot them amongst function calls, which is why I change them in the code.
Thanks I understand that the input data would help, just haven't have time as yet.
It looks like the code is matching all the missing values and thinking that its a match, when really its just that there is no value (blank).
None of us have time, we are only volunteering on here - we don't work for SAS. As such we have had this back and forth wasting time, where test data up front would have been quicker.
data want; set rule7; array lag_debit{30}; array lag_credit{30};
array results{30}; do i=1 to 30; results{i}=ifn(lag_debit{i}=lag_credit{i} and lag_debit{i} ne .,1,0); end; run;
@Selli5 wrote:
Thanks I understand that the input data would help, just haven't have time as yet.
Then your issue can't be urgent at all. Otherwise, you would make sure to help us as much as you can in helping you.
Hello,
I have attached some input data and code below.
I would like to identify when there is an exact match between any lag_debits to any lag_credit (and exclude blanks).
So in this case account number 10002 should produce a match on the $2100 amount and account number 10003 should produce a match on $2450.10.
thanks
So this has nothing to do with arrays at all then. Ok, we see here why some test data, and your missing what the output should look like, does to help illustrate - note how I post code in a code window in the post:
data test; informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ; input accountnum trandate debit credit count ; format trandate DATE9.; datalines; 100001 30APR2018 2000.00 . 1 100002 23MAY2018 3000.00 . 1 100002 26MAY2018 2500.50 . 2 100002 27MAY2018 . 2100.00 3 100002 28MAY2018 2100.00 . 4 100002 29MAY2018 2500.10 . 5 100003 23MAY2018 2450.10 . 1 100003 24MAY2018 . 2450.10 2 ; run;
So basically to flag the row if previous is like it then:
data want; set test; debit_chk=ifn(lag(debit)=debit,1,0); credit_chk=ifn(lag(credit)=credit,1,0); run;
Thanks, but that code would only allow comparing to the previous row of data.
I was using the arrays as I am trying to identify any matching debits (lag_debit1 to lag_debit5) to any credits (lag_credit1 to lag_credit5).
In the final dataset (result) Account 10002 should produce a match for 2100 Amount (lag_debit4 = lag_credit3) and Account 10003 should produce a match for 2450.10 (lag_debit1 to lag_credit2). However the results dataset is currently not identifying these matches and returns 0 for results 1 to results 5.
data test;
informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ;
input accountnum trandate debit credit count ;
format trandate DATE9.;
datalines;
100001 30APR2018 2000.00 . 1
100002 23MAY2018 3000.00 . 1
100002 26MAY2018 2500.50 . 2
100002 27MAY2018 . 2100.00 3
100002 28MAY2018 2100.00 . 4
100002 29MAY2018 2500.10 . 5
100003 23MAY2018 2450.10 . 1
100003 24MAY2018 . 2450.10 2
;
run;
data want;
format lag_debit1-lag_debit5 8.2 lag_credit1-lag_credit5 8.2;
do until(last.accountnum);
set test;
by accountnum trandate;
array lag_debit{5};
array lag_credit{5};
if first.accountnum then call missing(of lag_debit(*));
if first.accountnum then call missing(of lag_credit(*));
lag_debit(count)=debit;
lag_credit(count)=credit;
output;
end;
run;
data result;
set want;
array lag_debit{5};
array lag_credit{5};
array results{5};
do i=1 to 5;
results{i}= ifn(lag_debit{i}=lag_credit{i} and lag_debit{i} ne ., 1, 0);
end;
run;
proc print data=result;
var accountnum lag_debit1-lag_debit5 lag_credit1-lag_credit5 results1 - results5 i ;
run;
Whilst you could do it with arrays, that wouldn't be optimal, especially if you get more rows. What you want to do is to merge on any rows (distinct to get just one per value) which = debit, then flag those:
data test; informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ; input accountnum trandate debit credit count ; format trandate DATE9.; datalines; 100001 30APR2018 2000.00 . 1 100002 23MAY2018 3000.00 . 1 100002 26MAY2018 2500.50 . 2 100002 27MAY2018 . 2100.00 3 100002 28MAY2018 2100.00 . 4 100002 29MAY2018 2500.10 . 5 100003 23MAY2018 2450.10 . 1 100003 24MAY2018 . 2450.10 2 ; run; proc sql; create table want as select a.*, case when b.credit ne . then "Y" else "" end as flag from test a left join (select distinct credit from test) b on a.debit=b.credit order by accountnum,count; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.