BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Selli5
Fluorite | Level 6

 

Hello,

 

What is the easiest way to compare 30 variables (lag_debit1 to lag_debit30) against another 30 variables (lag_credit1 to lag_credit30) already created per account.

 

I only want the accounts which have an exact match between one or more.

 

thanks

Sally

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst you could do it with arrays, that wouldn't be optimal, especially if you get more rows.  What you want to do is to merge on any rows (distinct to get just one per value) which = debit, then flag those:

data test;
  informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ;
  input accountnum trandate debit credit count ;
  format trandate DATE9.;
datalines;
100001 30APR2018 2000.00 . 1
100002 23MAY2018 3000.00 . 1
100002 26MAY2018 2500.50 . 2
100002 27MAY2018 . 2100.00 3
100002 28MAY2018 2100.00 . 4
100002 29MAY2018 2500.10 . 5
100003 23MAY2018 2450.10 . 1
100003 24MAY2018 . 2450.10 2
;
run;

proc sql;
  create table want as
  select a.*,
         case when b.credit ne . then "Y" else "" end as flag
  from   test a
  left join (select distinct credit from test) b
  on     a.debit=b.credit
  order by accountnum,count;
quit;

View solution in original post

15 REPLIES 15
andreas_lds
Jade | Level 19

Maybe the only things required is a loop, but because you forgot to post input data, the expected result and an explanation of what "compare" means, it is nearly impossible to suggest something solving your issue.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep, otherwise we are just guessing.  If you have a list of variables with a numeric suffix then you will be talking about arrays or lists of values, you can define as:

data want;
  set have;
  array lag_debit(30);
  array lag_credit(30);
  do i=1 to 30;
    ...
  end;
run;

/* Or use a list directly as */
data want;
  set have;
  low=min(of lag_debit:);
run;

You can also remodel the data so you don't have a wide table, that would likely be far easier to manage and work with.

Selli5
Fluorite | Level 6

Hi,

 

I need a bit more help.  If I use the code suggested, it returns 100% match, however this is not true. What do I need to change?

thks

 

data want;
  set rule7;
  array lag_debit(30);
  array lag_credit(30);
  do i=1 to 30;
if lag_debit(30) = lag_credit(30) then match=1;
  end;
run;

andreas_lds
Jade | Level 19

Your code compares the 30th variables 30 times. Using "i" instead of "30" in the if-statement should help.

do i=1 to 30;
  if lag_debit(i) = lag_credit(i) then match=1;
end;
Selli5
Fluorite | Level 6

Thanks but it looks like its incorrectly matching missing lag_debits to missing lag_credits.  How can i specify that I only want matches when there is a value present?

 

 

Kurt_Bremser
Super User

Add a check to the condition for missing. Compare with . (representation of a missing numeric value), or use the missing() function.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Again, seeing test data in a datastep really helps us understand your question, and showing what you want out. 

data want;
  set rule7;
  array lag_debit{30};
  array lag_credit{30};
array results{30}; do i=1 to 30; results{i}=ifn(lag_debit{i}=lag_credit{i},1,0); end; run;

 You will note (use of code window, its the {i} above post area), instead of putting 30 in the if part, i use the inceremental variable i, so each element is checked rather than just element 30.  Also, if you want a result for each pairing you need a result for each pairing, hence another array, otherwise you will get 1 if any of lag_debit{i} matches the same element in credit.  

Just to add, I find using curly braces around array's makes it easier to spot them amongst function calls, which is why I change them in the code.

Selli5
Fluorite | Level 6

Thanks I understand that the input data would help, just haven't have time as yet.

 

It looks like the code is matching all the missing values and thinking that its a match, when really its just that there is no value (blank).

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

None of us have time, we are only volunteering on here - we don't work for SAS.  As such we have had this back and forth wasting time, where test data up front would have been quicker.

data want;
  set rule7;
  array lag_debit{30};
  array lag_credit{30};
array results{30}; do i=1 to 30; results{i}=ifn(lag_debit{i}=lag_credit{i} and lag_debit{i} ne .,1,0); end; run;
Kurt_Bremser
Super User

@Selli5 wrote:

Thanks I understand that the input data would help, just haven't have time as yet.

 


 


Then your issue can't be urgent at all. Otherwise, you would make sure to help us as much as you can in helping you.

Selli5
Fluorite | Level 6

Hello,

 

I have attached some input data and code below.

 

I would like to identify when there is an exact match between any lag_debits to any lag_credit (and exclude blanks).

 

So in this case account number 10002 should produce a match on the $2100 amount and account number 10003 should produce a match on $2450.10.

 

thanks

 

 

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

So this has nothing to do with arrays at all then.  Ok, we see here why some test data, and your missing what the output should look like, does to help illustrate - note how I post code in a code window in the post:

data test;
informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ;
input accountnum trandate debit credit count ;
format trandate DATE9.;
datalines;
100001 30APR2018 2000.00 . 1
100002 23MAY2018 3000.00 . 1
100002 26MAY2018 2500.50 . 2
100002 27MAY2018 . 2100.00 3
100002 28MAY2018 2100.00 . 4
100002 29MAY2018 2500.10 . 5
100003 23MAY2018 2450.10 . 1
100003 24MAY2018 . 2450.10 2
;
run;

So basically to flag the row if previous is like it then:

data want;
  set test;
  debit_chk=ifn(lag(debit)=debit,1,0);
  credit_chk=ifn(lag(credit)=credit,1,0);
run;
Selli5
Fluorite | Level 6

Thanks, but that code would only allow comparing to the previous row of data.

 

I was using the arrays as I am trying to identify any matching debits (lag_debit1 to lag_debit5) to any credits (lag_credit1 to lag_credit5). 

 

In the final dataset (result) Account 10002 should produce a match for 2100 Amount (lag_debit4 = lag_credit3) and Account 10003 should produce a match for 2450.10 (lag_debit1 to lag_credit2).  However the results dataset is currently not identifying these matches and returns 0 for results 1 to results 5.

 

data test;
informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ;
input accountnum trandate debit credit count ;
format trandate DATE9.;
datalines;
100001 30APR2018 2000.00 . 1
100002 23MAY2018 3000.00 . 1
100002 26MAY2018 2500.50 . 2
100002 27MAY2018 . 2100.00 3
100002 28MAY2018 2100.00 . 4
100002 29MAY2018 2500.10 . 5
100003 23MAY2018 2450.10 . 1
100003 24MAY2018 . 2450.10 2
;
run;

data want;
format lag_debit1-lag_debit5 8.2 lag_credit1-lag_credit5 8.2;
do until(last.accountnum);
set test;
by     accountnum trandate;
array lag_debit{5};
array lag_credit{5};
if first.accountnum then call missing(of lag_debit(*));
if first.accountnum then call missing(of lag_credit(*));
lag_debit(count)=debit;
lag_credit(count)=credit;
output;
end;
run;

data result;
  set want;
  array lag_debit{5};
  array lag_credit{5};
  array results{5};
  do i=1 to 5;

results{i}= ifn(lag_debit{i}=lag_credit{i} and lag_debit{i} ne ., 1, 0);
 end;
run;

proc print data=result;
var accountnum lag_debit1-lag_debit5 lag_credit1-lag_credit5  results1 - results5 i ;
run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst you could do it with arrays, that wouldn't be optimal, especially if you get more rows.  What you want to do is to merge on any rows (distinct to get just one per value) which = debit, then flag those:

data test;
  informat trandate ANYDTDTE9. debit 8.2 credit 8.2 ;
  input accountnum trandate debit credit count ;
  format trandate DATE9.;
datalines;
100001 30APR2018 2000.00 . 1
100002 23MAY2018 3000.00 . 1
100002 26MAY2018 2500.50 . 2
100002 27MAY2018 . 2100.00 3
100002 28MAY2018 2100.00 . 4
100002 29MAY2018 2500.10 . 5
100003 23MAY2018 2450.10 . 1
100003 24MAY2018 . 2450.10 2
;
run;

proc sql;
  create table want as
  select a.*,
         case when b.credit ne . then "Y" else "" end as flag
  from   test a
  left join (select distinct credit from test) b
  on     a.debit=b.credit
  order by accountnum,count;
quit;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1211 views
  • 3 likes
  • 4 in conversation