I have the data below and need to flag observations based on a combination of values. Here is the sample data:
Data have;
Input account_id Credit_amount debit_amount;
Cards;
1 10 100
2 20 200
1 100 10
3 20 200
4 10 100
2 200 20
5 10 50
;
Run;
If you look at observations 1 and 3, the debit and credit amounts are interchanged for the same account, so both rows need to be flagged.
The same is true of observations 2 and 6, so they need to be flagged as well.
I want the output below:
ACCOUNT_ID  CREDIT_AMOUNT  DEBIT_AMOUNT  FLAG
1           10             100           Y
2           20             200           Y
1           100            10            Y
3           20             200           N
4           10             100           N
2           200            20            Y
5           10             50            N
I tried concatenating the values, but I could not figure out how to look back at earlier observations.
Any help is really appreciated.
Thanks in advance.
Data have;
Input account_id Credit_amount debit_amount;
Cards;
1 10 100
2 20 200
1 100 10
3 20 200
4 10 100
2 200 20
5 10 50
;
Run;
data temp;
set have;
n=_N_;
run;
proc sort data=temp;
by account_id;
run;
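data temp;
set temp;
/* Call LAG unconditionally so that each queue is updated on every row. */
prev_id=lag1(account_id);
prev_credit=lag1(Credit_amount);
prev_debit=lag1(debit_amount);
/* Flags only the later row of a swapped pair, and only when the two rows are adjacent after the sort. */
flag=(account_id=prev_id and Credit_amount=prev_debit and debit_amount=prev_credit);
drop prev_:;
run;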
proc sort data=temp out=want(drop=n);
by n;
run;
@yashpande wrote:
Hi all,
I have below data. I need to flag observations based on combination. Below is the sample data
Data have;
Input account_id Credit_amount debit_amount;
Cards;
1 10 100
2 20 200
1 100 10
3 20 200
4 10 100
2 200 20
5 10 50
;
Run;
So if you look at observation 1 and 3
Debit and credit are interchanged and hence this transaction needs to be flagged.
What is the actual rule involved? How does the data tell us that the values were interchanged? Can the credit amount never legitimately be greater than the debit amount?
Suppose you have a slightly different data set in which the first record repeats. What result do you want then?
Data have;
Input account_id Credit_amount debit_amount;
Cards;
1 10 100
2 20 200
1 100 10
3 20 200
4 10 100
2 200 20
5 10 50
1 10 100
;
Run;
I suspect there is a date or time component to this problem that has not been mentioned.
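One way to find out whether such exact repeats occur in the real data is to count identical rows before deciding on a rule. This is only a minimal sketch, assuming the data set is named have as above and was read correctly; the column alias n_rows is made up for illustration.

```sas
/* Sketch: list account/amount combinations that occur more than once
   in exactly the same orientation (credit and debit NOT swapped).
   n_rows is a hypothetical alias, not a name from the original post. */
proc sql;
  select account_id, credit_amount, debit_amount, count(*) as n_rows
  from have
  group by account_id, credit_amount, debit_amount
  having count(*) > 1;
quit;
```

If this query returns rows, the flagging rule needs to say whether a plain repeat counts as "interchanged" or not.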
Data have;
Input account_id Credit_amount debit_amount ;
Cards;
1 10 100
2 20 200
1 100 10
3 20 200
4 10 100
2 200 20
5 10 50
;
Run;
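data _null_;
  /* Define the host variables at compile time without reading any data. */
  if 0 then set have;
  length flag $8;
  if _n_=1 then do;
    /* multidata:'y' allows several rows with the same key. */
    dcl hash H (multidata:'y') ;
    h.definekey ('account_id','Credit_amount','debit_amount') ;
    h.definedata ('account_id','Credit_amount','debit_amount','flag') ;
    h.definedone () ;
  end;
  set have end=last;
  flag='N';
  /* Look for an earlier row of the same account with credit and debit swapped. */
  if h.check(key:account_id,key:debit_amount,key:Credit_amount)=0 then do;
    flag='Y';
    /* Mark the earlier, swapped row as flagged too. */
    h.replace(key:account_id, key:debit_amount, key:Credit_amount,data:account_id, data:debit_amount, data:Credit_amount ,data:flag);
  end;
  /* Store the current row together with its flag. */
  h.add();
  /* Rows are written in hash order, which may differ from the input order. */
  if last then h.output(dataset:'want');
run;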
An easy way:
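data test;
  set have;
  /* A swapped row produces the same (smaller, larger) pair as its partner. */
  smaller = min(credit_amount, debit_amount);
  larger = max(credit_amount, debit_amount);
run;
proc sort data=test;
  by account_id smaller larger;
run;
data want;
  set test;
  by account_id smaller larger;
  /* Equal amounts cannot be told apart when interchanged. */
  if larger=smaller then flag='N';
  /* More than one row in the BY group: flag them all.
     Exact repeats of the same row fall into the same group as well. */
  else if first.larger=0 or last.larger=0 then flag='Y';
  else flag='N';
  drop smaller larger;
run;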
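If the original row order has to be kept, or if an exact repeat of a row should not count as interchanged, a variant that looks only for a true swapped partner may be closer to the requested output. The following is a sketch under those assumptions, not a tested production solution; the names numbered, row and want_swapped are invented here, and have is assumed to have been read with list input.

```sas
/* Remember the original position of each row so the order can be restored. */
data numbered;
  set have;
  row = _n_;   /* hypothetical helper column */
run;

proc sql;
  create table want_swapped as
  select a.account_id, a.credit_amount, a.debit_amount,
         /* 'Y' when another row of the same account has credit and debit swapped */
         case when exists (
                select 1
                from numbered as b
                where b.account_id    = a.account_id
                  and b.credit_amount = a.debit_amount
                  and b.debit_amount  = a.credit_amount
                  and b.row ne a.row
              ) then 'Y' else 'N' end as flag length=1
  from numbered as a
  order by a.row;
quit;
```

Under this reading, a row whose credit and debit amounts are equal is flagged only if a second identical row exists. Add a condition such as a.credit_amount ne a.debit_amount to the subquery if those rows should never be flagged.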