Good evening! I am working on the following project. I have a large database of bank customers, each of which is assigned with an aligned suspicion score (ALSS). Under the current rule, if a customer has an ALSS of 495 and above, and receives total cash of £3500, is considered suspicious to commit financial fraud.
What I want to do now is to retune the ALSS all the way up to the maximum value, 700. To do so, I need to bin the values as follows:
[min,495]
(495,600]
(600,700]
The total cash received must remain constant in 3500 pounds.
I don't know how to do this in SAS. I am carrying out research on it, nut I would appreciate if anyone could assist me. Thank you in advance!
First I think we may need to clarify your () and [] usage. Normally the [ or ] on an interval means that the end point is included in the interval and ( or ) means the end point is not included. So [min,495] would mean that 495 is included in the range. Which does not quite match your description of "ALSS of 495 and above" which would mean an interval like [495, xxx ] .
Second, what is the difference in output for the three ranges (assuming you mean something more like)
[min,495)
[495,600] (and clarify which interval 600 should be in)
(600,700]
I suspect that you mean "and receives total cash of at least £3500" . Otherwise only one value for cash seems like a poor rule.
This is where I think you may want to start. This assumes you have a data set named HAVE. Replace Have with the name of your data set. Replace WANT with the name you want to use for the output.
This assumes the value of cash is in a variable named CASH. If your variable has a different name then use that. Also assumes the score is in a variable named ALSS.
You do not mention any rule for what to do when cash does not exceed that minimum (assumed) value.
In the code below GE means "greater than or equal", LT is "less than" and LE is "less than or equal". I use these versions instead of symbols because I have used keyboards that have issues with some of the comparison characters. I have assigned three values to a variable named Bin in the order you presented ranges.
The IF for the Cash value shown with the indentation between DO and END means the bins are only assigned when the cash value is greater than or equal to 3500. I have to assume your currency is in the correct units and is numeric.
data want; set have; if cash ge 3500 then do; if 0 le ALSS lt 495 then bin=1; else if 495 le ALSS lt 600 then bin=2 ; else if 600 lt ALSS then bin=3; end; run;
First I think we may need to clarify your () and [] usage. Normally the [ or ] on an interval means that the end point is included in the interval and ( or ) means the end point is not included. So [min,495] would mean that 495 is included in the range. Which does not quite match your description of "ALSS of 495 and above" which would mean an interval like [495, xxx ] .
Second, what is the difference in output for the three ranges (assuming you mean something more like)
[min,495)
[495,600] (and clarify which interval 600 should be in)
(600,700]
I suspect that you mean "and receives total cash of at least £3500" . Otherwise only one value for cash seems like a poor rule.
This is where I think you may want to start. This assumes you have a data set named HAVE. Replace Have with the name of your data set. Replace WANT with the name you want to use for the output.
This assumes the value of cash is in a variable named CASH. If your variable has a different name then use that. Also assumes the score is in a variable named ALSS.
You do not mention any rule for what to do when cash does not exceed that minimum (assumed) value.
In the code below GE means "greater than or equal", LT is "less than" and LE is "less than or equal". I use these versions instead of symbols because I have used keyboards that have issues with some of the comparison characters. I have assigned three values to a variable named Bin in the order you presented ranges.
The IF for the Cash value shown with the indentation between DO and END means the bins are only assigned when the cash value is greater than or equal to 3500. I have to assume your currency is in the correct units and is numeric.
data want; set have; if cash ge 3500 then do; if 0 le ALSS lt 495 then bin=1; else if 495 le ALSS lt 600 then bin=2 ; else if 600 lt ALSS then bin=3; end; run;
Thank you very much for the detailed response. The solution you provide is very straightforward. Regarding the amount, it is the minimum amount of suspicion, so yes, £3500 and above. Regarding now the brackets, let me correct it:
[min,495)
[495,600)
[600,700]
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.