Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

A statistic question about model

New Contributor
Posts: 4


Provider  Charge(Total)  Fraud  %Fraud  Fraud*%Fraud
A                  1000    600     0.6           360
B                   100     70     0.7            49
C                    10      8     0.8           6.4
D                     1      1     1.0             1

Which provider should get an alert?
I am building a model and setting an alert threshold on the measures above. If %Fraud is used, provider D gets the alert, which is obviously unreasonable. When I use Fraud*%Fraud instead, the results look much more reasonable.
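A minimal Python sketch of the comparison described above, using the four providers from the table (the helper names `scores` and `rank_by` are illustrative only): ranking by %Fraud puts D at the top, while ranking by Fraud*%Fraud puts A at the top.

```python
# Provider-level figures from the table: name -> (Charge(Total), Fraud)
providers = {
    "A": (1000, 600),
    "B": (100, 70),
    "C": (10, 8),
    "D": (1, 1),
}


def scores(total, fraud):
    """Return (%Fraud, Fraud*%Fraud) for one provider."""
    pct_fraud = fraud / total
    return pct_fraud, fraud * pct_fraud


def rank_by(which):
    """Provider names from highest to lowest score.

    which=0 ranks by %Fraud, which=1 ranks by Fraud*%Fraud.
    """
    return sorted(providers, key=lambda name: scores(*providers[name])[which], reverse=True)


by_pct = rank_by(0)
by_weighted = rank_by(1)
print("Ranked by %Fraud:      ", by_pct)
print("Ranked by Fraud*%Fraud:", by_weighted)
```

An alert on the top-ranked provider therefore goes to D under %Fraud and to A under Fraud*%Fraud.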
Questions:
1) Is there any statistical theory behind Fraud*%Fraud? If so, what is it?
2) If Fraud*%Fraud is unreasonable, is there a better alternative?
Thanks in advance.

Super User
Posts: 17,828

Re: A statistic question about model

Fraud*%Fraud = Fraud*Fraud/Amount (where Amount is the Charge(Total) column)
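Writing %Fraud as Fraud/Total makes the identity explicit, and it also gives an equivalent form: the total weighted by the squared fraud rate. Provider A from the table gives 360 either way:

```latex
\text{Fraud}\times\%\text{Fraud}
  = \text{Fraud}\times\frac{\text{Fraud}}{\text{Total}}
  = \frac{\text{Fraud}^2}{\text{Total}}
  = \text{Total}\times\left(\frac{\text{Fraud}}{\text{Total}}\right)^2,
\qquad
\text{provider A: } \frac{600^2}{1000} = 1000\times 0.6^2 = 360.
```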

 

I might be wrong, but isn't it the same measurement of fraud at the end of the day? It will flag the same records as the fraud amount will.
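One quick way to check whether Fraud and Fraud*%Fraud flag the same providers is to rank by both. A minimal Python sketch: provider "E" (50 fraudulent out of 50) is hypothetical and not in the original post; it is added only to test the claim. On the four real providers the two orderings agree, but E is enough to make them diverge, so the agreement holds for this table rather than in general.

```python
# name -> (total, fraud); A-D come from the table, E is hypothetical
providers = {
    "A": (1000, 600),
    "B": (100, 70),
    "C": (10, 8),
    "D": (1, 1),
    "E": (50, 50),  # hypothetical provider, not in the original data
}


def ranking(score):
    """Provider names from highest to lowest value of score(total, fraud)."""
    return sorted(providers, key=lambda name: score(*providers[name]), reverse=True)


by_fraud = ranking(lambda total, fraud: fraud)
by_weighted = ranking(lambda total, fraud: fraud * fraud / total)  # Fraud*%Fraud
print("By Fraud:        ", by_fraud)
print("By Fraud*%Fraud: ", by_weighted)
```

E has less fraud than B (50 vs 70) but a higher Fraud*%Fraud (50 vs 49), so a cutoff between those values would flag different providers under the two measures.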

 

I'm having a hard time picturing that curve, so I tried to graph it, but it doesn't seem to show much:

 

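/* Grid of total amounts and fraud amounts, scored as fraud2 = fraud*fraud/amount (Fraud*%Fraud) */
data have;
   do amount=1 to 1000;
      do fraud=0 to 800 by 10;
         fraud2=fraud*fraud/amount;
         /* keep only feasible cells: fraud cannot exceed the total amount */
         if fraud <= amount then output;
      end;
   end;
run;

/* heat map of the score over the (amount, fraud) grid */
proc sgplot data=have;
   heatmapparm x=amount y=fraud colorresponse=fraud2;
run;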

I'm not sure what a better measurement would be, though; it's probably worth checking the literature. Choosing between the amount and the percentage as the detector is a well-known problem.

New Contributor
Posts: 4

Re: A statistic question about model

Hi  

Thank you for your response.

Yes, you are totally right. Most models use the amount or the percentage as the detector.

But I want to try using Fraud*%Fraud as the detector.

So do you think Fraud*%Fraud is reasonable?

And do you have any literature to support this?

Thanks.

Super User
Posts: 17,828

Re: A statistic question about model

No, I don't think this is reasonable, and no, I don't have literature to back this up.

My rationale: why scale the number into something you don't already have? It adds complexity to the model.

Without knowing your full model, it's hard to say. Have you looked into two-stage models?

SAS Employee
Posts: 106

Re: A statistic question about model

Can you say more about the model you are creating? Depending on what you are trying to accomplish, you may not need to compute percentages. For example, if you are trying to identify characteristics of fraudulent cases, you can use fraud/no fraud as a binary target.
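To make the binary-target idea concrete: instead of one row per provider with a percentage, the data would have one row per case with a 0/1 fraud flag, which is the layout a classification model expects. This Python sketch is hypothetical in two ways: it treats the first table's Charge(Total) and Fraud columns as case counts, and the helper `to_case_rows` is made up for illustration. It only shows the reshaping; a real model would also need case-level characteristics, which are not in the thread.

```python
from collections import Counter

# Assumption: (Charge(Total), Fraud) from the first table read as (cases, fraudulent cases)
providers = {"A": (1000, 600), "B": (100, 70), "C": (10, 8), "D": (1, 1)}


def to_case_rows(counts):
    """Expand provider aggregates into one (provider, fraud_flag) row per case."""
    rows = []
    for name, (total, fraud) in counts.items():
        rows += [(name, 1)] * fraud + [(name, 0)] * (total - fraud)
    return rows


rows = to_case_rows(providers)
n_cases = Counter(name for name, _ in rows)
n_fraud = Counter(name for name, flag in rows if flag == 1)

# Averaging the 0/1 flag per provider recovers %Fraud; the case counts
# keep the fact that provider D contributes a single observation.
for name in providers:
    print(name, n_cases[name], n_fraud[name] / n_cases[name])
```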

New Contributor
Posts: 4

Re: A statistic question about model

Hi rayIII 

Thank you for your response.

For example, KX indicates that the maximum payment has been reached.

Provider  Total_bene  KX_Bene  %KX_Bene  KX_Bene*%KX_Bene
A               1000      600       0.6               360
B                100       70       0.7                49
C                 10        8       0.8               6.4
D                  1        1       1.0                 1

Most models use %KX_Bene as the detector, but I want to use KX_Bene*%KX_Bene instead.

Do you think it is reasonable? If so, is there any statistical theory to support it?

Thanks.

Super User
Posts: 10,500

Re: A statistic question about model

Please explain exactly what you think KX_Bene*%KX_Bene tells you that %KX_Bene does not.

New Contributor
Posts: 4

Re: A statistic question about model

Hi,

This is a good question.

If %KX_Bene is used, a small provider can get the highest score. Take Provider D: he has only one beneficiary, and that beneficiary is a fraud beneficiary, so his %KX_Bene is 100%, the highest of all. But obviously he is not the provider we want to catch.

If KX_Bene*%KX_Bene is used, the number of fraud beneficiaries is taken into account as well. KX_Bene*%KX_Bene works like a "weight" or an "expectation".

Actually, when I use KX_Bene*%KX_Bene, the model performs better, but I cannot find a theory to support it.
