Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

How do I implement a custom metric (log loss) in SAS Enterprise Miner?

New Contributor

Hi,

I am a SAS beginner and need help with a simple question. I want to use my own metric, log loss. The Model Comparison node does not offer this metric among its options, but I need it.

I would like to implement the following:

First:

  • The Model Comparison node selects the best model according to log loss.

Second:

  • A SAS model, e.g. Gradient Boosting, uses log loss as the metric it learns with (R, for example, lets you set a custom metric and use it when training an xgboost model).

So far I know how to compute log loss in a SAS Code node. Based on the article An Overview of Machine Learning with SAS® Enterprise Miner™, I implemented the following code for the Kaggle competition San Francisco Crime Classification:

/* Read the scored data and accumulate the multiclass log loss */
DATA &EM_EXPORT_TRAIN;
set &EM_IMPORT_DATA end=eof;
array posteriorProbs(1:39) P_CategoryARSON P_CategoryASSAULT P_CategoryBAD_CHECKS P_CategoryBRIBERY P_CategoryBURGLARY P_CategoryDISORDERLY_CONDUCT P_CategoryDRIVING_UNDER_THE_INFL P_CategoryDRUG_NARCOTIC P_CategoryDRUNKENNESS P_CategoryEMBEZZLEMENT P_CategoryEXTORTION P_CategoryFAMILY_OFFENSES P_CategoryFORGERY_COUNTERFEITING P_CategoryFRAUD P_CategoryGAMBLING P_CategoryKIDNAPPING P_CategoryLARCENY_THEFT P_CategoryLIQUOR_LAWS P_CategoryLOITERING P_CategoryMISSING_PERSON P_CategoryNON_CRIMINAL P_CategoryOTHER_OFFENSES P_CategoryPORNOGRAPHY_OBSCENE_MA P_CategoryPROSTITUTION P_CategoryRECOVERED_VEHICLE P_CategoryROBBERY P_CategoryRUNAWAY P_CategorySECONDARY_CODES P_CategorySEX_OFFENSES_FORCIBLE P_CategorySEX_OFFENSES_NON_FORCI P_CategorySTOLEN_PROPERTY P_CategorySUICIDE P_CategorySUSPICIOUS_OCC P_CategoryTREA P_CategoryTRESPASS P_CategoryVANDALISM P_CategoryVEHICLE_THEFT P_CategoryWARRANTS P_CategoryWEAPON_LAWS;
if F_Category = "ARSON" then t=1; 
if F_Category = "ASSAULT" then t=2; 
if F_Category = "BAD CHECKS" then t=3; 
if F_Category = "BRIBERY" then t=4; 
if F_Category = "BURGLARY" then t=5;
if F_Category = "DISORDERLY CONDUCT" then t=6; 
if F_Category = "DRIVING UNDER THE INFLUENCE" then t=7; 
if F_Category = "DRUG/NARCOTIC" then t=8; 
if F_Category = "DRUNKENNESS" then t=9; 
if F_Category = "EMBEZZLEMENT" then t=10; 
if F_Category = "EXTORTION" then t=11; 
if F_Category = "FAMILY OFFENSES" then t=12; 
if F_Category = "FORGERY/COUNTERFEITING" then t=13;
if F_Category = "FRAUD" then t=14;
if F_Category = "GAMBLING" then t=15;
if F_Category = "KIDNAPPING" then t=16;
if F_Category = "LARCENY/THEFT" then t=17; 
if F_Category = "LIQUOR LAWS" then t=18;
if F_Category = "LOITERING" then t=19;
if F_Category = "MISSING PERSON" then t=20;
if F_Category = "NON-CRIMINAL" then t=21;
if F_Category = "OTHER OFFENSES" then t=22;
if F_Category = "PORNOGRAPHY/OBSCENE MAT" then t=23;
if F_Category = "PROSTITUTION" then t=24;
if F_Category = "RECOVERED VEHICLE" then t=25;
if F_Category = "ROBBERY" then t=26;
if F_Category = "RUNAWAY" then t=27; 
if F_Category = "SECONDARY CODES" then t=28; 
if F_Category = "SEX OFFENSES FORCIBLE" then t=29; 
if F_Category = "SEX OFFENSES NON FORCIBLE" then t=30; 
if F_Category = "STOLEN PROPERTY" then t=31; 
if F_Category = "SUICIDE" then t=32; 
if F_Category = "SUSPICIOUS OCC" then t=33; 
if F_Category = "TREA" then t=34; 
if F_Category = "TRESPASS" then t=35; 
if F_Category = "VANDALISM" then t=36; 
if F_Category = "VEHICLE THEFT" then t=37; 
if F_Category = "WARRANTS" then t=38; 
if F_Category = "WEAPON LAWS" then t=39;

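/* logloss holds the running sum of log probabilities; n counts the rows used */
retain logloss 0 n 0;
/* Skip rows whose label matched none of the IF statements (t is missing) */
if t ne . then do;
	/* Clip the probability away from 0 so that log() is always defined */
	logloss+log(max(posteriorProbs[t], 1e-15));
	n+1;
end;
if eof and n > 0 then do;
	/* Average over the rows used and flip the sign */
	logloss=(-1*logloss)/n;
	PUT "result " logloss;
end;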

RUN;
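
For reference, the quantity this DATA step accumulates can be written as the usual multiclass log loss. The notation is an assumption introduced here, not taken from the article: N is the number of scored rows, t_i is the index of the actual class of row i (the variable t in the code), and p_{i,k} is the posterior probability of class k for row i (posteriorProbs[k]).

```latex
\[
\text{logloss} \;=\; -\frac{1}{N} \sum_{i=1}^{N} \log p_{i,\,t_i}
\]
% Baseline for orientation: a model that always predicts the uniform
% distribution over the 39 classes has p_{i,t_i} = 1/39 on every row, so
\[
\text{logloss}_{\text{uniform}} \;=\; -\log\frac{1}{39} \;=\; \log 39 \;\approx\; 3.66
\]
```

Lower is better, and any model worth comparing should score below the uniform baseline.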

But I don't know how to pass this result to the Model Comparison node, or how to plug it into the model node. Maybe it is not possible. Log loss is a basic, necessary metric in modeling and prediction.

Is there any other solution to this problem besides my approach? I hope there is a more sophisticated one than mine.

Thanks for your help.

SAS Employee

Re: How do I implement a custom metric (log loss) in SAS Enterprise Miner?

Hi,

Correct me if I am wrong, but I don't recall any existing calculated performance measure that is actually your logic under another name.

That said, you can attach a SAS Code node to the model node to generate the score (it appears you are accumulating up to EOF, and your IF-THEN statements are just a class assignment that turns the character label into T so it can be used in the formula). Then take the SAS Code node's output file, which contains your final log loss score, and connect it to a Model Import node, as if you were just importing a 'model built elsewhere'. At the Model Import node, specify the target (it should be the same target variable on which you want to assess this measure alongside the other measures). Then specify the calculated log-score field as the score field. Finally, connect the Model Import node to the Model Comparison node.

Note that EM and its developers are not responsible for the mathematical properties and interpretability of a special performance measure. For example, if the scale should be between 0 and 1 but is not, you will need to vet and trim it to make good use of the Model Comparison node.

Hope this helps. Thank you for using SAS.

Best Regards,
Jason Xin
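
The first step of the suggestion above is to have the SAS Code node produce an output file that holds the final log loss, rather than only printing it to the log. A minimal sketch of that step follows. It is self-contained and runs on a three-class toy table. All names in it (work.scored, work.logloss_result, P_A, P_B, P_C, sum_log, n_scored) are hypothetical. In the actual flow the SET statement would read &EM_IMPORT_DATA and the array would list the 39 P_Category variables from the question. How the Model Import node then consumes the result is not shown here.

```sas
/* Toy scored table: actual class plus one posterior probability per class */
data work.scored;
   input F_Category $ P_A P_B P_C;
   datalines;
A 0.7 0.2 0.1
B 0.1 0.8 0.1
C 0.2 0.3 0.5
;
run;

/* One-row summary data set holding the final log loss */
data work.logloss_result (keep=logloss n_scored);
   set work.scored end=eof;
   array p{3} P_A P_B P_C;

   /* Map the character label to an array index; unknown labels stay missing */
   select (F_Category);
      when ("A") t = 1;
      when ("B") t = 2;
      when ("C") t = 3;
      otherwise t = .;
   end;

   /* Sum statements start at 0 and retain their value across rows */
   if t ne . then do;
      sum_log + log(max(p{t}, 1e-15));   /* clip to avoid log(0) */
      n_scored + 1;
   end;

   /* Write a single row after the last observation */
   if eof and n_scored > 0 then do;
      logloss = -sum_log / n_scored;
      output;
   end;
run;
```

SELECT/WHEN with an OTHERWISE branch is used instead of a chain of independent IF statements so that an unexpected label is skipped rather than causing an array subscript error.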