Jeanette
Calcite | Level 5

Hi,

 

I am a SAS beginner and need help with a simple question. I need to use my own metric, the Log Loss metric. The Model Comparison node does not offer this metric among its selection options, but I need it.

 

I would like to implement the following:

 

First:

  • The Model Comparison node selects the best model according to the Log Loss metric.

Second:

  • A SAS model, e.g. Gradient Boosting, uses the Log Loss metric during training (for example, R lets you set a custom metric and use it to train an xgboost model).

So far I know how to compute the Log Loss metric in a SAS Code node. Based on the article An Overview of Machine Learning with SAS® Enterprise Miner™, I implemented this code for the Kaggle competition San Francisco Crime Classification:

 

DATA &EM_EXPORT_TRAIN;
set &EM_IMPORT_DATA end=eof;
array posteriorProbs(1:39) P_CategoryARSON P_CategoryASSAULT P_CategoryBAD_CHECKS P_CategoryBRIBERY P_CategoryBURGLARY P_CategoryDISORDERLY_CONDUCT P_CategoryDRIVING_UNDER_THE_INFL P_CategoryDRUG_NARCOTIC P_CategoryDRUNKENNESS P_CategoryEMBEZZLEMENT P_CategoryEXTORTION P_CategoryFAMILY_OFFENSES P_CategoryFORGERY_COUNTERFEITING P_CategoryFRAUD P_CategoryGAMBLING P_CategoryKIDNAPPING P_CategoryLARCENY_THEFT P_CategoryLIQUOR_LAWS P_CategoryLOITERING P_CategoryMISSING_PERSON P_CategoryNON_CRIMINAL P_CategoryOTHER_OFFENSES P_CategoryPORNOGRAPHY_OBSCENE_MA P_CategoryPROSTITUTION P_CategoryRECOVERED_VEHICLE P_CategoryROBBERY P_CategoryRUNAWAY P_CategorySECONDARY_CODES P_CategorySEX_OFFENSES_FORCIBLE P_CategorySEX_OFFENSES_NON_FORCI P_CategorySTOLEN_PROPERTY P_CategorySUICIDE P_CategorySUSPICIOUS_OCC P_CategoryTREA P_CategoryTRESPASS P_CategoryVANDALISM P_CategoryVEHICLE_THEFT P_CategoryWARRANTS P_CategoryWEAPON_LAWS;
if F_Category = "ARSON" then t=1; 
if F_Category = "ASSAULT" then t=2; 
if F_Category = "BAD CHECKS" then t=3; 
if F_Category = "BRIBERY" then t=4; 
if F_Category = "BURGLARY" then t=5;
if F_Category = "DISORDERLY CONDUCT" then t=6; 
if F_Category = "DRIVING UNDER THE INFLUENCE" then t=7; 
if F_Category = "DRUG/NARCOTIC" then t=8; 
if F_Category = "DRUNKENNESS" then t=9; 
if F_Category = "EMBEZZLEMENT" then t=10; 
if F_Category = "EXTORTION" then t=11; 
if F_Category = "FAMILY OFFENSES" then t=12; 
if F_Category = "FORGERY/COUNTERFEITING" then t=13;
if F_Category = "FRAUD" then t=14;
if F_Category = "GAMBLING" then t=15;
if F_Category = "KIDNAPPING" then t=16;
if F_Category = "LARCENY/THEFT" then t=17; 
if F_Category = "LIQUOR LAWS" then t=18;
if F_Category = "LOITERING" then t=19;
if F_Category = "MISSING PERSON" then t=20;
if F_Category = "NON-CRIMINAL" then t=21;
if F_Category = "OTHER OFFENSES" then t=22;
if F_Category = "PORNOGRAPHY/OBSCENE MAT" then t=23;
if F_Category = "PROSTITUTION" then t=24;
if F_Category = "RECOVERED VEHICLE" then t=25;
if F_Category = "ROBBERY" then t=26;
if F_Category = "RUNAWAY" then t=27; 
if F_Category = "SECONDARY CODES" then t=28; 
if F_Category = "SEX OFFENSES FORCIBLE" then t=29; 
if F_Category = "SEX OFFENSES NON FORCIBLE" then t=30; 
if F_Category = "STOLEN PROPERTY" then t=31; 
if F_Category = "SUICIDE" then t=32; 
if F_Category = "SUSPICIOUS OCC" then t=33; 
if F_Category = "TREA" then t=34; 
if F_Category = "TRESPASS" then t=35; 
if F_Category = "VANDALISM" then t=36; 
if F_Category = "VEHICLE THEFT" then t=37; 
if F_Category = "WARRANTS" then t=38; 
if F_Category = "WEAPON LAWS" then t=39;

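/* Accumulate the log of the probability assigned to the true class. */
/* Skip rows whose label matched no category (t is missing), and clip */
/* the probability at 1e-15 so a zero probability is penalized rather */
/* than silently dropped as a missing value by the sum statement. */
retain logloss 0;
if t ne . then logloss+log(max(posteriorProbs[t], 1e-15));
if eof then do;
	/* Mean negative log-likelihood over all rows read. */
	logloss=(-1*logloss)/_N_;
	PUT "result " logloss;
end;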

RUN;
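
For reference, the quantity this DATA step accumulates can be written as the multiclass log loss. This is only a sketch of the formula as I understand it; the symbols N (number of rows read, _N_ at end of file), t_i (the index t of the true class for row i), and p_{i,t_i} (the posterior probability posteriorProbs[t] for that row) are notation introduced here, not something taken from the article:

```latex
\text{logloss} \;=\; -\frac{1}{N}\sum_{i=1}^{N} \log p_{i,\,t_i}
```

Lower values are better, and a model that assigns probability 1 to the true class on every row would score 0.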

But I don't know how to pass this result to the Model Comparison node or how to use it inside a model node. Maybe it is not possible. This metric is basic and necessary in modeling and prediction.

 

Is there any solution to this problem other than my approach? I hope there is a more sophisticated approach than mine.

 

Thanks for your help.

JasonXin
SAS Employee
Hi, Correct me if I am wrong, but I don't recall any existing built-in performance measure that is actually your logic under another name.

With that said, you can attach a SAS Code node to the model node to generate the score (it appears you are accumulating to EOF, and your IF-THEN statements are just class assignments that turn the character label into T so it can ride through the formula). Then connect the SAS Code node output file that contains your final logloss score to a Model Import node, pretending you are just importing a 'model built elsewhere'. At the Model Import node, specify the target (it should be the same target variable against which you want to assess this measure alongside the other measures). Then specify the calculated log-loss field as the score field. Finally, connect the Model Import node to the Model Comparison node.

Note that EM/SAS developers are not responsible for the mathematical properties and interpretability of a special performance measure. For example, if the scale should be between 0 and 1 but is not, you will need to vet and trim it to make good use of the Model Comparison node.

Hope this helps. Thank you for using SAS.

Best Regards,
Jason Xin
