Importing Python Clustering Models into SAS through SASCTL and Model Manager

thistleandtweed

Hi!

I have built a model pipeline in Python that clusters text using DBSCAN, and I want to import this model into SAS Model Manager through SASCTL so that I can analyse the clusters further with SAS Topic Modelling methods (I can't use OS for this step).

Most of the examples I see online use SASCTL to import supervised learning models. However, my output in this case is a set of cluster labels, and the number of clusters varies widely from run to run.
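To make the shape of that output concrete: scikit-learn's DBSCAN numbers clusters from 0 and marks noise points with the label -1, so the cluster count has to be derived from the labels after each run. A minimal sketch, where the `labels` list is made up and only stands in for the `labels_` attribute of a fitted model:

```python
from collections import Counter


def summarize_labels(labels):
    """Return (number of clusters, number of noise points, size of each cluster)."""
    counts = Counter(labels)
    n_noise = counts.pop(-1, 0)  # DBSCAN marks noise points with the label -1
    return len(counts), n_noise, dict(counts)


# Hypothetical labels, standing in for `labels_` of a fitted DBSCAN model
labels = [0, 0, 1, -1, 1, 2, -1, 0]
n_clusters, n_noise, sizes = summarize_labels(labels)
print(n_clusters, n_noise, sizes)
```

Because the number of clusters is only known after fitting, any output variable that carries the label probably has to be declared as a single column rather than one column per cluster.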

So far, I have saved my DBSCAN model as a pickle file and created a score file for it (which basically uses the silhouette score and the Davies-Bouldin score to evaluate the quality of the clustering). I have also created JSON files for my input variables and output variables. Right now, I'm stuck at the part where I have to write my model properties and the metadata information into a JSON file, as my target values aren't binary.
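For reference, the variable JSON files can be written with the standard library alone. This is only a sketch: the file names `inputVar.json` and `outputVar.json` and the keys `name`, `level`, `type` and `length` are assumptions about the layout Model Manager expects, and the variable names `text` and `cluster_label` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical variables: one free-text input and one cluster label output.
# The key names (name, level, type, length) are an assumed layout.
input_vars = [
    {"name": "text", "level": "nominal", "type": "string", "length": 500},
]
output_vars = [
    {"name": "cluster_label", "level": "nominal", "type": "decimal", "length": 8},
]

# A temporary folder stands in for the model folder that would be uploaded.
out_dir = Path(tempfile.mkdtemp())
for file_name, variables in [("inputVar.json", input_vars),
                             ("outputVar.json", output_vars)]:
    with open(out_dir / file_name, "w") as f:
        json.dump(variables, f, indent=4)

print(sorted(p.name for p in out_dir.iterdir()))
```

Declaring the cluster label as a nominal output avoids having to describe a binary target at all.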

Does anyone have any examples or implementations of importing unsupervised clustering models into SAS Model Manager with SASCTL? It would be really helpful if someone could guide me on how to move on from this step.

Thank you in advance!!

EDIT: I managed to import my code (score code + JSON files + pickle file) into Model Manager, but I get this error message for my score code: "The score code for the model could not be found. Details: The score code wrapper could not be generated for the model because the Python source code is not in the correct format."

This is my score code:

%%writefile ./Python_DBSCAN/DBSCAN_score.py

import numpy
import pandas as pd
import pickle
import settings
import spacy
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import umap.umap_ as umap
from sklearn.neighbors import NearestNeighbors
from matplotlib import pyplot as plt
from kneed import KneeLocator
from sklearn.metrics import davies_bouldin_score
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

def computeScore(<my input variables, but im only using one of them for the clustering>):
    try:
        _thisModelFit
    except NameError:
        with open(settings.pickle_path + "/Python_DBSCAN.pickle", 'rb') as _pFile:
            _thisModelFit = pickle.load(_pFile)

    input_list = [[<list of my input variables>]]
    input_df = pd.DataFrame(input_list, columns=[<my input variables>])

    # make pred
    proba = metrics.davies_bouldin_score(_thisModelFit.X_scale, _thisModelFit.labels_)

    return proba
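One possible direction, offered as an untested sketch rather than a confirmed fix: the score function above returns a dataset-level Davies-Bouldin value, whereas a score function is normally expected to return one value per input row, and SAS-generated Python score code typically declares its outputs in an `"Output: ..."` string on the first line of the function body. DBSCAN also has no `predict` method, so the sketch below assigns a new point to the cluster of its nearest core sample within `eps`. The names `scoreCluster`, `cluster_label`, `CORE_POINTS`, `CORE_LABELS` and `EPS` are hypothetical, and the hard-coded values stand in for what would be unpickled from the fitted model.

```python
import math

# Hypothetical stand-ins for values taken from the fitted DBSCAN model:
# coordinates of the core samples, their cluster labels, and the eps radius.
CORE_POINTS = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
CORE_LABELS = [0, 0, 1]
EPS = 0.5


def scoreCluster(x1, x2):
    "Output: cluster_label"
    # Start as noise (-1); accept only core samples that lie within EPS.
    best_label, best_dist = -1, EPS
    for (c1, c2), label in zip(CORE_POINTS, CORE_LABELS):
        dist = math.hypot(x1 - c1, x2 - c2)
        if dist <= best_dist:
            best_label, best_dist = label, dist
    return best_label


print(scoreCluster(0.05, 0.0), scoreCluster(5.2, 5.0), scoreCluster(2.5, 2.5))
```

This works on already-vectorised numeric features; for text input, the same TF-IDF and dimensionality-reduction steps used at training time would have to be applied inside the function before the distance check.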
