Hi! I have built a model pipeline in Python that clusters text using DBSCAN, and I want to import this model into SAS Model Manager through SASCTL so I can analyse the clusters further with SAS Topic Modelling methods (I can't use open source for this step).

Most of the examples I see online use SASCTL to import supervised learning models. In my case, however, the output is a set of cluster labels, and the number of clusters varies widely from one run of the model to the next.

So far, I have saved my DBSCAN model as a pickle file and created a score file for it (which basically uses the silhouette score and the Davies-Bouldin score to evaluate the quality of the clustering). I have also created JSON files for my input variables and output variables. Right now I'm stuck at the part where I have to write my model properties and the metadata information into a JSON file, as my target values aren't binary.

Does anyone have examples or implementations of importing unsupervised clustering models into SAS Model Manager with SASCTL? It would be really helpful if someone could guide me on how to move on from this step. Thank you in advance!!

EDIT: I managed to import my code (score code + JSON files + pickle file) into Model Manager, but I get this error message for my score code: "The score code for the model could not be found. Details: The score code wrapper could not be generated for the model because the Python source code is not in the correct format."
This is my score code:

    %%writefile ./Python_DBSCAN/DBSCAN_score.py
    import numpy
    import pandas as pd
    import pickle
    import settings
    import spacy
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    import umap.umap_ as umap
    from sklearn.neighbors import NearestNeighbors
    from matplotlib import pyplot as plt
    from kneed import KneeLocator
    from sklearn.metrics import davies_bouldin_score
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler
    from sklearn import metrics

    def computeScore(<my input variables, but I'm only using one of them for the clustering>):
        try:
            _thisModelFit
        except NameError:
            with open(settings.pickle_path + "/Python_DBSCAN.pickle", 'rb') as _pFile:
                _thisModelFit = pickle.load(_pFile)
        input_list = [[<list of my input variables>]]
        input_df = pd.DataFrame(input_list, columns=[<my input variables>])
        # make pred
        proba = metrics.davies_bouldin_score(_thisModelFit.X_scale, _thisModelFit.labels_)
        return proba