thistleandtweed
Fluorite | Level 6

Hi! 

 

I have built a model pipeline in Python that clusters text using DBSCAN, and I want to import this model into SAS Model Manager through sasctl so that I can analyse the clusters further with SAS topic modelling methods (I can't use OS for this step).

 

Most of the examples I see online use sasctl to import supervised learning models. However, my output in this case is a set of cluster labels, and the number of clusters varies widely from run to run.
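To illustrate why the cluster count is not fixed: scikit-learn's DBSCAN marks noise points with the label -1 and numbers the clusters it finds from 0, so the count has to be derived from the labels after each run. A minimal sketch, where the `labels` list is made-up example data standing in for the `labels_` attribute of a fitted model:

```python
# Hypothetical labels from one DBSCAN run; -1 marks noise points.
labels = [0, 0, 1, -1, 2, 1, -1, 0]


def count_clusters(labels):
    """Return (number of clusters, number of noise points) for DBSCAN-style labels."""
    distinct = set(labels)
    # The noise label -1 is not a cluster, so exclude it from the count.
    n_clusters = len(distinct) - (1 if -1 in distinct else 0)
    n_noise = sum(1 for label in labels if label == -1)
    return n_clusters, n_noise


n_clusters, n_noise = count_clusters(labels)
print(f"{n_clusters} clusters, {n_noise} noise points")
```

Any metadata that enumerates target levels up front would therefore need to be regenerated per run, or avoided.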

 

So far, I have saved my DBSCAN model as a pickle file and created a score file for it (which basically uses the silhouette score and the Davies-Bouldin score to evaluate the quality of the clustering). I have also created JSON files for my input variables and output variables. Right now, I'm stuck at the part where I have to write the model properties and metadata information into a JSON file, because my target values aren't binary.
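As a rough sketch of what those JSON files could look like for a model without a binary target, the snippet below writes the three files with the standard library only. The file names follow the ones commonly produced by sasctl's pzmm helpers, but every field name and value here (including `"function": "clustering"`, the empty target variable, and the variable names `text_column` and `cluster_label`) is an assumption to verify against the SAS Model Manager documentation, not a confirmed schema.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(model_dir):
    """Write illustrative metadata files; all field values are assumptions."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)

    # One text input column; "text_column" is a hypothetical name.
    input_vars = [
        {"name": "text_column", "level": "nominal", "type": "string", "length": 500}
    ]
    # A cluster label per row instead of a binary-target probability.
    output_vars = [
        {"name": "cluster_label", "level": "nominal", "type": "decimal", "length": 8}
    ]
    # No target variable or target event, since the model is unsupervised.
    model_properties = {
        "name": "Python_DBSCAN",
        "description": "DBSCAN text clustering",
        "function": "clustering",  # assumed value, check the allowed list
        "scoreCodeType": "python",
        "algorithm": "DBSCAN",
        "targetVariable": "",
        "tool": "Python 3",
    }

    files = {
        "inputVar.json": input_vars,
        "outputVar.json": output_vars,
        "ModelProperties.json": model_properties,
    }
    for filename, payload in files.items():
        (model_dir / filename).write_text(json.dumps(payload, indent=4))
    return sorted(files)


# Written to a temporary folder here; the real files would sit next to the pickle.
out_dir = Path(tempfile.mkdtemp()) / "Python_DBSCAN"
print(write_metadata(out_dir))
```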

 

Does anyone have any examples or implementations of importing unsupervised clustering models into SAS Model Manager with sasctl? It would be really helpful if someone could guide me on how to move on from this step.

 

Thank you in advance!!

 

EDIT: I managed to import my files (score code + JSON files + pickle file) into Model Manager, but I get this error message for my score code: "The score code for the model could not be found. Details: The score code wrapper could not be generated for the model because the Python source code is not in the correct format."

This is my score code:

%%writefile ./Python_DBSCAN/DBSCAN_score.py
import numpy
import pandas as pd
import pickle
import settings
import spacy
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import umap.umap_ as umap
from sklearn.neighbors import NearestNeighbors
from matplotlib import pyplot as plt
from kneed import KneeLocator
from sklearn.metrics import davies_bouldin_score
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

def computeScore(<my input variables, but im only using one of them for the clustering>):
    try:
        _thisModelFit
    except NameError:
        with open(settings.pickle_path + "/Python_DBSCAN.pickle", 'rb') as _pFile:
            _thisModelFit = pickle.load(_pFile)

    input_list = [[<list of my input variables>]]
    input_df = pd.DataFrame(input_list, columns=[<my input variables>])

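    # Score the clustering: Davies-Bouldin index of the scaled features against the fitted labels
    # (X_scale is not a built-in DBSCAN attribute; it is assumed to have been attached to the model before pickling)
    proba = metrics.davies_bouldin_score(_thisModelFit.X_scale, _thisModelFit.labels_)
    return proba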
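
For comparison, here is a sketch of the layout that score code generated by sasctl's pzmm helpers typically follows: an "Output: ..." string as the first statement of the function, naming the output variables, and a `global` declaration so that the unpickled model is cached between calls. Whether the missing output line is what triggers the wrapper error above is an assumption to check against the SAS Model Manager documentation. The sketch uses only the standard library: `PICKLE_PATH` stands in for `settings.pickle_path`, the pickled dict (with a made-up score) stands in for the fitted model, and `text_column` and `cluster_quality` are hypothetical variable names.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for settings.pickle_path, which is only available inside SAS.
PICKLE_PATH = Path(tempfile.mkdtemp())

# Stand-in for the fitted model: a plain dict holding a made-up, precomputed score.
with open(PICKLE_PATH / "Python_DBSCAN.pickle", "wb") as _pFile:
    pickle.dump({"davies_bouldin": 0.75}, _pFile)


def computeScore(text_column):
    "Output: cluster_quality"
    # Declare the cache as global so the pickle is loaded only on the first call.
    global _thisModelFit
    try:
        _thisModelFit
    except NameError:
        with open(PICKLE_PATH / "Python_DBSCAN.pickle", "rb") as _pFile:
            _thisModelFit = pickle.load(_pFile)

    # Return a plain Python float rather than a NumPy scalar.
    return float(_thisModelFit["davies_bouldin"])


print(computeScore("some document text"))
```

The name in the "Output:" string would presumably need to match an entry in the output variables JSON file.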

 
