Hi all,
I'm working on a predictive modeling pipeline in SAS Viya using Open Source Code nodes (Python).
The pipeline works well during training, but when I add a Scoring Node, I get an error.
I understand this may be due to the fact that new variables (like predictions) are being created dynamically in the Python code, but I haven't found a clear solution to properly register or declare them for scoring.
I've had several long chats with ChatGPT, but no luck so far. I'm now in desperate need of help.
I'm attaching the code below for context. Any help, advice, or working example would be sincerely appreciated!
Thanks in advance 🙏
# Training code (Open Source Code node)
import pandas as pd
import numpy as np
import lightgbm as lgb

target_col = 'Stress_level'
target_months = [202308, 202309, 202409, 202410]

record = dm_inputdf.copy()
dm_interval_input = ["DATA_YM", "blood_pressure", "sleep_duration", "work_hours", "age"]
rec_intv = record[dm_interval_input].astype(np.float32)
rec_all = pd.concat([rec_intv.reset_index(drop=True)], axis=1)

# Split the rows into two groups by month; each group gets its own model
record["target_group"] = np.where(record["DATA_YM"].isin(target_months), "tr1", "tr2")
tr1 = rec_all[record["target_group"] == "tr1"]
tr2 = rec_all[record["target_group"] == "tr2"]
y_tr1 = record.loc[record["target_group"] == "tr1", target_col].astype(np.float32)
y_tr2 = record.loc[record["target_group"] == "tr2", target_col].astype(np.float32)

model_tr1 = lgb.LGBMRegressor(
    n_estimators=3000, learning_rate=0.05, max_depth=12, num_leaves=13, force_row_wise=True
)
model_tr1.fit(tr1, y_tr1)

model_tr2 = lgb.LGBMRegressor(
    n_estimators=3000, learning_rate=0.05, max_depth=12, num_leaves=13, force_row_wise=True
)
model_tr2.fit(tr2, y_tr2)

# Predict per group and clip negative predictions to 0
pred_tr1 = model_tr1.predict(tr1)
pred_tr2 = model_tr2.predict(tr2)
record.loc[record["target_group"] == "tr1", "P_Stress_level"] = np.clip(pred_tr1, 0, None)
record.loc[record["target_group"] == "tr2", "P_Stress_level"] = np.clip(pred_tr2, 0, None)

# Build the scored output table
dm_scoreddf = record.copy()
dm_scoreddf["P_Stress_level"] = dm_scoreddf["P_Stress_level"].astype(np.float64)
dm_scoreddf = dm_scoreddf[[
    "Stress_level", "blood_pressure", "sleep_duration", "work_hours", "age",
    "P_Stress_level", "DATA_YM"]]
dm_scoreddf["P_Stress_level"].attrs.update({
    "role": "PREDICTION",
    "level": "INTERVAL",
    "description": "LightGBM Prediction"
})
# Scoring code (Open Source Code node)
import pandas as pd
import numpy as np

# Note: dm_class_input, imputer, model_tr1, model_tr2 and target_months
# are used below but are not defined anywhere in this scoring code.
def score_method(blood_pressure, sleep_duration, work_hours, age, DATA_YM):
    "Output: P_Stress_level"
    record = pd.DataFrame(
        [[blood_pressure, sleep_duration, work_hours, age, DATA_YM]],
        columns=['blood_pressure', 'sleep_duration', 'work_hours', 'age', 'DATA_YM'])
    dm_interval_input = [col for col in record.columns if col not in dm_class_input]
    rec_intv = record[dm_interval_input]
    rec_intv_imp = imputer.transform(rec_intv)
    rec = np.concatenate((rec_intv_imp), axis=1)
    # Pick the model by month, then clip negative predictions to 0
    rec_pred = model_tr1.predict(rec) if int(DATA_YM) in target_months else model_tr2.predict(rec)
    return float(np.clip(rec_pred[0], 0, None))
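One possible direction, sketched under assumptions and not tested in Viya: the scoring code appears to run separately from the training code, so model_tr1, model_tr2 and target_months would have to be saved during training and reloaded before score_method is called. The sketch below shows that round trip with pickle. The plain dictionaries and the predict helper are hypothetical stand-ins for the two fitted LGBMRegressor models, and bundle_path is a temporary file standing in for the node's pickle location. As far as I can tell, Model Studio exposes that location as dm_pklpath in the training code and settings.pickle_path + dm_pklname in the scoring code, but please verify this for your release. The feature order follows the training code, and the imputer step is left out because the training code never fits one.

```python
import os
import pickle
import tempfile


def predict(model, rows):
    """Hypothetical stand-in for LGBMRegressor.predict: one constant per row."""
    return [model["constant"] for _ in rows]


# --- Training side: bundle everything score_method needs and save it ---
bundle = {
    "model_tr1": {"constant": 3.5},   # stand-in for the fitted model_tr1
    "model_tr2": {"constant": -1.0},  # stand-in for the fitted model_tr2
    "target_months": [202308, 202309, 202409, 202410],
    # Same column order the models were trained on
    "feature_order": ["DATA_YM", "blood_pressure", "sleep_duration", "work_hours", "age"],
}
# Assumption: in the node this path would come from the pickle variables it provides
bundle_path = os.path.join(tempfile.mkdtemp(), "stress_models.pkl")
with open(bundle_path, "wb") as f:
    pickle.dump(bundle, f)

# --- Scoring side: load once at module level, outside score_method ---
with open(bundle_path, "rb") as f:
    loaded = pickle.load(f)


def score_method(blood_pressure, sleep_duration, work_hours, age, DATA_YM):
    "Output: P_Stress_level"
    values = {
        "DATA_YM": DATA_YM,
        "blood_pressure": blood_pressure,
        "sleep_duration": sleep_duration,
        "work_hours": work_hours,
        "age": age,
    }
    # Arrange the inputs in the training column order
    row = [float(values[name]) for name in loaded["feature_order"]]
    # Route to the model that matches the month group
    in_tr1 = int(DATA_YM) in loaded["target_months"]
    model = loaded["model_tr1"] if in_tr1 else loaded["model_tr2"]
    pred = predict(model, [row])[0]
    # Clip negative predictions to 0, as in the training code
    return max(float(pred), 0.0)
```

With these stand-ins, score_method(120, 7, 8, 35, 202308) is routed to the tr1 model, while a month outside target_months goes to tr2, whose negative prediction is clipped to 0. With real models the predict helper would be replaced by a call to the loaded model's own predict method on a one-row input.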