Hi all,
I'm working on a predictive modeling pipeline in SAS Viya using Open Source Code nodes (Python).
The pipeline works well during training, but when I add a Scoring Node, I get an error.
I understand this may be due to the fact that new variables (like predictions) are being created dynamically in the Python code, but I haven't found a clear solution to properly register or declare them for scoring.
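One way to rule out a naming problem is to check the training output directly before adding the Scoring Node. The following is a minimal sketch, assuming the node expects the scored table to have one row per input row and an interval prediction column named `P_<target>`; that naming rule should be confirmed against the SAS documentation. `check_scored_frame` and the toy frames are hypothetical names for illustration, standing in for `dm_inputdf` and `dm_scoreddf`.

```python
import pandas as pd


def check_scored_frame(input_df, scored_df, target):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    pred_col = "P_" + target  # assumed naming rule for an interval target
    if pred_col not in scored_df.columns:
        problems.append("missing column " + pred_col)
    elif scored_df[pred_col].isna().any():
        problems.append(pred_col + " contains missing values")
    if len(scored_df) != len(input_df):
        problems.append("row count differs from input")
    return problems


# Toy frames standing in for dm_inputdf and dm_scoreddf.
toy_input = pd.DataFrame({"age": [30, 41], "Stress_level": [2.0, 3.5]})
toy_scored = toy_input.assign(P_Stress_level=[2.1, 3.2])
print(check_scored_frame(toy_input, toy_scored, "Stress_level"))
```

Running the same check on the real `dm_scoreddf` at the end of the training code would show whether the prediction column is present and fully populated for both month groups.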
I've had several long chats with ChatGPT with no luck so far, and I'm now in desperate need of help.
I'm attaching the code below for context. Any help, advice, or working example would be sincerely appreciated!
Thanks in advance 🙏
import pandas as pd
import numpy as np
import lightgbm as lgb
target_col = 'Stress_level'
target_months = [202308, 202309, 202409, 202410]
record = dm_inputdf.copy()
dm_interval_input = ["DATA_YM", "blood_pressure", "sleep_duration", "work_hours", "age"]
rec_intv = record[dm_interval_input].astype(np.float32)
rec_all = pd.concat([rec_intv.reset_index(drop=True)], axis=1)
record["target_group"] = np.where(record["DATA_YM"].isin(target_months), "tr1", "tr2")
tr1 = rec_all[record["target_group"] == "tr1"]
tr2 = rec_all[record["target_group"] == "tr2"]
y_tr1 = record.loc[record["target_group"] == "tr1", target_col].astype(np.float32)
y_tr2 = record.loc[record["target_group"] == "tr2", target_col].astype(np.float32)
model_tr1 = lgb.LGBMRegressor(
    n_estimators=3000,
    learning_rate=0.05,
    max_depth=12,
    num_leaves=13,
    force_row_wise=True
)
model_tr1.fit(tr1, y_tr1)
model_tr2 = lgb.LGBMRegressor(
    n_estimators=3000,
    learning_rate=0.05,
    max_depth=12,
    num_leaves=13,
    force_row_wise=True
)
model_tr2.fit(tr2, y_tr2)
pred_tr1 = model_tr1.predict(tr1)
pred_tr2 = model_tr2.predict(tr2)
record.loc[record["target_group"] == "tr1", "P_Stress_level"] = np.clip(pred_tr1, 0, None)
record.loc[record["target_group"] == "tr2", "P_Stress_level"] = np.clip(pred_tr2, 0, None)
dm_scoreddf = record.copy()
dm_scoreddf["P_Stress_level"] = dm_scoreddf["P_Stress_level"].astype(np.float64)
dm_scoreddf = dm_scoreddf[[
    "Stress_level", "blood_pressure", "sleep_duration", "work_hours", "age",
    "P_Stress_level", "DATA_YM"]]
dm_scoreddf["P_Stress_level"].attrs.update({
    "role": "PREDICTION",
    "level": "INTERVAL",
    "description": "LightGBM Prediction"
})

# Scoring code
import pandas as pd
import numpy as np

target_months = [202308, 202309, 202409, 202410]
# Same column order as the training matrix (rec_all).
feature_cols = ["DATA_YM", "blood_pressure", "sleep_duration", "work_hours", "age"]

# NOTE: model_tr1 and model_tr2 are not defined in this script; the training
# code has to persist them and this script has to load them before scoring.

def score_method(blood_pressure, sleep_duration, work_hours, age, DATA_YM):
    "Output: P_Stress_level"
    record = pd.DataFrame(
        [[DATA_YM, blood_pressure, sleep_duration, work_hours, age]],
        columns=feature_cols)
    # The training code has no imputer and no class inputs, so the features
    # are only cast to float32, as in training.
    rec = record.astype(np.float32)
    model = model_tr1 if int(DATA_YM) in target_months else model_tr2
    rec_pred = model.predict(rec)
    return float(np.clip(rec_pred[0], 0, None))
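
The scoring code above relies on model_tr1 and model_tr2, which are fitted in the training code and do not exist in a fresh scoring session. A minimal sketch of one way to carry them over, assuming the fitted models can be pickled (LightGBM's scikit-learn estimators can): bundle both models and the month list in one dict, pickle it at the end of training, and load it once at the top of the scoring script. `save_bundle`, `load_bundle`, `pick_model_name`, and the temporary file path are hypothetical names for illustration, and plain dicts stand in for the fitted models. In the Open Source Code node the path would presumably come from `dm_pklpath` in the training code and `settings.pickle_path + dm_pklname` in the scoring code, which is an assumption to verify against the SAS documentation.

```python
import os
import pickle
import tempfile


def save_bundle(path, model_tr1, model_tr2, target_months):
    """Training side: write both models and the month list to one pickle file."""
    bundle = {
        "model_tr1": model_tr1,
        "model_tr2": model_tr2,
        "target_months": list(target_months),
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_bundle(path):
    """Scoring side: read the bundle back, once, outside score_method."""
    with open(path, "rb") as f:
        return pickle.load(f)


def pick_model_name(bundle, data_ym):
    """Return the key of the model that applies to this DATA_YM value."""
    return "model_tr1" if int(data_ym) in bundle["target_months"] else "model_tr2"


# Round trip through a temporary file (an assumed path, not a SAS-provided one).
# Plain dicts stand in for the fitted LGBMRegressor objects.
pkl_path = os.path.join(tempfile.mkdtemp(), "stress_models.pkl")
save_bundle(pkl_path, {"name": "tr1 stand-in"}, {"name": "tr2 stand-in"},
            [202308, 202309, 202409, 202410])
bundle = load_bundle(pkl_path)
```

Inside score_method, the model would then be looked up with `bundle[pick_model_name(bundle, DATA_YM)]` before calling `predict`. Storing the month list with the models keeps the routing rule identical between training and scoring.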