Run LLM on Viya 4 locally

GreenCode · Posted 04-16-2025 10:24 AM

Because there is no SAS/Toolkit for Viya 4 right now, PROC OLLAMA cannot be migrated to Viya 4. Here is a Python way to run an LLM locally on Viya 4, using the Python module llama-cpp-python.

By default, llama-cpp-python may not be installed on your Viya 4 system. PROC PYTHON is executed by the compute server pod, which has a read-only file system, so you cannot install Python packages on this pod directly. To bypass this restriction, you need to change the compute service configuration, which controls the system environment of the compute servers it creates.

Suppose you have the NFS path /data/sascode mounted at /sascode, so that it can be viewed in SAS Studio. Create a subfolder pythonhome under /data/sascode; /data/sascode/pythonhome will become the new Python home folder. You can use another writable persistent path instead.
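To see whether the modules are already available before changing anything, a quick check like the one below can be run in PROC PYTHON. This is only a minimal sketch: the helper name missing_modules is made up for illustration, and llama_cpp is the import name of the llama-cpp-python package.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both modules can already be imported.
print(missing_modules(["llama_cpp", "huggingface_hub"]))
```

If the list is not empty, the steps below describe one way to provide the missing modules.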

1. Go to SAS Environment Manager and edit the compute service setting sas.compute.server: startup_commands. Add the command export PYTHONHOME=/sascode/pythonhome

2. Go to SAS Environment Manager and edit the advanced attributes of the SAS Studio compute context. Add a new attribute that allows the X and PIPE SAS statements to run in SAS Studio: name: allowXCMD, value: true

3. Check the compute server pod's OS by running this SAS code:

filename oscmd pipe "cat /etc/os-release";
data _null_;
infile oscmd;
input;
put _infile_;
run;
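The same information can also be read from Python if that is more convenient. The sketch below parses os-release style text into a dictionary; the function name parse_os_release and the sample text are illustrative assumptions, not the actual output of the compute server pod.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


# Illustrative sample only; on the pod, read the text from /etc/os-release.
sample = 'NAME="Red Hat Enterprise Linux"\nVERSION_ID="8.10"\nID=rhel\n'
info = parse_os_release(sample)
print(info["ID"], info["VERSION_ID"])
```

The ID and VERSION_ID values are the ones to match when choosing the OS for the build host in the next step.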

4. Set up a build host running the same OS as the compute server pod, and install gcc and the related development packages, plus sqlite-devel and libffi-devel.

5. Check the current Python version by running Python code in SAS Studio.
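A minimal sketch of such a version check is shown here. The helper names are hypothetical, and the archive name assumes the usual python.org naming for final releases (Python-X.Y.Z.tgz).

```python
import sys


def python_version_string(info=sys.version_info):
    """Return the interpreter version as 'major.minor.micro'."""
    return f"{info[0]}.{info[1]}.{info[2]}"


def source_tarball_name(version):
    """Name of the CPython source archive for a final release version."""
    return f"Python-{version}.tgz"


version = python_version_string()
print(version, source_tarball_name(version))
```

The printed version is the one whose source code should be built, so that the new Python home matches the interpreter used by PROC PYTHON.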

6. Download that version of the Python source code to the host built in step 4. Create the directory /data/sascode/pythonhome. Go to the Python source folder, then configure, build, and install it with:

./configure --prefix=/data/sascode/pythonhome

make

make install

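7. On the build host, use the pip of the newly built Python (for example /data/sascode/pythonhome/bin/pip3) to install llama-cpp-python, huggingface_hub, and any other Python modules you want.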

8. Copy the folder /data/sascode/pythonhome from the build host to the same path, /data/sascode/pythonhome, on the NFS host. It can then be viewed in SAS Studio.
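After a new compute session has started, it may be worth confirming that Python really picked up the new home. The sketch below relies on the documented behaviour that PYTHONHOME replaces sys.prefix; the helper is_under is a made-up name.

```python
import os
import sys


def is_under(path, root):
    """True if path equals root or lies inside it (after normalisation)."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


# /sascode/pythonhome is the PYTHONHOME value set in step 1.
print(sys.prefix, is_under(sys.prefix, "/sascode/pythonhome"))
```

If the second value printed is False, the startup command from step 1 has probably not taken effect in the current session.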

9. Download a model from https://huggingface.co/models to /sascode, either with the Python code below or by clicking the model file on the site and downloading it directly:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF",
    filename="*.gguf",
    local_dir="/sascode",
    cache_dir="/sascode",
    verbose=False
)
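Because the download uses a wildcard file name, it helps to list what actually landed on disk before hard-coding a model path. A small sketch, assuming the files sit somewhere under /sascode; find_gguf_files is a hypothetical helper name.

```python
from pathlib import Path


def find_gguf_files(root):
    """Return all .gguf files under root, sorted by path ([] if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.gguf") if p.is_file())


# Print each model file with its size in bytes.
for path in find_gguf_files("/sascode"):
    print(path, path.stat().st_size)
```

The printed path is the value to use for model_path when loading the model.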

If you get a Python permission error about /sascode/.cache_dir when running the code above, run the SAS code below and then retry the download code:

data _null_;

/* chmod needs a mode argument; u+rwX is an assumed value, adjust it for your site */
x "chmod -R u+rwX /sascode";

run;

10. Run the locally downloaded LLM and save the result to WORK.RESULT:

import pandas as pd
from llama_cpp import Llama

# Load the model file downloaded in step 9
llm = Llama(
    model_path="/sascode/gemma-1.1-7b-it.Q4_K_M.gguf",
)
input_text = "Once upon a time"
output = llm(input_text, max_tokens=50)
text = output['choices'][0]['text']
print(text)

# Save the result to WORK.RESULT
df = pd.DataFrame({"result": [text]})
ds = SAS.df2sd(df, 'RESULT')
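The same pattern extends to several prompts by collecting one row per completion. The sketch below is an illustration only: run_prompts and fake_llm are hypothetical names, and fake_llm merely stands in for a loaded Llama object so that the example runs without a model file.

```python
def run_prompts(llm, prompts, max_tokens=50):
    """Call the model once per prompt and collect one row per completion."""
    rows = []
    for prompt in prompts:
        output = llm(prompt, max_tokens=max_tokens)
        rows.append({"prompt": prompt, "result": output["choices"][0]["text"]})
    return rows


def fake_llm(prompt, max_tokens=50):
    """Stand-in that mimics the shape of a llama-cpp-python completion."""
    return {"choices": [{"text": prompt.upper()[:max_tokens]}]}


rows = run_prompts(fake_llm, ["Once upon a time", "In a galaxy"])
print(rows)
```

With the real llm object, pd.DataFrame(rows) can be passed to SAS.df2sd in the same way as the single-row data frame.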

NOTES:

1. You might get a memory error.
If so, increase the compute podtemplate memory limits using the commands below:

kubectl -n viya annotate podtemplate sas-compute-job-config launcher.sas.com/default-memory-limit=64Gi --overwrite
kubectl -n viya annotate podtemplate sas-compute-job-config launcher.sas.com/max-memory-limit=64Gi --overwrite

Also increase memsize in SAS Environment Manager:
Contexts ->
Compute contexts ->
SAS Studio compute context
Add the following line to the SAS options:
-memsize max

2. If Python reports that a module is not found, repeat steps 7 and 8.
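For note 1, a rough way to judge whether a memory limit is large enough is to compare it with the size of the model file. The sketch below is an assumption-laden estimate rather than a rule: the helper names are hypothetical, the 1.2 headroom factor and the 5 GiB model size are made-up example values, and only binary suffixes such as Gi are handled.

```python
_BINARY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory_limit(quantity):
    """Convert a binary-suffix quantity such as '64Gi' to bytes."""
    for suffix, factor in _BINARY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def model_fits(model_bytes, limit, headroom=1.2):
    """Rough check: model size times an assumed headroom factor vs. the limit."""
    return model_bytes * headroom <= parse_memory_limit(limit)


# Hypothetical 5 GiB model file against the 64Gi limit from note 1.
print(model_fits(5 * 2**30, "64Gi"))
```

The real memory use also depends on the context size and other settings, so treat the result as a first sanity check only.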
