
Increasingly, software developers and data scientists rely on LLMs to help with coding, but LLMs are poor at SAS coding. Hallucinations are common, and LLM-generated SAS code often does not run without major changes. This could lead to poor outcomes. For example, SAS coders may turn to alternatives such as Python, which has top-tier LLM support whether the model is used in the traditional chat format or embedded in coding assistants such as GitHub Copilot and Windsurf.
The academic paper "The Llama 3 Herd of Models" lists, in section 4.3.1, Meta's ten top-priority programming languages (notably, SAS is not among them), and it details how Meta improved its LLMs' ability to generate code. One potential solution to LLMs' struggles with SAS is for SAS to emulate Meta's approach by developing a corpus of SAS-specific training data that any LLM developer can freely use. SAS could then publish this data set on Hugging Face and promote it to Meta, OpenAI, Google, and Anthropic.
The Llama paper gives a template for this process. For SAS, coding questions and solutions could be collected automatically from resources such as the SAS documentation (example code), this SAS forum, Stack Overflow, SAS support cases, and SAS blogs (to the extent permitted by copyrights, licenses, and terms of service). Strategies from the Llama paper include removing PII, evaluating samples automatically with LLMs, generating unit tests automatically, and testing solutions automatically in sandbox environments.
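As a rough illustration of the filtering stage, here is a minimal Python sketch. It assumes each candidate arrives as a question, its SAS code, and the SAS log produced by running that code in a sandbox. The regular expressions are deliberately simplistic stand-ins for real PII removal, the record fields (`source`, `question`, `solution`) and helper names are hypothetical, and treating any log line that begins with `ERROR` as a failure is my own assumption about what "passing" means.

```python
import json
import re

# Simplistic, illustrative patterns only; real PII removal would also need to
# handle names, addresses, customer IDs, and much more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def sas_log_is_clean(log_text: str) -> bool:
    """Assumed pass rule: no log line starts with ERROR (e.g. 'ERROR:' or 'ERROR 22-322:')."""
    return not any(line.lstrip().startswith("ERROR") for line in log_text.splitlines())


def make_record(question: str, sas_code: str, log_text: str, source: str):
    """Return one JSONL line for the corpus, or None if the sandbox run failed.

    The field names below are hypothetical, not an established schema.
    """
    if not sas_log_is_clean(log_text):
        return None
    record = {
        "source": source,
        "question": redact_pii(question),
        "solution": redact_pii(sas_code),
    }
    return json.dumps(record)
```

For example, `make_record("How do I sort by date? Mail jane@example.com", "proc sort data=sales; by date; run;", "NOTE: ...", "forum")` would keep the sample with the address replaced by `<EMAIL>`, while a log containing an `ERROR:` line would cause the sample to be dropped. In practice, the LLM-based evaluation and unit-test generation mentioned above would sit alongside this step.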
An intriguing strategy would be to compile a list of Python data science and business intelligence questions and then translate their solutions to SAS. This assumes that coders in both languages face similar questions, and it takes advantage of the fact that Python questions are more abundant on the open Internet.
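The translation idea could be sketched as a job-building step. This minimal Python example assumes an LLM performs the translation. The prompt wording, the record fields, and the function names are all hypothetical and untested, not a known recipe.

```python
import json

# Hypothetical prompt wording; it would need tuning and evaluation in practice.
PROMPT_TEMPLATE = (
    "Translate this Python answer into idiomatic SAS code.\n\n"
    "Question:\n{question}\n\n"
    "Python answer:\n{python_code}\n\n"
    "Reply with SAS code only."
)


def build_translation_task(task_id: str, question: str, python_code: str) -> dict:
    """Package one Python Q&A pair as a translation job for an LLM."""
    return {
        "id": task_id,
        "question": question,
        # Keep the Python answer so reviewers can compare it with the SAS output.
        "python_reference": python_code,
        "prompt": PROMPT_TEMPLATE.format(question=question, python_code=python_code),
        # Left empty until the LLM's SAS answer has run cleanly in a sandbox.
        "sas_solution": None,
    }


def write_jsonl(tasks, path: str) -> None:
    """Write one JSON object per line (JSON Lines), a common format for data sets."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```

Each task's prompt would then be sent to an LLM, and the returned SAS code would be accepted only after it passes the same sandbox checks as the collected SAS examples. Keeping the Python reference in the record makes it possible to compare outputs from both languages later.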