Warning: this is really a python question, using SAS as background explanation. Asking here because it's helpful to explain it via SAS, it's a question about python for analytics, and I don't know any friendly python forums (suggestions welcome).
If I have a complex SAS program, I will often use macros to modularize the code, following recommendations from Ed Heaton's excellent paper, https://www.lexjansen.com/nesug/nesug01/at/at1010.pdf. So I might end up with a program that looks like:
%macro makereport(...);
%getdata(...)
%cleandata(...)
%fitmodel(...)
%plotit(...)
%mend makereport;
%makereport()
I've started playing with python, and I'm curious whether folks writing a complicated analytic program would use functions to modularize their code, or go further into OOP and write classes. I've scanned a couple of python analytics books, but they seem to show how to call pandas (or whatever) in a script to get things done, and say little about how to structure your code. I saw one blog post in favor of data scientists fully embracing OOP and building classes, but other posts basically say 'just because python is object-oriented doesn't mean you have to create your own objects; if creating objects isn't useful, don't do it.'
As an example, consider a simple python script to read a CSV with X and Y, fit a regression, and make a plot:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

#get data
df = pd.read_csv("linear.csv")

#fit model
model = smf.ols('y ~ x', data=df)
res = model.fit()

#make plot
y_hat = res.predict()
plt.plot(df.x, df.y, 'o')
plt.plot(df.x, y_hat, linewidth=2)
plt.show()
You could use functions to modularize it like:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

def getdata(csv):
    df = pd.read_csv(csv)
    return df

def fitmodel(df):
    model = smf.ols('y ~ x', data=df)
    res = model.fit()
    return res

def plotit(df, res):
    y_hat = res.predict()
    plt.plot(df.x, df.y, 'o')
    plt.plot(df.x, y_hat, linewidth=2)
    plt.show()

def runall(csv):
    df = getdata(csv)
    res = fitmodel(df)
    plotit(df, res)

runall("linear.csv")
Or define a class, and use it like:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

class curve:
    def __init__(self, csv):
        # read the data and fit the model when the object is created
        self.df = self.getdata(csv)
        self.res = self.fitmodel(self.df)

    def getdata(self, csv):
        df = pd.read_csv(csv)
        return df

    def fitmodel(self, df):
        model = smf.ols('y ~ x', data=df)
        res = model.fit()
        return res

    def plotit(self):
        y_hat = self.res.predict()
        plt.plot(self.df.x, self.df.y, 'o')
        plt.plot(self.df.x, y_hat, linewidth=2)
        plt.show()

mycurve = curve("linear.csv")
mycurve.plotit()
Clearly if you are building an application, there are benefits to creating classes. And I recognize that in analytic work, there is a wide gray zone between an ad hoc script for a one-off analysis, and an analytic application. (e.g. is a program that you manually run once a month to generate a monthly report an application?) In my real life SAS programming I don't always modularize my code.
So when you're writing python code for data analytics, do you mostly write plain scripts, modularize with functions, or define your own classes?
Related: when you modularize code with functions or classes, do you keep the code to define the functions/classes in your main .py script, or do you put each function/class definition into its own .py file and import them? I guess putting them into their own .py file and importing them would be analogous to storing SAS macro definitions as .sas files in an autocall library, which is my usual practice.
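To make the autocall analogy concrete, here is a minimal sketch of the "definitions in their own .py file" option. Everything named in it is hypothetical: the module name analysis_tools, the functions fit_line and predict, and the temporary folder standing in for a shared code directory. To keep it dependency-free, the fit is a hand-written least-squares line rather than the statsmodels call used above. The module is written to disk from a string only so the sketch runs in one piece; normally you would just save the file yourself.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Source for a hypothetical module file, analysis_tools.py.
# In real use this would simply be a file you saved by hand.
MODULE_SOURCE = textwrap.dedent('''
    """Reusable analysis helpers (hypothetical example module)."""
    from statistics import mean

    def fit_line(xs, ys):
        """Return (slope, intercept) of a least-squares line through the points."""
        x_bar, y_bar = mean(xs), mean(ys)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        return slope, intercept

    def predict(xs, slope, intercept):
        """Return the fitted value for each x."""
        return [intercept + slope * x for x in xs]
''')

# A temporary folder stands in for a shared code directory
# (roughly the role an autocall library folder plays in SAS).
lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "analysis_tools.py").write_text(MODULE_SOURCE)

# Adding the folder to sys.path lets python find modules stored there.
sys.path.insert(0, str(lib_dir))

# The "main script" part: import the module and call its functions.
import analysis_tools

slope, intercept = analysis_tools.fit_line([0, 1, 2, 3], [1, 3, 5, 7])
y_hat = analysis_tools.predict([4, 5], slope, intercept)
print(slope, intercept, y_hat)
```

If the module file sits in the same folder as the main script, the sys.path line should not be needed, since python looks in the script's own folder when importing.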
Would also welcome any suggestions for good books / sites etc about python coding patterns / best practices when using python for data management/analytics. Most of the python sites are about programming, rather than analytics.