
Getting Started with Creating Charts in Python

Started ‎04-12-2016 by
Modified ‎04-16-2019 by


 

There are many Python packages available for creating charts. Which one you use depends on the purpose of the final plot. For quick results, Pandas and Seaborn are popular. For publication-ready plots, Matplotlib is a very common choice (Seaborn and the Pandas plotting methods are both built on top of Matplotlib). For interactive plots, you may want to try Plot.ly or Bokeh.

 

The first thing we need to do is connect to CAS and upload some data. We are using the SAS CARS dataset in CSV form here.

 

In[1]: import swat

In[2]: conn = swat.CAS()

In[3]: tbl = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
In[4]: tbl.head()

 

Screen Shot 2016-08-12 at 1.51.43 PM.png

 

Let's subset the data to just the sports cars using the query method of the CASTable object. This works much like the query method on DataFrames. We'll then download the data into a local DataFrame using the head method. We've specified a maximum number of rows of 1,000 here, which is enough to cover all of the sports cars in the result. Finally, we'll add an index to the DataFrame that contains the make and model of the car.

 

In[5]: sports = tbl.query('Type = "Sports"')
In[6]: sports
Out[6]: CASTable('_PY_c46200ed_1bad_4135_8b14_031a3890445c_', caslib='CASUSERHDFS(kesmit)', where='Type = "Sports"')

In[7]: df = sports.head(1000)
In[8]: df.set_index(df['Make'] + ' ' + df['Model'], inplace=True)
In[9]: df.head()

Screen Shot 2016-08-12 at 1.52.41 PM.png
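If you don't have a CAS server handy, the same subset-and-index step can be sketched with plain Pandas. This is only an illustration: the rows below are made-up stand-ins (not values from the CARS data), and the names cars and sports_local are hypothetical. Note that a plain DataFrame query spells equality as ==, whereas the CASTable query above uses a single =.

```python
import pandas as pd

# Hypothetical stand-in rows. The values are made up for illustration
# and are NOT taken from the SAS CARS data.
cars = pd.DataFrame({
    "Make": ["MakeA", "MakeB", "MakeC"],
    "Model": ["Model1", "Model2", "Model3"],
    "Type": ["Sports", "Sedan", "Sports"],
    "MSRP": [30000, 20000, 40000],
    "Invoice": [27000, 18000, 36000],
})

# Plain Pandas uses == for equality inside a query string
sports_local = cars.query('Type == "Sports"')

# Combine Make and Model into one label and use it as the index
# (the non-mutating equivalent of set_index(..., inplace=True))
sports_local = sports_local.set_index(
    sports_local["Make"] + " " + sports_local["Model"]
)

print(sports_local[["MSRP", "Invoice"]])
```

Because the index is built from a computed Series rather than an existing column name, the Make and Model columns stay in the DataFrame.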

 

Now that we have some data to work with, let's create some charts. To enable Matplotlib to embed images directly in the notebook, use the %matplotlib magic command. This works with Pandas plotting, Seaborn, and Matplotlib charts.

 

In[10]: %matplotlib inline

 

Pandas plot Method

 

Pandas DataFrames have a plot accessor that makes it easy to create quick charts from the data in the DataFrame. In older versions of Pandas, plot was only a method, with a kind= parameter that indicated the type of plot to create. Newer versions also provide a method for each plot type, such as plot.bar, plot.scatter, and plot.line.
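As a rough comparison of the two spellings, the sketch below draws the same bar chart both ways. The DataFrame prices and its values are made up for illustration; the Agg backend is selected only so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import pandas as pd

# Hypothetical values, for illustration only
prices = pd.DataFrame(
    {"MSRP": [30000, 40000], "Invoice": [27000, 36000]},
    index=["MakeA Model1", "MakeC Model3"],
)

ax_kind = prices.plot(kind="bar")   # plot called as a method with kind=
ax_method = prices.plot.bar()       # per-chart-type method on the accessor

# Both calls return a Matplotlib Axes holding one bar per row and column
print(len(ax_kind.patches), len(ax_method.patches))
```

Either form returns a Matplotlib Axes object, so anything you can do to a Matplotlib chart can also be done to the result.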

 

In the example below, we are subsetting the DataFrame to only include MSRP and Invoice, then we are calling the plot.bar method to create bar charts of the columns in subplots. We will also use the rot= parameter to rotate the x axis labels.

 

 

In[11]: df[['MSRP', 'Invoice']].plot.bar(figsize=(15, 8), rot=-90, subplots=True)

Screen Shot 2016-04-12 at 7.24.01 PM.png
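When subplots=True is used, plot.bar hands back one Axes per column instead of a single Axes. The sketch below, using made-up values in a hypothetical prices DataFrame, shows how you might inspect what came back.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import pandas as pd

# Hypothetical values, for illustration only
prices = pd.DataFrame(
    {"MSRP": [30000, 40000], "Invoice": [27000, 36000]},
    index=["MakeA Model1", "MakeC Model3"],
)

# One subplot per column; rot= rotates the x axis tick labels
axes = prices[["MSRP", "Invoice"]].plot.bar(
    figsize=(15, 8), rot=-90, subplots=True
)

# The labels on the bottom subplot come from the DataFrame index
bottom_labels = [t.get_text() for t in axes[-1].get_xticklabels()]
print(len(axes), bottom_labels)
```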

 

 

 

Creating Charts using Seaborn

 

The next step up from the plot method of DataFrames is using the Seaborn package. This package is a wrapper around Matplotlib that takes some of the work out of creating graphs and adds new ways of styling charts.

 

The code below creates a figure that contains two subplots as we did before. Seaborn is then used to create bar charts in each of the axes. Finally, the x axis labels are overridden so that they can be rotated -90 degrees as we did before.

 

In[12]: import seaborn as sns
In[13]: import matplotlib.pyplot as plt

In[14]: f, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 8), sharex=True)

In[15]: bar = sns.barplot(x=df.index, y=df['MSRP'], ax=ax1, color='blue')
In[16]: ax1.set_ylabel('MSRP')

In[17]: bar2 = sns.barplot(x=df.index, y=df['Invoice'], ax=ax2, color='green')
In[18]: ax2.set_ylabel('Invoice')

In[19]: labels = bar2.set_xticklabels(df.index, rotation=-90)

Screen Shot 2016-04-12 at 7.26.58 PM.png

 

Using Matplotlib Directly

 

The final entry in the static graphics line is Matplotlib itself. The Pandas plot method and Seaborn are just wrappers around Matplotlib, but you can still use Matplotlib directly. In this case, the code doesn't look much different from the Seaborn version. You'll notice that we have to do a bit more adjustment of the labels on the x axis, and the x axis is a bit wider than it needs to be. Seaborn takes care of those details automatically.

 

In[20]: import matplotlib.pyplot as plt

In[21]: f, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 8), sharex=True)

In[22]: ax1.bar(range(len(df.index)), df['MSRP'], color='blue')
In[23]: ax1.set_ylabel('MSRP')

In[24]: ax2.bar(range(len(df.index)), df['Invoice'], color='green')
In[25]: ax2.set_ylabel('Invoice')

In[26]: ax2.set_xticks([x + 0.25 for x in range(len(df.index))])
In[27]: labels = ax2.set_xticklabels(df.index, rotation=-90)

Screen Shot 2016-04-12 at 7.30.11 PM.png
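The two details mentioned above (tick placement and the extra width of the x axis) can be handled by hand. The sketch below assumes Matplotlib 2.0 or later, where bars are centered on their x positions by default, so the ticks can sit directly on the integer positions. The labels and values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt

# Hypothetical labels and values, for illustration only
labels = ["MakeA Model1", "MakeC Model3"]
msrp = [30000, 40000]
invoice = [27000, 36000]
positions = list(range(len(labels)))

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 8), sharex=True)

bars1 = ax1.bar(positions, msrp, color="blue")
ax1.set_ylabel("MSRP")

ax2.bar(positions, invoice, color="green")
ax2.set_ylabel("Invoice")

# Bars are centered on their positions, so ticks go on the same values
ax2.set_xticks(positions)
ax2.set_xticklabels(labels, rotation=-90)

# Trim the empty space on either side of the first and last bars
ax2.set_xlim(-0.5, len(labels) - 0.5)
```

Because the two subplots share their x axis, setting the limits on ax2 also applies to ax1.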

 

 

Using Plot.ly and Cufflinks

 

The Plot.ly package can be used a couple of different ways. There's the Plot.ly API that uses standard Python structures as inputs, and there is an additional package called Cufflinks that integrates Plot.ly charts into Pandas DataFrames. Since we have our data in a DataFrame, it's easier to use Cufflinks to start.

 

The code below uses Cufflinks' iplot method on the DataFrame. The iplot method works much like the standard plot method on DataFrames except that it uses Plot.ly as the back-end rather than Matplotlib. After importing cufflinks, we use the go_offline function to indicate that we are using local graphics rather than the hosted Plot.ly service.

 

The benefit to Plot.ly graphics is that they are interactive when viewed in a web browser.

 

In[28]: import cufflinks as cf

In[29]: cf.go_offline()

In[30]: df[['MSRP', 'Invoice']].iplot(kind='bar', subplots=True, shape=(2, 1), shared_xaxes=True)

 

Screen Shot 2016-04-12 at 7.32.46 PM.png

 

To do a similar plot using the standard Plot.ly API takes a bit more work.

 

In[31]: import plotly.graph_objs as go
In[32]: from plotly import tools
In[33]: from plotly.offline import init_notebook_mode, iplot

In[34]: init_notebook_mode()

In[35]: data = [
    go.Bar(x=df.index, y=df.MSRP, name='MSRP'),
    go.Bar(x=df.index, y=df.Invoice, name='Invoice')
]

In[36]: fig = tools.make_subplots(rows=2, cols=1, shared_xaxes=True, print_grid=True)
In[37]: fig.append_trace(data[0], 1, 1)
In[38]: fig.append_trace(data[1], 2, 1)

In[39]: fig['layout']['height'] = 700
In[40]: fig['layout']['margin'] = dict(b=250)

In[41]: iplot(fig)

 

Screen Shot 2016-04-12 at 7.35.04 PM.png

 

Creating Charts with Bokeh

 

Bokeh is a popular graphics library for Python. Its charting functionality is a more recent addition, so it isn't as mature as some of the other libraries here, but the package as a whole is extremely powerful. This chart could still use some work on the label orientation and on drawing the two pieces as subplots rather than separate plots, but that functionality doesn't appear to exist in this release.

 

In[42]: from bokeh.charts import Bar, show
In[43]: from bokeh.io import output_notebook

In[44]: output_notebook()

In[45]: show(Bar(df, values='MSRP', ylabel='MSRP', width=1000, height=400, color='blue'))
In[46]: show(Bar(df, values='Invoice', ylabel='Invoice', width=1000, height=400, color='green'))

 

Screen Shot 2016-04-12 at 7.37.31 PM.png
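If the charts interface is not available in your Bokeh installation, a similar result can be sketched with the lower-level bokeh.plotting interface. This is an assumption-laden illustration rather than a translation of the code above: it uses figure, vbar, and the column layout, rotates the labels through major_label_orientation, and uses made-up labels and values.

```python
from bokeh.layouts import column
from bokeh.plotting import figure

# Hypothetical labels and values, for illustration only
labels = ["MakeA Model1", "MakeC Model3"]
msrp = [30000, 40000]
invoice = [27000, 36000]

# A categorical x axis is created by passing the labels as x_range
p_msrp = figure(x_range=labels, title="MSRP")
p_msrp.vbar(x=labels, top=msrp, width=0.8, color="blue")
p_msrp.yaxis.axis_label = "MSRP"
p_msrp.xaxis.major_label_orientation = "vertical"

p_invoice = figure(x_range=labels, title="Invoice")
p_invoice.vbar(x=labels, top=invoice, width=0.8, color="green")
p_invoice.yaxis.axis_label = "Invoice"
p_invoice.xaxis.major_label_orientation = "vertical"

# Stack the two figures vertically in a single layout
layout = column(p_msrp, p_invoice)

# In a notebook, call output_notebook() and then show(layout) to display it
```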

 

Don't forget to close the connection when you're finished.

 

In[47]: conn.close()

 

Conclusion

 

We have shown the basics of several Python charting libraries here. Which of these (if any) you use depends on your needs. The Matplotlib-based libraries are better at static and publication-style graphics, whereas Plot.ly and Bokeh are more tuned to interactive charting in web browsers. Hopefully, we have given you enough information to pique your interest in one of these packages for creating charts from your CAS results.

 

Resources

 

You can download the Jupyter notebook version of this article at https://github.com/sassoftware/sas-viya-programming/tree/master/communities.
