
Using The SAS VS Code Extension to Check Prime Distribution


In my last post I calculated prime numbers using MPP CAS. There was a split focus on prime number math and SAS's capacity for parallel processing. For this post I'll be taking some of those generated numbers and looking into their distribution along the number line as an example to showcase some interesting features that the SAS Visual Studio Code extension has to offer. We'll be reusing the single-machine prime number generation code and performing some basic data exploration using SAS, SQL, and Python together in one SAS notebook file. The end result will be shipped off to SAS Studio as an executable flow.

 

SAS Visual Studio Code Extension

 

The VS Code extension allows you to browse all the same libraries and files that you would in SAS Studio. For this post, we'll be creating a SAS notebook (.sasnb file) through the Visual Studio Code interface, called "primeexploration.sasnb".

 

The benefit of these notebooks is that you can seamlessly switch programming languages for individual cells, while adding text-only markdown content to explain each step of your work.

 

Note: a "cell" is like a contained block of code. The following screenshot shows 4 cells, and the language each cell has been set to use is displayed in the bottom right corner:

 

01_RK_vscodeshowcase-1024x578.png

 


 

SAS automatically wraps any Python or SQL code you enter in an invisible PROC PYTHON or PROC SQL step, depending on the language you chose for the cell. You can view this wrapper code by switching the output from the default HTML renderer to the SAS log renderer:

 

02_RK_sqlnormaloutput.png

 

Changing the presentation can be beneficial if you want to know what your code is doing behind the scenes:

 

03_RK_sqlsasoutput.png

 

There are more options and views to play around with. However, these basics will get us through the journey from VS Code --> exploration and discovery --> SAS Studio flow.

 

Building Our Program

 

In this section, we'll follow the logical steps I took and the ways in which my work was made easier by the flexibility of a SAS notebook.

 

The premise for the program is that I want to show how prime numbers become scarcer as you travel down the number line.

 

To begin, I recycled the SAS code for generating prime numbers from the previous post. The cell was of the language type "SAS" and the code was as follows:

 

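DATA primenumbers (drop= L j k a);

/* Test every integer from 1 to 10000 */
DO UNTIL(primes=10000);
    primes+1;
    L=0; /* number of divisors found for this candidate */
    DO j=1 TO primes;
        k=0;

        IF (MOD(primes , j)= 0) THEN DO;
            k+1;
            a=MOD(primes , j);
        END;

        IF k=1 THEN L+1;
    END;

    /* A prime has exactly two divisors: 1 and itself */
    IF L=2 THEN OUTPUT;
END;
RUN;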
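For readers more at home in Python, here is a minimal sketch of the same divisor-counting idea. It is not part of the notebook, and the function names are made up for illustration. It runs on a smaller limit of 1,000 because this brute-force approach slows down quickly in pure Python.

```python
def count_divisors(n):
    """Count how many integers from 1 to n divide n evenly."""
    return sum(1 for j in range(1, n + 1) if n % j == 0)


def primes_up_to(limit):
    """Keep every candidate with exactly two divisors (1 and itself)."""
    return [n for n in range(1, limit + 1) if count_divisors(n) == 2]


# Smaller limit than the SAS cell (10000) to keep the run time short
primes = primes_up_to(1000)
print(primes[:10])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(len(primes))  # 168
```

Raising the limit to 10000 should reproduce the same set of primes as the SAS cell, just more slowly.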

 

Now that we have a dataset consisting of all the prime numbers between 1 and 10,000 (if you're curious, there are 1229) we can perform some data exploration to see what we're working with. As a quick test I wanted to show how the first 50 numbers on our number line contain more primes than the last 50. I decided to do this using SQL, as I've used it before to print portions of datasets.

 

The code:

 

SELECT * FROM work.primenumbers WHERE primes < 50;

SELECT * FROM work.primenumbers WHERE primes > (10000 - 50);

 

The output(s):

 

04Capture_RK_.PNG

 

We can see that there are 15 prime numbers in the first 50 values along the number line, and only 2 in the last 50. This is interesting, but it doesn't illustrate the point as well as examining the entire dataset would.
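As a cross-check outside SAS, the two filters can be sketched in plain Python. This is an illustration only: it rebuilds the primes with a sieve of Eratosthenes (a different technique from the SAS cell, chosen here for speed), and the helper name is an assumption.

```python
def sieve(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting at n squared
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


primes = sieve(10000)

# Same conditions as the two SQL WHERE clauses
first_50 = [p for p in primes if p < 50]
last_50 = [p for p in primes if p > 10000 - 50]

print(len(first_50), len(last_50))  # 15 2
print(last_50)  # [9967, 9973]
```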

 

To do this examination, I decided to split the data into 4 quarters. Q1 is comprised of every prime less than 2500, Q2 has primes from 2500 to 5000, Q3 has primes from 5000 to 7500, and Q4 has the remaining primes between 7500 and 10000. To assign these quarter identifiers to each row I switched back to using SAS code, as that was what I found to be easiest for me:

 

DATA work.primedistribution;
SET work.primenumbers;
IF primes < 2500 THEN quarter = 'Q1';
ELSE IF primes < 5000 AND primes >= 2500 THEN quarter = 'Q2';
ELSE IF primes < 7500 AND primes >= 5000 THEN quarter = 'Q3';
ELSE quarter = 'Q4';
RUN;
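The quarter assignment can also be sketched in Python to sanity-check the bucketing before charting. This is not the notebook's code. The helper names are hypothetical, and the primes are regenerated with a simple trial-division test so the snippet stands on its own.

```python
from collections import Counter


def is_prime(n):
    """Trial division up to the square root of n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def quarter(p):
    """Mirror the IF / ELSE IF chain: the first matching branch wins."""
    if p < 2500:
        return "Q1"
    elif p < 5000:
        return "Q2"
    elif p < 7500:
        return "Q3"
    return "Q4"


primes = [n for n in range(1, 10001) if is_prime(n)]
counts = Counter(quarter(p) for p in primes)

# Print the number of primes that landed in each quarter
for label in sorted(counts):
    print(label, counts[label])
```

Because each ELSE IF branch is only reached when the earlier conditions have failed, the explicit lower bounds in the SAS step (for example primes >= 2500) are redundant but harmless. The sketch leaves them out.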

 

Finally, I had to create a chart to show the frequency of values in each quarter. Out of all 3 languages I consider myself most experienced with Python for creating graphs, so I used it!

 

That's the best thing to me about this extension: switching programming languages to whatever best works for the user makes experimentation much easier. I can leverage my past programming knowledge with SQL and Python in the SAS environment to take advantage of my immediate strengths as a new SAS user.

 

The Python code:

 

import matplotlib.pyplot as plt
df = SAS.sd2df('work.primedistribution')
ax = df.quarter.value_counts().plot(kind='barh')
SAS.pyplot(plt)
ax.set_xscale("log")

 

Our Python code does need a bit of explaining. Python can't work directly with a SAS dataset, so we need to convert our data to a dataframe that Python can manipulate. Calling "SAS.sd2df" lets us pass in the 'work.primedistribution' dataset from the previous SAS code cell and convert it accordingly.

 

SAS.pyplot tells the SAS HTML renderer how to construct the output graph. Alternatively, you could convert a SAS dataset into a Python-readable dataframe (using sd2df), manipulate it in Python, convert it back into a SAS dataset (using df2sd), and display the results. However, since we aren't making sweeping changes to the data, I found it easiest to use the SAS.pyplot approach.

 

The following is the output:

 

05_RK_primequarterdistribution.png

 

Given more time I would format this output to look more appealing, but it serves its purpose in the current form.

 

We see that Q1 contains the most primes, and each subsequent quarter contains fewer than the one before it.

 

This result is interesting, but what if we then wanted to move this code over to SAS Studio for collaboration with our team? Maybe they could find some new insights (or make my graph prettier).

 

Converting Our Code to a SAS Studio Flow

 

SAS Studio's "flow" feature allows users to create modular programs made up of different nodes. These nodes may be procedure steps, data steps, files, etc. That structure is almost the same as the SAS notebook structure we've been working with: every cell in the SAS notebook could easily become a node in a SAS Studio flow. Lucky for us, converting from a notebook to a flow takes only a few clicks:

 

06_RK_converttoflow.png

 

Now that our flow is created, we can examine it in SAS Studio:

 

07_RK_studioflowoverview.png

 

Each cell has become a program node, and we can run the flow using the "run" button to get the same results as our notebook.

 

Each node can be edited separately by my team, and they can fix my graph for me (thank you).

 

Related Links:

 

Base SAS + SAS/CONNECT – A simple method to generate load on any number of licensed cores

 

Using MPP CAS Multi-Threading to boost prime number computing speed using the brute force method

 

 

Find more articles from SAS Global Enablement and Learning here.

Comments

Thanks, this is fascinating.  I have played with VS Code extension on SAS 9 and Workbench beta at SAS Explore, but have never tried it on Viya/CAS as I don't have access.

 

I had seen the "SAS Notebook" file option but hadn't understood that it would let you seamlessly switch between languages in each cell.  That seems like an amazingly powerful feature.   I assume this works in Workbench the same way.  Workbench is planned to get R integration later this year.  So with that, you could switch back and forth between SAS, Python, and R (I hope!).
