BookmarkSubscribeRSS Feed

Pareto Example of Integration of SAS Visual Analytics with SAS Jobs via Data-Driven Content-Part 4

Started ‎09-02-2020 by
Modified ‎06-28-2023 by
Views 3,719

In this article, you are going to apply what you have learned about the integration between SAS Visual Analytics (VA), Data-Driven Content (DDC) objects, and SAS jobs to solve a business use case.

 

It’s very straight forward to create a compounded chart like the one in the screenshot below, showing bars for each country GDP and a line for the cumulative GDP as percentage, but the challenging part is to be able to filter that chart to only display the countries that are responsible for a certain percentage of the total, let’s say 80%, to create a Pareto chart.

 

Picture 1- Pareto chart for GDPPicture 1- Pareto chart for GDP

 

GTP (cumulative % of total) is an aggregated measure that must be computed from the highest to the lowest GDP, which is easily done by sorting the chart by GDP. An attempt to show only countries where the GTP (cumulative % of total) is less or equal to a certain value (e.g. list of countries that are responsible for 80% of the total GDP) will produce the wrong results if you try to use ranks, or it will not be possible if you try to use filters. Filtering on aggregated measures is not supported in this case and ranking would return a percentage of the total number of countries based on their GTP, not the countries responsible for a percentage of the total GDP. Using "New data from aggregation" to create an aggregated data source will not work either because of the sequential nature of the cumulative calculation, and while that sequence can be easily defined in a chart, it doesn’t exist for aggregated tables computed in VA. The solution is to pass the data to a job that will create a new table on the fly with the percentages accumulated in the right order. That table can then be filtered because the cumulative percentage is no longer an aggregated measure calculated in VA.

 

 

Pareto with Ranking Example

 

This example uses data downloaded from The World Bank. The table contains GDP information for countries from years 1960 to 2019, and it was exported as an Excel spreadsheet.

 

Picture 2- GDP source data from The World BankPicture 2- GDP source data from The World Bank

 

 

In SAS Data Explorer (menu option Manage Data), make sure you select the correct cells from the spreadsheet while importing it, as in the next screenshot below, as well as the other values highlighted in yellow. I suggest opening the Excel file to confirm that those values remain the same.

 

Picture 3- Import options for Excel dataPicture 3- Import options for Excel data

 

By the time the data used in the report was downloaded, year 2019 didn’t have any value, so this column was not used. Also, that original table was transposed, so that the final table had years as rows instead of columns:

 

Picture 4- Transposed input tablePicture 4- Transposed input table

 

The table can be transposed in SAS Data Studio (menu option Prepare Data). Just use the Transpose transformation with the following column assignment:

  • ID Columns: Indicator Name
  • Transpose Columns: all year columns from 1960 to 2018 (do not include 2019 as it is empty)
  • Group By Columns: Country Name, Country Code

Don’t forget to rename the _NAME_ column to Year:

 

Picture 5- SAS Data Studio transpose transformationPicture 5- SAS Data Studio transpose transformation

 

The final transposed table was saved in the Public library and called GDP_COUNTRY_YEAR_TRANSPOSED. This is the source data for your report.

 

 

Data-Driven Content Code

 

As explained in the previous article, enhancements were made to the HelloCASWorld example to make the DDC code reusable with any job that comply with the interface defined:

  1. Job receives a parameter castab from the DDC
  2. Job returns a JSON message back to the DDC with the following format: {success: …, retcode: …, message: …}

This means the DDC created in the previous article can be used without any modification in this example. 

 

Because this DDC is to be used with many different jobs, in the deployment section below you will see that you are going to be saving this code as a separated job form with a generic name: ProxyDDCForVAJobCASIntegration, indicating that this DDC is to be used as a proxy to connect VA with jobs, and it’s not attached to a particular job.

 

 

SAS Job Code

 

Starting with the template job code from the previous article example, you will only need to add the highlighted code in the “Main Processing” block:

 

Picture 6- Changes needed in the “Main Processing” blockPicture 6- Changes needed in the “Main Processing” block

 

The PROC CONTENTS and PROC SQL are used to acquire the categorical and numeric column names necessary for PROC FREQ, which is what calculates the cumulative values, percentages, and cumulative percentages. The data step after the PROC FREQ is to adjust the percentages as numbers between 0-1, so that a VA percentage format can be applied.

 

Notice that the job is not applying business names to the columns generated by PROC FREQ and saved in memory in the output table. It’s your responsibility, as the report designer, to do so.

 

 

SAS Visual Analytics Report

 

The report in this example contains a drop-down object to select the year. This drop-down object filters the DDC object, which is the blank space on the top right. Both objects use data coming from the main source table called GDP_COUTRY_YEAR_TRANSPOSED. The role assignment in the DDC object are Country Name and GDP (current US$). The bar-line chart and the list table below it are fed by the table generated by the job, named PARETO_RANK. The slider object has a single value numeric VA parameter formatted as percentage, with min and max values set to 0 and 1 respectively, which selects the percentage to be used as the filter. The parameter is used in an advanced filter to subset the PARETO_RANK data in both the bar-line chart and the list table. Notice that only when the year drop-down object is changed, the DDC and the job are re-executed. The slider object filters the job’s output table PARETO_RANK without having to re-create the table.

 

Picture 7- VA sample reportPicture 7- VA sample report

 

When designing a report like that from the scratch you must remember of a few things discussed in the previous article, such as:

  1. The job’s output table in the CASUSER library doesn’t exist until you execute the job successfully at least once. Only after the table is available in the CASUSER library, you can add it into the report and use it as the source of other VA report objects, like the bar-line chart and the list table.
  2. Tables from CASUSER library that are added to VA reports keep reference to the report author. This reference must be removed before the report is shared with a broader audience. As the report designer, you do that by editing the report BIRD XML:

a. Save the VA report

b. Hit Ctrl+Alt+B to open the SAS Visual Analytics Diagnostics window

c. Make sure the BIRD tab across the top and the XML button are selected (these are the defaults)

d. Click on the Data icon on the left

e. Scroll down searching for the reference to the CASUSER library

f. Remove (<userid>) that is appended to the CASUSER library

 

Picture 8- Removing userid from CASUSERPicture 8- Removing userid from CASUSER

  

g. Click the Load button on the top left

h. Save the VA report

If you have the right privileges for exporting and importing content in SAS Environment Manager (menu option Manage Environment), you can replace all the steps above by exporting the report and them reimporting it. You just need to replace the target CASUSER(<userid>) library with just CASUSER in the GUI during the import process.

  1. You must set automatic refresh in the objects that depend on the table loaded in the CASUSER library, otherwise they will not refresh as the data changes with the job execution. You do that in the object’s Option tab by turning on the option labeled either “Automatically refresh object” or “Periodically reload object data” depending on the VA release. In this example, the objects that need automatic refresh are the bar-line chart and the list table at the bottom of the report. Remember that automatic refresh only works when the report is in view mode.
  2. You must define at least the parameters _job_name and _job_output_cas_table in the VA report and assign a value to them. The parameter _job_executing_message is optional. All of them are character parameters and their names are case sensitive. For this example, the following values were assigned:

_job_output_cas_table: "PARETO_RANK"

_job_executing_message: "Calculating Pareto..."

_job_name: "/Public/Jobs/SAS Communities/Pareto”

  1. VA parameters are only passed in the JSON message to the DDC if the parameters affect the data that is being passed to the DDC. This is further explained in the article Using parameters with Data-Driven Content in SAS Visual Analytics, and in this example you guarantee that the parameters are being passed to the DDC by setting a dummy advanced filter in the DDC object that looks like this:

 

 Picture 9- Dummy advanced filter applied to the DDC object to guarantee the parameters are being passed in the VA messagePicture 9- Dummy advanced filter applied to the DDC object to guarantee the parameters are being passed in the VA message

 

 

In addition to those, after the job’s output table is added to the report for the very first time, you must give the data items a more business-oriented labels. As mentioned previously, the job used in the example doesn’t apply user-friendly labels to the columns calculated in PROC FREQ. Knowing what each column represents and labeling them appropriately is important. In this example, the output columns from PROC FREQ were labeled in the VA report as follows:

  • COUNT:              “GDP (current US$)”
  • PERCENT:          “GDP (% of total)”
  • CUM_FREQ:       “GDP (cumulative total)”
  • CUM_PCT:          “GDP (cumulative % total)”

 

 

Deploying This Example

 

All files used in this example are available for downloading from the GitHub project sas-visualanalytics-thirdpartyvisualizations, under folder called samples/IntegrationWithSASJobs.

 

It requires the data downloaded from The World Bank, transformed and loaded into memory in the Public caslib as explained in the beginning of this article.

 

Deployment steps:

  1. Download the GitHub project
  2. Unzip it into your Web Server document folder. These instructions assume you have unzipped the GitHub project in a folder called github in the Web Server document root folder. You will only need the content of folders github/utils and github/samples/IntegrationWithSASJobs.
  3. Logged into SAS Job Execution Web application, do the following:
    1. Create a new job in a folder of your preference (e.g. /Public/Jobs/SAS Communities) and name it Pareto (see first article for more details if needed)
    2. Open the job for edition, copy & paste the content of file github/samples/IntegrationWithSASJobs/4.ParetoUseCase/Pareto.sas
    3. Save the job
    4. Add job parameter _action=FORM as discussed in the first article
    5. Add another job parameter _form=/Public/Jobs/SAS Communities/ProxyDDCForVAJobCASIntegration

      This tells the job to transfer execution to this form, whenever the _action parameter is set to FORM

       

      Picture 10- SAS job fixed parametersPicture 10- SAS job fixed parameters

       

    6. Create a new job form as a separate content, as explained in the first article, and name it ProxyDDCForVAJobCASIntegration
    7. Open the job form for edition, copy & paste the content of file github/samples/IntegrationWithSASJobs/4.ParetoUseCase/ ProxyDDCForVAJobCASIntegration.html (1)
    8. Make changes to the host name (search for your.host.name and replace it accordingly) and path of src on lines 20, 21, and 22 (1):

       

      Picture 11- Modify src appropriatelyPicture 11- Modify src appropriately

       

    9. Save the job form
  4. Logged into SAS Visual Analytics, do the following:
    1. Create a new empty report
    2. With the report opened, hit Ctrl+Alt+B to bring the SAS Visual Analytics Diagnostics window.
    3. With the BIRD tab selected across the top (that’s the default), replace the BIRD XML content with the content of the file github/samples/IntegrationWithSASJobs/4.ParetoUseCase/VA-DDC-Job Pareto use case.xml
    4. Hit Load (this will close the diagnostics window)
    5. With the DDC object at the top right of the report selected, go to the Options pane on the right and fix the URL entry: replace your.host.name accordingly and make sure the path to the job is correct (same value entered in #3a)
    6. Make changes to the values assigned to the parameter _job_name if necessary, so it matches with your environment (same value entered in #3a)
    7. Save the report

Note: (1) Starting with release 2023.06, the examples that used to inherit jQuery from their parent (the SAS Visual Analytics web application) no longer work, so we have provided replacement codes. These replacement files have the same names as their original, but they end with .v4. Because of that, some references to line numbers may not exactly match on those .v4 files.

 

We could have exported all the content as a package, but this would require special privileges in order to import it. Sharing the example as standalone files will give you the opportunity to better explore the SAS Job Execution Web application, familiarize yourself with the content, and understand how they are connected.

 

Note: The first time a user opens this report it fails because the CASUSER table doesn’t exist yet. Reopening the report will work, as well as any subsequent access to the report. This will happen to any other report that depends on a table that is being dynamically loaded in memory when the report opens.

 

 

Next Steps

 

In the next article you will be exploring another use case: detect and remove outliers from time series to prevent Y axis scale to be too high and compromise the clarity of the information displayed in the chart.

 

 

References

 

 

 

Learn More…

 

 

 

 

 

Comments

@Renato_sas that's absolutely fantastic!

I followed your well described steps and it works. 

If I hadn't dedicated so much time to VA and tried so desperately to solve the task at hand with job execution solely, it would have been somehow intimidating. 

 

I'm super happy. Now I can apply it to my needs.

 

Once again thank you.

Hi @acordes,

You are very welcome! I'm glad to know that you've found it helpful.

Version history
Last update:
‎06-28-2023 06:14 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags