This post is another in a series of posts leading up to SAS Global Forum 2019. My colleagues, Elliot Inman and Ryan West, and I wrote a paper titled Kustomizing Your SAS Viya Engine Using SAS Studio Custom Tasks and D3.js. Custom Task Tuesday readers will get to preview the tasks associated with the paper, before the paper comes out! Once the paper does come out, I will add a link to it here.
This is the second task and article related to our paper. Check out the post for the first task if you're interested.
In our paper, we used data collected by the United States Environmental Protection Agency (EPA) about cars. The EPA regularly tests new vehicles for fuel efficiency and emissions. The data can be downloaded as a CSV from FuelEconomy.gov. The data include vehicle make, model, and year, along with detailed miles-per-gallon (MPG) and emissions test results. Check out the data dictionary here for more details.
The SAS data set we used in our analysis is available for download on the Task Tuesday GitHub. It includes only the variables that we used, and it has variable labels as well.
There are several built-in tasks in SAS Studio that each run a different supervised or unsupervised learning model. For example, you can open the SAS Viya Unsupervised Learning task “Clustering”, select your data set and input variables, and run that model. Then, you can open the “Decision Tree” task, select your data set and input variables, and run that model. Then, you can open the “Forest” task and… you get the idea.
If you have a set of analytic processes that you want to run over and over on the same dataset, you can combine them all into one task and only make those selections once. This week’s task is essentially an example of an all-in-one analytic modeling task. The example will show results for the cars data from the EPA, but any data could be used.
Here’s what the task looks like:
This task runs a clustering analysis, runs a decision tree to get the variable importance, and runs a forest model to see how well the cluster ID predicts a certain metric. This process is repeated iteratively, increasing the maximum number of clusters each time. The complicated part of the task is actually the SAS code, while the task interface itself is quite simple (no dependencies, just role selectors, numsteppers, text boxes, and a check box).
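The iteration described above can be pictured as a macro loop. What follows is only a structural sketch, not the task's actual code: the macro name run_cluster_steps and its parameters min_k and max_k are hypothetical, and the three modeling steps are left as comments because the real procedure calls live in the downloadable task.

```sas
/* Hypothetical sketch of the outer loop; all names are illustrative. */
%macro run_cluster_steps(min_k=2, max_k=5);
    %local k;
    %do k = &min_k. %to &max_k.;
        %put NOTE: Starting iteration with a maximum of &k. clusters.;

        /* Step 1: run the clustering analysis with at most k clusters. */
        /* Step 2: run a decision tree to get the variable importance.  */
        /* Step 3: run a forest model to see how well the cluster ID    */
        /*         predicts the chosen metric.                          */
    %end;
%mend run_cluster_steps;

/* Example call: three passes, with 2, 3, and 4 maximum clusters. */
%run_cluster_steps(min_k=2, max_k=4);
```

In a task, the loop bounds would presumably come from the numstepper controls rather than from hard-coded values.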
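Running this task produces six data sets. When promoted, they appear in the PUBLIC caslib as clusterwide, sample, clustertall, varimportance, forestclus, and forestvars.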
The part of this task that should be most useful to other task authors is the “Promote to VA” checkbox. When it is checked, the task writes all of the output data sets to the PUBLIC caslib, which makes them available for use in Visual Analytics reports. The “promote=yes” data set option is added so that each table persists beyond the current CAS session. For a deeper explanation of CAS table promotion, see this paper by Mike Drutar: Just Enough SAS® Cloud Analytic Services: CAS Actions for SAS® Visual Analytics Report Developers.
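For comparison, the same drop-then-promote idea can also be written with PROC CASUTIL. This is a sketch of an alternative, not what the task uses. It assumes an active CAS session and that the caslib names match the casuser and public librefs used in this post; the table names are the ones from the first promotion step in the code template.

```sas
/* Sketch only: assumes an active CAS session and caslibs named
   "casuser" and "public", matching the librefs used in this post. */
proc casutil;
    /* Remove a previously promoted copy. QUIET suppresses the error
       if the table does not exist yet. */
    droptable casdata="clusterwide" incaslib="public" quiet;

    /* Promote the session table to global scope under a new name. */
    promote casdata="clusters" incaslib="casuser"
            outcaslib="public" casout="clusterwide";
quit;
```

The DATA step approach in the task has the advantage of needing nothing beyond the librefs that the rest of the generated code already uses.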
The Metadata code for the VA checkbox is here:
<Option name="GROUPVA" inputType="string">VISUAL ANALYTICS PROMOTION</Option>
<Option name="labelVA" inputType="string">Promoting the output data sets to the PUBLIC caslib will make them available for use in Visual Analytics.</Option>
<Option name="chkVA" defaultValue="0" inputType="checkbox">Promote to VA</Option>
The UI code for the VA checkbox is here:
<Group option="GROUPVA" open="true">
  <OptionItem option="labelVA"/>
  <OptionItem option="chkVA"/>
</Group>
And finally, the Code Template portion for the VA checkbox is here:
#if ($chkVA == 1)
/* Runs only when the "Promote to VA" checkbox is checked.           */
/* For each output table: delete any existing copy in PUBLIC, then   */
/* copy the CASUSER table into PUBLIC with promote=yes.              */

/* casuser.clusters is promoted under the name clusterwide */
proc datasets lib=public; delete clusterwide; run;
data public.clusterwide (promote=yes);
    set casuser.clusters;
run;

proc datasets lib=public; delete sample; run;
data public.sample (promote=yes);
    set casuser.sample;
run;

proc datasets lib=public; delete clustertall; run;
data public.clustertall (promote=yes);
    set casuser.clustertall;
run;

proc datasets lib=public; delete varimportance; run;
data public.varimportance (promote=yes);
    set casuser.varimportance;
run;

proc datasets lib=public; delete forestclus; run;
data public.forestclus (promote=yes);
    set casuser.forestclus;
run;

proc datasets lib=public; delete forestvars; run;
data public.forestvars (promote=yes);
    set casuser.forestvars;
run;
#end
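Because the six blocks above differ only in their table names, they could be folded into a small helper macro. This is a sketch of one possible refactoring, not the code shipped with the task: the macro name promote_to_va and its parameters src and dest are hypothetical, and the body simply repeats the two steps from the original template.

```sas
/* Hypothetical helper; promote_to_va, src, and dest are illustrative names. */
%macro promote_to_va(src=, dest=);
    /* Same two steps as the original template, parameterized. */
    proc datasets lib=public; delete &dest.; run;
    data public.&dest. (promote=yes);
        set casuser.&src.;
    run;
%mend promote_to_va;

#if ($chkVA == 1)
%promote_to_va(src=clusters,      dest=clusterwide);
%promote_to_va(src=sample,        dest=sample);
%promote_to_va(src=clustertall,   dest=clustertall);
%promote_to_va(src=varimportance, dest=varimportance);
%promote_to_va(src=forestclus,    dest=forestclus);
%promote_to_va(src=forestvars,    dest=forestvars);
#end
```

Keeping src and dest as separate parameters preserves the one case where the names differ (clusters becomes clusterwide), and adding a seventh output table would then take a single line.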
Download the task from the Custom Task Tuesday GitHub to view all of the code. Can any of your tasks make use of the “Promote to VA” checkbox?
Use the hashtag #CustomTaskTuesday and tweet @OliviaJWright with your Custom Task comments and questions!