We’re smarter together. Learn from this collection of community knowledge and add your expertise.

SAS Studio Custom Task Tuesday: How to Quickly Reduce Data Size

by SAS Employee OliviaWright on ‎02-15-2017 10:18 AM - edited on ‎03-24-2017 05:14 PM by Community Manager (1,936 Views)

Custom Task Tuesday.jpgHave you ever wanted to quickly reduce clutter from a dataset that is far too large? Want to do things like delete unwanted variables and remove duplicate observations with the click of a button? Well, this Custom Task lets you do just that.


The Data Size Reduction task that we will be creating this week allows the user to delete variables from their dataset as well as delete duplicate records (across all variables, or by chosen by-variable) and outputs a new dataset with their specified name. This will allow for easy clean-up or simplification of large datasets.


Here's what the finished product of the task will look like:


data reduction.PNG


Step 1: Getting Startednew task.png


In SAS Studio, under the Task and Utilities section, open a “New Task” as well as the “Sample Task.” We will copy and paste the necessary Velocity Template code from the Sample Task to our task.


Step 2: Naming and Saving the Data Reduction Task


Name: Data Reduction 

Description: Allows user to reduce size of their input dataset by optionally dropping variables and removing duplicate records.


At the top of the VTL code for your New Task, you will need to fill in the Name and Description portions as shown below:


After you’ve done that, you should save this task to your My Tasks folder, so you don’t lose it. Click theedity.pngbutton in the upper left corner of the task to bring up this option screen:



Step 3: Creating the Metadata Portion of the Data Reduction Task


The objects we will need to create this task are: 1 dataset selector, 1 textbox, 2 role selectors, and a checkbox (the checkbox is only shown if a by variable is specified). Copy and paste the code for these objects from the Sample Task to the new task you just created. Edit the code and labels to correspond with the finished product. 

This is what our finished Metadata portion should look like:


            <DataSource name="DATASOURCE">
                     <Role maxVars="10" minVars="0" name="DROPVAR" order="true" type="A">Select variables to drop:</Role>
                     <Role maxVars="1" minVars="1" name="BYVAR" order="true" type="A">Select a by variable:</Role>
            <Option inputType="string" name="DATATAB">DATA</Option>
            <Option inputType="string" name="DATAGROUP">DATASET</Option>
            <Option inputType="string" name="ROLESGROUP1">DROP VARIABLES</Option>
            <Option inputType="string" name="labelTEXT2">Variables will be dropped after all other operations (dropping duplicate records) are complete.</Option>
            <Option inputType="string" name="ROLESGROUP2">DELETE DUPLICATE RECORDS</Option>
            <Option inputType="string" name="GROUPCHECK">OPTIONS</Option>
            <Option inputType="string" name="labelCHECK">Select additional data reduction options.</Option>
            <Option defaultValue="0" inputType="checkbox" name="chkNODUP">Delete Duplicate Records</Option>
            <Option defaultValue="0" inputType="checkbox" name="chkNODUPKEY">Delete Duplicate Records by Chosen By-Variable</Option>
            <Option inputType="string" name="GROUPTEXT">OUTPUT DATASET</Option>
            <Option inputType="string" name="labelTEXT">Enter a name for your new output dataset.</Option>
            <Option defaultValue="Dataset_Reduced" indent="1" inputType="inputtext" missingMessage="Missing dataset name." name="textDATASET" promptMessage="Enter a valid dataset name." required="true">Output Dataset:</Option>


Step 4: Creating the UI Portion of the Data Reduction Task


For object in the metadata portion, we need corresponding code in the UI portion. Copy and paste the code for the UI portion of the objects we need from the Sample Task to the new task you've just created. Edit the code and labels to correspond with the finished product. 


This is what our finished UI portion should look like:


    <Container option="DATATAB">
<Group open="true" option="DATAGROUP"> <DataItem data="DATASOURCE"/> </Group> <Group open="true" option="GROUPTEXT"> <OptionItem option="labelTEXT"/> <OptionItem option="textDATASET"/> </Group> <Group open="true" option="ROLESGROUP2"> <RoleItem role="BYVAR"/> <OptionItem option="chkNODUP"/> <OptionItem option="chkNODUPKEY"/> </Group> <Group open="true" option="ROLESGROUP1"> <OptionItem option="labelTEXT2"/> <RoleItem role="DROPVAR"/> </Group> </Container> </UI>


Step 5: Creating the Dependencies Portion of the Data Reduction Task


This is the first task in the Custom Task Tuesday series to utilize the Dependencies portion. If you find it difficult to understand, know that you don't need it in order to start writing your own tasks. However, once you get the hang of it this is a very useful too. The Dependencies portion of the VTL code lets you hide/show and disable/enable objects based on the user's selections meeting certain conditions. 


For our task, we do not want the checkboxes to delete duplicate records to show up until the user has specified a by-variable. We will "hide" our checkboxes when the by-variable selector is empty, and "show our" checkboxes when the by-variable selector contains a variable. 


This is what our finished Dependencies portion will look like:


<Dependency condition = "$BYVAR.size() == 0">
<Target conditionResult = "true" option = "chkNODUPKEY" action = "hide"/>
<Dependency condition = "$BYVAR.size() != 0">
<Target conditionResult = "true" option = "chkNODUPKEY" action = "show"/>
<Dependency condition = "$BYVAR.size() == 0">
<Target conditionResult = "true" option = "chkNODUP" action = "hide"/>
<Dependency condition = "$BYVAR.size() != 0">
<Target conditionResult = "true" option = "chkNODUP" action = "show"/>


Step 6: Creating the Code Template Portion of the Data Reduction Task


The Code Template portion contains your SAS Code. Velocity Template Language has its own macro variables, and each of our UI elements has one. This is how you make sure that your code works with what the user selects in the interface you created. When you select an option in the UI, the value of the VTL macro variable will change immediately. 


This is what our finished SAS Code portion will look like:


option spool;

data $textDATASET;

%macro options;
%if $chkNODUP = 1 && $chkNODUPKEY = 1 %then %do;
	#if( $BYVAR.size() > 0 ) 
		#foreach( $item in $BYVAR )
	proc sort data=$textDATASET nodupkey noduprecs;
			by $item;
%if $chkNODUPKEY = 1 && $chkNODUP = 0 %then %do;
	#if( $BYVAR.size() > 0 ) 
		#foreach( $item in $BYVAR )
	proc sort data=$textDATASET nodupkey;
			by $item;
%if $chkNODUPKEY = 0 && $chkNODUP = 1 %then %do;
	#if( $BYVAR.size() > 0 ) 
		#foreach( $item in $BYVAR )
	proc sort data=$textDATASET nodup;
			by $item;
%mend options;

data $textDATASET;
	set $textDATASET;
#if( $DROPVAR.size() > 0 ) 
	#foreach( $item in $DROPVAR )
		drop $item;

proc print data=$textDATASET;


Step 6: Run the Data Reduction Task

We've just created a custom user interface to create a state map containing points specified from a user’s dataset! Click thesave.pngbutton to save, then click the run.pngbutton to open the task. Make your selections, then click run.pngagain to watch it run!


Want to try it yourself?

Get the code from the zip file at the end of this article or from GitHub.

Take Me to GitHub!