BookmarkSubscribeRSS Feed

Programming with CASL in SAS Viya for Learners – basic data manipulation

Started ‎09-12-2022 by
Modified ‎09-09-2022 by
Views 1,315

In the first article of this series on CASL programming in SAS Viya for Learners we discovered how to use pexels-markus-spiske-360591.jpgSAS Studio to make a connection with the CAS server, assign libraries to explore the sample data, perform a simple analysis and terminate the CAS session.

 

In this article we will be looking at basic data manipulation of the CAS tables. It is important to mention here that in CASL, as in all other languages, there is often more than one way to perform a task. It isn’t possible to cover all possible methods in these articles so we will explore what I believe to be the simplest and clearest way to achieve our goals. In time, as you practice, read the documentation and learn more you will discover other techniques which may well, in the right circumstances, be improvements to those you will see here. However, these methods will work and will get you up and running with CASL programming.

 

Starting up

 

Firstly, we will start our session and set up our libraries.

 

/* Connect to the cas server and call the connection casconn*/

cas casconn;

/* Assign SAS librefs for all existing caslibs */

caslib _all_ assign;

 

The data set we will use for this article is called “Accepts” and is held in the ACADEMICS  library. One important thing to note is that in SAS Viya for Learners we don’t have write access to that library. This is, actually, a good thing as it is a global library used by others and in any case it is not generally good practice to alter the original copy of your source data. We do, however, have write access to the CASUSER library. We can see that with the caslibinfo action in the tables action set.

 

/* Get information about caslibs */

proc cas;
	table.caslibinfo caslib="academic";
	run;
	table.caslibinfo caslib="casuser";
	run;
quit;

ScShot1.png

 

You can see from the results that in contrast to the ACADEMICS library CASUSER is both personal and transient. This means that only you can access that data and any data held in it only has session scope i.e. it only exists for the lifetime of the session so you must rerun your analysis to regenerate it if you terminate and restart your session. Again, this is not a bad thing as it prevents mistakes if your data changes or you alter any of your code.

 

Copy a data set

 

Unfortunately, there is no copy action currently available, but we can utilise a little trick to mimic one. If we execute an action which has a casout parameter (not all do hence the need to sometimes make a copy) but don’t specify the parameter which alters the table we can simply make a direct copy.  It is easier to understand this with our example.

 

/* Make a copy of the table */

proc cas;
	table.index
		table={caslib="academic", name="accepts"}
		casout={caslib="casuser", name="accepts_1", replace=true};
quit;

 

Here we have used the index action but haven’t used the indexVars parameter to specify variables to create indexes for. Therefore, a straight copy will be written to CASUSER.

 

Dropping and keeping columns

 

If we only want to use a small number of the columns in a table, we can use the alterTable action with either the drop or keep parameter. As the names imply drop doesn’t copy the columns specified to the output table whereas keep only copies the specified columns. Which one you choose will largely depend on how many columns you want to drop or keep. It is simply easier to use the parameter which requires you to specify fewest columns. In our case we only want to keep three columns, so we’ll use keep. This can be confirmed with the columnInfo action.

 

/* Alter the table keeping only 3 columns */

proc cas;
	table.altertable
		caslib="casuser",		
		name="accepts_1",
		keep={"app_id", "down_pyt", "loan_amt"};
quit;


/* Get information about the columns in the table */

proc cas;
	table.columninfo
		table={caslib="casuser", name="accepts_1"};
quit;

ScShot2.png

 

Creating a Calculated Column

 

We may wish to create a new column in the table, possibly based on values from other columns. We can do this by creating what is known as a calculated column. We can do this by using the table action parameter to an action, along with the table actions “subparameters”  computedVars (which names the new columns) and computedVarsProgram (which specifies the formulae for creating them). Again, this is probably clearer in our example.

 

/* Create a calculated column */

proc cas;
	table.index
		table={
			caslib="casuser"
			name="accepts_1" 
			computedVars={{name="perc"} }
			computedVarsProgram="perc=(down_pyt/loan_amt)*100;"
          }
		casout={caslib="casuser", name="accepts_2", replace=true};
quit;

/* Get information about the columns in the table */

proc cas;
	table.columninfo
		table={caslib="casuser", name="accepts_2"};
quit;

ScShot3.png

 

Subsetting a table

 

This is quite simple – we merely use a where statement in the table parameter when we copy our table. Here we only want to keep rows where the value of the perc column is greater than zero.

 

proc cas;
	table.index
		table={caslib="casuser", name="accepts_2" where="perc>0"}
		casout={caslib="casuser", name="accepts_3", replace=true};
quit;

 

Updating values in a table

 

If we want to alter values in the table, we can do so by using the set parameter in the update action.  Here we want to multiply all the values in the app_id column by 10.

 

proc cas;
	table.update
		table={caslib="casuser", name="accepts_3"}
		set={{var="app_id", value="app_id*10"}};
quit;

 

Deleting rows from a table

 

If we want to delete rows from a table we can use the deleteRows action specifying the condition to use in deciding whether to delete the rows. Here (after first creating a copy of the table as shown earlier) we delete all rows where the value of loan_amt is less than twenty thousand.

 

proc cas;
	table.index
		table={caslib="casuser", name="accepts_3"}
		casout={caslib="casuser", name="accepts_4", replace=true};
quit;

proc cas;
	table.deleterows
		table={caslib="casuser", name="accepts_4", where="loan_amt<20000"};
		
quit;

 

Finally we terminate our session:

 

/* Terminate the session */

cas casconn terminate;

 

I would encourage you to try out some of the techniques shown in this article and if you have any questions or comments, please leave a message below.

Comments
 

 how can i delete rows that are present in another dataset?

Hi @dipand and thanks for the question. You can use the same deleteRows action to remove rows from any CAS table that you have write access to. It's important to remember though that a CAS table is an in-memory structure and you are NOT deleting rows from the underlying data set (or any other file used to create the table).

 

I hope this answers your question - if not please let me know, preferably with a cncrete example.

Version history
Last update:
‎09-09-2022 08:39 AM
Updated by:

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags