BookmarkSubscribeRSS Feed

CAS answers to 4 common data manipulation tasks – Part 1 – APPEND

Started ‎07-09-2019 by
Modified ‎07-09-2019 by
Views 7,356

When you design a SAS job to prepare data for analysis, you often use the same common data manipulation techniques to organize, optimize, customize, tweak your data before moving to the next steps. And you have been doing that for years, regardless of the SAS version, regardless of the storage layer, …

 

So now, with CAS, is that different? No, it’s not. Or maybe a little.

 

Let’s have a look at some common data manipulation techniques (append, sort, de-duplicate and aggregate) that require some attention when done in a CAS context. We have seen some mistakes in recent SAS Viya implementations. In this first part, we’ll focus on appending data.

Appending data in CAS

Appending a secondary data set to a primary data set means adding observations from the secondary data set to the end of the primary data set without reading the primary data set (which can be big). This is what makes APPEND more efficient than concatenating 2 data sets and storing the results into a brand new data set.

 

When we think about appending data in the SAS world, we first think about PROC APPEND. However, PROC APPEND does not work in the CAS world. You’ll get an error that the CAS table does not support update access (which is not exactly true as you will see later).

 

 73         proc append base=casdm.bigprdsale data=casdm.smallprdsale ;
 ERROR: Update access is not supported for file CASDM.BIGPRDSALE.DATA.
 74         run ;

 

Forget about PROC APPEND. What are the solutions then?

Before SAS Viya 3.4

Appending data to a CAS table was initially possible when the records to be added were coming from the client (client-side append😞

 

proc casutil ;
   load data=work.smallprdsale casout="bigprdsale" outcaslib="DM" append ;
run ;
quit ;

 

Here, in PROC CASUTIL, you can use the append option only with the load data statement (and not the load casdata statement). The table work.smallprdsale is on the SPRE machine (or SAS machine if you are using SAS 9.4).

 

If you wanted to append a CAS table to another CAS table, you had to first download the CAS table on the SPRE machine.

 

The second option was to use a DATA Step with the APPEND data set option:

 

data casdm.bigprdsale(append=yes) ;
   set casdm.smallprdsale ;
run ;

 

But, although it looked “CAS-enabled” when both the “data” and “set” tables were CAS tables, it was still a client-side append. The “set” table was first downloaded in SPRE (or SAS) before moving back in CAS to be added to the “data” table, with some size limitations. If the “set” table was bigger than 100MB, the user had to adjust the DATALIMIT option.

 

Both options were fine when the data set to be added was small. When it started to be big (gigabytes sized tables) then some performance issues started to arise.

 

A third option was to concatenate the 2 input tables to create a brand new table (and possibly replace one of the 2 input tables):

 

data casdm.bigprdsale ;
   set casdm.bigprdsale casdm.smallprdsale ;
run ;

 

As mentioned earlier, this is not as efficient as a true APPEND because the “bigprdsale” table is read entirely, but it runs in CAS without any data movement between SPRE and CAS. Also, notice that it doesn’t work with global tables. If “bigprdsale” is global, you have to:

  • Give a temporary name to the output table in this DATA Step
  • Drop the old table once the data step is done (dropTable CAS action)
  • Rename the new table with the original name (alterTable CAS action)

Now in SAS Viya 3.4

The DATA Step with the APPEND data set option has been extended to support “server-sideAPPEND. So, this is the only way to go to append a CAS table to another CAS table efficiently without triggering any data movement. It works even if the target table is global. The restriction is to have the exact same table structure.

 

data casdm.bigprdsale(append=yes) ;
   set casdm.smallprdsale ;
run ;

 

No more question about appending data in CAS. The best practice is to use the DATA Step with the APPEND data set option.

Takeaways

  • PROC APPEND does not work on CAS data
  • Best practice: use the DATA Step with the APPEND data set option
  • Mistake to avoid: download the “secondary” CAS table on the client (SPRE or SAS) before appending it to the “primary” CAS table using a client-side technique

Thanks for reading and stay tune for next article on sorting data.

Comments

Does it look so much easier in SAS Viya 3.4? Right?

The syntax looks the same in both version (data step with the append data set option). But it is more efficient in 3.4. The append happens on the server side.

How would you append in cas if the base dataset does not exist (the way proc append does in 9.4)?

This doesn't seem to be a problem. If the base dataset does not exist in CAS at the first time it creates it with the "data" dataset.

 

Informative. Quick question 

data casdm.bigprdsale(append=yes) ;
   set casdm.smallprdsale ;
run ;

this above code appends the data which is in-memory. How can I append the data in Physical table?

 

Regards,

Swapna

Hi Swapna,

That's right, this appends the data in CAS.

There is no way to append data to a physical table directly from CAS. This should be done using SAS traditional capabilities on the SAS Programming Runtime Environment side.

Best regards,

Nicolas.

Thank you Nicolas for quick response. This made me double sure, but it is really sad that this does not work with physical table as this makes things more difficult with the limited options I have for now. I need to look for other options then.. 🙂

Thanks again.

Regards,

Swapna 

@SwapnaSingh I believe you can append to the table in memory and then just save the in-memory table to a physical location. If you want to append to a physical table, load the physical table into memory, append to the in-memory table, save the in-memory table as a physical one replacing the original.

 

This can be achieved using proc casutil with the droptable, load and save statements.

@armte  Thank you for your response, what you mentioned is exactly what I am finally doing, to achieve what I mentioned in my initial email.

@SwapnaSingh No problem. I recently solved a similar problem so thought I would share my solution in case you hadn't solved your issue yet. I'm glad you were able to solve your issue 🙂

Hi, 

how can I solve it if the cas table to be appended has one more field than the cas table of the historicized data?

Thanks

Hi @ElisabettaC 

Well, you would have to remove that variable (or at least filter it out in the set statement) before appending the new records table to the main table.

Version history
Last update:
‎07-09-2019 04:49 AM
Updated by:
Contributors

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags