Building on my "CAS is fast" post, today we'll compare match code generation between Foundation SAS (a.k.a. the Compute Server) and CAS. In response to concerns that "CAS is only for big data," we'll use data of varying sizes. For each row, we'll generate a match code for a customer name field at 95% sensitivity.
From the results below, we see that CAS does indeed process the data more slowly at extremely small volumes (hundreds to thousands of rows) but overtakes Compute once volumes reach the tens of thousands of rows.
Engine | Rows | Size (MB) | Match Code Generation Time |
---|---|---|---|
Compute (Foundation SAS) | 599 | 0.256 | 0.31s |
CAS | 599 | 0.153 | 10.63s |
Compute | 5990 | 1.5 | 3.02s |
CAS | 5990 | 1.5 | 10.99s |
Compute | 59900 | 14 | 25.85s |
CAS | 59900 | 15 | 11.85s |
Compute | 599000 | 140 | 4m 04s |
CAS | 599000 | 153 | 27.14s |
Compute | 5990000 | 1392 | 50m 25s |
CAS | 5990000 | 1533 (238 DVR) | 1m 55s |
From this graphical representation, we can also see that performance degrades more abruptly for Compute than CAS as data volumes increase.
At extremely small "PoC" data volumes, Foundation SAS (Compute) outperforms CAS on match code generation. However, at more realistic volumes CAS is far faster: roughly 9 times faster at 599,000 rows and 26 times faster at 5.99 million rows.
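The ratios behind this summary can be recomputed from the results table. The following is a minimal sketch, not part of the original project: the table name `work.speedup` and the variable names are made up for illustration, and the times are transcribed by hand from the table, with minutes:seconds values converted to seconds.

```sas
/* Elapsed times in seconds, transcribed from the results table.   */
/* 4:04, 50:25 and 1:55 are converted from minutes:seconds.        */
/* work.speedup and the variable names are illustrative only.      */
data work.speedup;
    input rows compute_sec cas_sec;
    speedup = compute_sec / cas_sec;   /* above 1 means CAS finished first */
    format rows comma12. speedup 8.2;
    datalines;
599 0.31 10.63
5990 3.02 10.99
59900 25.85 11.85
599000 244 27.14
5990000 3025 115
;
run;

proc print data=work.speedup noobs;
run;
```

A ratio below 1 means Compute was faster. With these inputs the ratios come out to roughly 0.03, 0.27, 2.18, 8.99, and 26.30, so the crossover falls between 5,990 and 59,900 rows.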
The environment contains the latest LTS Viya software, deployed with three CAS worker nodes and one controller, each with 4 virtual cores and 32 GB of RAM.
Project Code -- Not Production Quality; Not Meant to Run Without Intervention
CAS mySession SESSOPTS=(metrics=true);
CASLIB _all_ assign;
/* Prepare the compute server test */
/* Copy the source table with fixed column lengths */
data sasdm.customer_list;
    length id 8 name $32 address $50 'zip code'n $10 phone $10 city $32 country $32 notes $50 sid 8;
    set saspgdvd.customer_list;
run;
/* Stack each table ten times to build the next size up */
data sasdm.customer_list10;
    set sasdm.customer_list sasdm.customer_list sasdm.customer_list sasdm.customer_list sasdm.customer_list
        sasdm.customer_list sasdm.customer_list sasdm.customer_list sasdm.customer_list sasdm.customer_list;
run;
data sasdm.customer_list100;
    set sasdm.customer_list10 sasdm.customer_list10 sasdm.customer_list10 sasdm.customer_list10 sasdm.customer_list10
        sasdm.customer_list10 sasdm.customer_list10 sasdm.customer_list10 sasdm.customer_list10 sasdm.customer_list10;
run;
data sasdm.customer_list1000;
    set sasdm.customer_list100 sasdm.customer_list100 sasdm.customer_list100 sasdm.customer_list100 sasdm.customer_list100
        sasdm.customer_list100 sasdm.customer_list100 sasdm.customer_list100 sasdm.customer_list100 sasdm.customer_list100;
run;
data sasdm.customer_list10000;
    set sasdm.customer_list1000 sasdm.customer_list1000 sasdm.customer_list1000 sasdm.customer_list1000 sasdm.customer_list1000
        sasdm.customer_list1000 sasdm.customer_list1000 sasdm.customer_list1000 sasdm.customer_list1000 sasdm.customer_list1000;
run;
/* Report row counts and sizes for each compute server table */
proc contents data=sasdm.customer_list; run;
proc contents data=sasdm.customer_list10; run;
proc contents data=sasdm.customer_list100; run;
proc contents data=sasdm.customer_list1000; run;
proc contents data=sasdm.customer_list10000; run;
/* Prepare the CAS test */
/* Load each table into CAS with no redundant copies and promote it to global scope */
data dm_pgdvd.customer_list (copies=0 promote=yes);
    set sasdm.customer_list;
run;
data dm_pgdvd.customer_list10 (copies=0 promote=yes);
    set sasdm.customer_list10;
run;
data dm_pgdvd.customer_list100 (copies=0 promote=yes);
    set sasdm.customer_list100;
run;
data dm_pgdvd.customer_list1000 (copies=0 promote=yes);
    set sasdm.customer_list1000;
run;
data dm_pgdvd.customer_list10000 (copies=0 promote=yes);
    set sasdm.customer_list10000;
run;
/* Copy the largest table into the DVR memory format (the "238 DVR" size in the results table) */
proc cas;
    table.copyTable /
        table={name="customer_list10000" caslib="dm_pgdvd"}
        casOut={name="customer_list10000dvr" caslib="dm_pgdvd" memoryFormat="DVR" replace=True replication=0};
quit;
/* Table Stats */
proc cas;
    table.fileInfo / caslib="dm";
quit;
proc cas;
    table.tableInfo / caslib="dm_pgdvd" name="customer_list";
    table.tableInfo / caslib="dm_pgdvd" name="customer_list10";
    table.tableInfo / caslib="dm_pgdvd" name="customer_list100";
    table.tableInfo / caslib="dm_pgdvd" name="customer_list1000";
    table.tableInfo / caslib="dm_pgdvd" name="customer_list10000";
    table.tableInfo / caslib="dm_pgdvd" name="customer_list10000dvr";
quit;
proc cas;
    table.columnInfo / table={caslib="dm_pgdvd" name="customer_list"};
    table.columnInfo / table={caslib="dm_pgdvd" name="customer_list10"};
    table.columnInfo / table={caslib="dm_pgdvd" name="customer_list100"};
    table.columnInfo / table={caslib="dm_pgdvd" name="customer_list1000"};
    table.columnInfo / table={caslib="dm_pgdvd" name="customer_list10000"};
    table.columnInfo / table={caslib="dm_pgdvd" name="customer_list10000dvr"};
quit;
proc cas;
    table.tabledetails / caslib="dm_pgdvd" name="customer_list" level="SUM";
    table.tabledetails / caslib="dm_pgdvd" name="customer_list10" level="SUM";
    table.tabledetails / caslib="dm_pgdvd" name="customer_list100" level="SUM";
    table.tabledetails / caslib="dm_pgdvd" name="customer_list1000" level="SUM";
    table.tabledetails / caslib="dm_pgdvd" name="customer_list10000" level="SUM";
    table.tabledetails / caslib="dm_pgdvd" name="customer_list10000dvr" level="SUM";
quit;
/* Load the DQ locale */
%DQLOAD(DQLOCALE=(ENUSA), DQSETUPLOC='/opt/sas/viya/home/share/refdata/qkb/QKB CI 32/qkb-ci-32.1.3-qkb-viya.qarc');
/* Run the compute server test */
data sasdm.customerMatchCode;
    length mcName $100;
    set sasdm.customer_list;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data sasdm.customerMatchCode;
    length mcName $100;
    set sasdm.customer_list10;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data sasdm.customerMatchCode;
    length mcName $100;
    set sasdm.customer_list100;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data sasdm.customerMatchCode;
    length mcName $100;
    set sasdm.customer_list1000;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data sasdm.customerMatchCode;
    length mcName $100;
    set sasdm.customer_list10000;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
/* Run the CAS test */
data dm_pgdvd.customerMatchCode;
    length mcName $100;
    set dm_pgdvd.customer_list;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data dm_pgdvd.customerMatchCode;
    length mcName $100;
    set dm_pgdvd.customer_list10;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data dm_pgdvd.customerMatchCode;
    length mcName $100;
    set dm_pgdvd.customer_list100;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data dm_pgdvd.customerMatchCode;
    length mcName $100;
    set dm_pgdvd.customer_list1000;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
data dm_pgdvd.customerMatchCode;
    length mcName $100;
    set dm_pgdvd.customer_list10000;
    mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
run;
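The ten test steps above differ only in the libref and the input table, so they could also be driven by a small macro loop. This is a sketch, not the code that produced the published timings: the macro `%run_match` and the macro variable `test_tables` are hypothetical names, and it assumes the `sasdm` and `dm_pgdvd` librefs, the tables prepared earlier, and the ENUSA locale are already in place.

```sas
/* Hypothetical helper; not part of the original project code.     */
/* Runs the same match code DATA step once per table in &tables.   */
%macro run_match(lib=, tables=);
    %local i tbl;
    %do i = 1 %to %sysfunc(countw(&tables, %str( )));
        %let tbl = %scan(&tables, &i, %str( ));
        data &lib..customerMatchCode;
            length mcName $100;
            set &lib..&tbl;
            mcName = dqMatch(name, 'NAME', 95, 'ENUSA');
        run;
    %end;
%mend run_match;

%let test_tables = customer_list customer_list10 customer_list100
                   customer_list1000 customer_list10000;

/* Compute server test, then CAS test */
%run_match(lib=sasdm,    tables=&test_tables);
%run_match(lib=dm_pgdvd, tables=&test_tables);
```

Each generated DATA step still writes its own "real time" note to the log, so the timings can be read off in the same way as for the hand-written steps.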
Test Log -- Compute Running on 5.9M Rows
79 data sasdm.customerMatchCode;
80 length mcName $100;
81 set sasdm.customer_list10000;
82 mcName=dqMatch(name,'NAME',95,'ENUSA') ;
83 run;
NOTE: There were 5990000 observations read from the data set SASDM.CUSTOMER_LIST10000.
NOTE: The data set SASDM.CUSTOMERMATCHCODE has 5990000 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 50:25.07
cpu time 50:34.12
Test Log -- CAS Running on 5.9M Rows
79 data dm_pgdvd.customerMatchCode ;
80 length mcName $100;
81 set dm_pgdvd.customer_list10000;
NOTE: Executing action 'table.tableInfo'.
NOTE: Action 'table.tableInfo' used (Total process time):
NOTE: real time 0.016233 seconds
NOTE: cpu time 0.017619 seconds (108.54%)
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 1.58M (0.00%)
NOTE: Executing action 'table.tableInfo'.
NOTE: Action 'table.tableInfo' used (Total process time):
NOTE: real time 0.010916 seconds
NOTE: cpu time 0.013365 seconds (122.43%)
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 1.58M (0.00%)
NOTE: Executing action 'table.columnInfo'.
NOTE: Action 'table.columnInfo' used (Total process time):
NOTE: real time 0.031052 seconds
NOTE: cpu time 0.026757 seconds (86.17%)
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 3.41M (0.00%)
82 mcName=dqMatch(name,'NAME',95,'ENUSA') ;
83 run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: Executing action 'sessionProp.getSessOpt'.
NOTE: Action 'sessionProp.getSessOpt' used (Total process time):
NOTE: real time 0.018381 seconds
NOTE: cpu time 0.014225 seconds (77.39%)
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 851.53K (0.00%)
NOTE: Executing action 'sessionProp.setSessOpt'.
NOTE: Action 'sessionProp.setSessOpt' used (Total process time):
NOTE: real time 0.012184 seconds
NOTE: cpu time 0.015754 seconds (129.30%)
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 1.07M (0.00%)
NOTE: The DATA step will run in multiple threads.
NOTE: Executing action 'dataStep.runBinary'.
NOTE: There were 5990000 observations read from the table CUSTOMER_LIST10000 in caslib DM_PGDVD.
NOTE: The table customerMatchCode in caslib DM_PGDVD has 5990000 observations and 10 variables.
NOTE: Action 'dataStep.runBinary' used (Total process time):
NOTE: real time 114.835489 seconds
NOTE: cpu time 2115.218305 seconds (1841.96%)
NOTE: data movement time 0.055814 seconds
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 4.04G (1.61%)
NOTE: bytes moved 2.01G
NOTE: Executing action 'sessionProp.setSessOpt'.
NOTE: Action 'sessionProp.setSessOpt' used (Total process time):
NOTE: real time 0.011197 seconds
NOTE: cpu time 0.014895 seconds (133.03%)
NOTE: total nodes 4 (32 cores)
NOTE: total memory 251.04G
NOTE: memory 1.07M (0.00%)
NOTE: DATA statement used (Total process time):
real time 1:55.11
cpu time 0.53 seconds
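One way to read the two logs: the Compute step reports CPU time roughly equal to real time (50:34 vs. 50:25), which is what a single-threaded step looks like, while the `dataStep.runBinary` action reports far more CPU time than real time because the work is spread across threads. The sketch below recomputes that ratio from the CAS log values. The variable names are illustrative, and dividing by the 32 reported cores is only a rough utilization estimate.

```sas
/* Values copied from the dataStep.runBinary notes in the CAS log. */
/* Variable names are illustrative only.                           */
data _null_;
    real_sec = 114.835489;     /* real time                  */
    cpu_sec  = 2115.218305;    /* cpu time                   */
    cores    = 32;             /* "total nodes 4 (32 cores)" */
    busy_cores  = cpu_sec / real_sec;   /* average number of cores kept busy */
    utilization = busy_cores / cores;   /* share of the reported cores       */
    put busy_cores= 8.2 utilization= percent8.1;
run;
```

The first ratio is about 18.4, which matches the 1841.96% shown in the log, and works out to a little under 60% of the 32 cores.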