The Challenge
Joining data from different data sources (a heterogeneous join) requires copying the data to SAS for processing, which can take considerable time depending on the size of the data.
The most effective way to process such a join is to use database temp tables to perform the join in-database (see my post “How to use database temp tables to improve performance of heterogeneous joins”). That strategy, however, may not be available or permitted. When that is the case, what other options exist for reducing data movement between a database and SAS?
The Solution
One of the best practices when querying a database is to reduce the amount of data copied to SAS by limiting columns and rows. To limit columns, do not SELECT * in PROC SQL or copy a whole database table in a DATA step; instead, select only the columns you need. Similarly, to limit rows, use a WHERE clause to subset the data to only what is required. Limiting rows and columns can improve query speed, but it may not have much impact if your query still copies very large amounts of database data (hundreds of gigabytes or terabytes) to SAS.
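As a minimal sketch of this practice (reusing the SNOW library and trans_100M table from the example later in this post), the query below names only the columns it needs and adds a WHERE clause; the SAS/ACCESS engine passes the WHERE clause to the database when it can, so unneeded rows never travel to SAS:
/* Sketch only: SNOW and trans_100M are the library and table from the
   example below; the date cutoff is illustrative. */
proc sql;
   create table work.recent_trans as
   select ID, Date, Var2              /* only the columns needed */
   from snow.trans_100M
   where Date >= '01JAN2023'd;        /* row subset passed to the DBMS
                                         where possible */
quit;
Compare the generated SQL in the SASTRACE log output (enabled in Step 1 below) to confirm the WHERE clause was passed to the database.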
Another option is available if you are joining a SAS table to a database table on a column with no more than 4,500 unique values. Adding the option MULTI_DATASRC_OPT=IN_CLAUSE to a SAS/ACCESS® interface LIBNAME statement instructs the PROC SQL optimizer to generate an IN clause for joins. This prevents SAS from retrieving all the rows from the database table. Instead, SAS performs a row count on each table to determine which is larger*, identifies the unique join values in the smaller table, and retrieves only the rows in the larger table that match those values.
*To further improve performance by eliminating the row count, use the data set option DBLARGETABLE=YES. This option identifies which table is larger when processing a join. Note that this option is ignored when outer joins are processed.
An Example
I have two data sets to join in PROC SQL:
A small SAS data set in WORK, customers_1K: 1,000 rows, 1 column, up to 1,000 unique IDs.
A large database table in Snowflake, trans_100M: 100 million rows, 5 columns, up to 100,000 unique IDs.
When I perform a left join on ID values (scenario 1), it runs in 156.45 seconds, with all but 0.15 seconds being the time needed to unload the larger table from Snowflake into SAS (even with the benefit of BULKUNLOAD, available in the SAS/ACCESS® interface to Snowflake).
When I rerun the Snowflake LIBNAME with the option MULTI_DATASRC_OPT=IN_CLAUSE and rerun the same join (scenario 2), it runs in 10.03 seconds (a 93.6% reduction!). The difference is that instead of copying the entire Snowflake table to SAS, only the rows matching the unique ID values in the smaller table are extracted from the database. The less data a query requires, the less time it takes to move it between Snowflake and SAS.
Finally (scenario 3), after identifying the larger table in my PROC SQL statement with the data set option DBLARGETABLE=YES and repeating the join, the time drops by another 0.6 seconds. This option eliminates the row count step: SAS no longer counts the 1,000 and 100 million rows to determine which table is larger, and that saves time.
Scenario | Options used                                                        | Time to move data from Snowflake to SAS | Real time
1        | None                                                                | 156.30 sec.                             | 156.45 sec.
2        | MULTI_DATASRC_OPT=IN_CLAUSE                                         | 10.02 sec.                              | 10.03 sec.
3        | MULTI_DATASRC_OPT=IN_CLAUSE, DBLARGETABLE=YES (on Snowflake table)  | 9.28 sec.                               | 9.43 sec.
Code to test on your environment
Use the following code to create the data sets and replicate the scenarios described above. I used the SAS/ACCESS® interface to Snowflake; if you use another SAS/ACCESS® interface, substitute your database information in the LIBNAME statements.
Step 1: Add system options to enhance performance statistics in the log.
OPTION
SASTRACE=',,,ds'
SASTRACELOC=SASLOG
NOSTSUFFIX
SQL_IP_TRACE=(note, source)
msglevel=i
FULLSTIMER;
Step 2: Create a LIBNAME to your database without the option MULTI_DATASRC_OPT=IN_CLAUSE.
/* LIBNAME macro variables values not shown*/
libname SNOW snow
server=&SFServer
db=&SFDB
user=&SFUser
pw=&SFPW
schema=&SFSchema
bulkload=yes
bulkunload=yes
bl_internal_stage="user/test1";
Step 3: Create a 100 million row data set in Snowflake and a 1000 row data set in SAS WORK.
data snow.trans_100M;
format Date date9.;
do i=1 to 100000000;
ID=Rand('integer', 1, 100000);
Date=Rand('integer', 22995, 23725);
Var2=Rand('integer', 1, 10000);
Var3=Rand('integer', 1, 5000);
Var4=Rand('integer', 1, 40000);
output;
end;
drop i;
run;
data work.customers_1K;
do i=1 to 1000;
ID=Rand('integer', 1, 10000);
output;
end;
drop i;
run;
Step 4: Scenario 1 - Join smaller table in SAS WORK and larger table in Snowflake.
proc sql;
create table work.testa as select
a.id,
b.Date,
b.Var2,
b.Var3,
b.Var4
from work.customers_1K a left join snow.trans_100M b
on a.id = b.id;
quit;
Review the log to get the baseline real time and note the SAS/ACCESS engine time to unload data to SAS.
SNOWFLAKE: Bulkload seconds used for setup: 105.898558
SNOWFLAKE: Bulkunload conversion (seconds): 145.717032
Summary Statistics for SNOWFLAKE are:
Total row fetch seconds were: 0.000031
Total SQL execution seconds were: 105.896808
Total SQL prepare seconds were: 0.846087
Total SQL describe seconds were: 0.000060
Total seconds used by the SNOWFLAKE ACCESS engine were 156.304091
NOTE: Table WORK.TESTA created, with 999175 rows and 5 columns.
....
NOTE: PROCEDURE SQL used (Total process time):
real time 2:36.45
Step 5: Scenario 2 – Add the LIBNAME option MULTI_DATASRC_OPT=IN_CLAUSE and rerun the join from scenario 1.
libname snow clear;
libname SNOW snow
server=&SFServer
db=&SFDB
user=&SFUser
pw=&SFPW
schema=&SFSchema
bulkload=yes
bulkunload=yes
bl_internal_stage="user/test1"
multi_datasrc_opt=in_clause;
proc sql;
create table work.testb as select
a.id,
b.Date,
b.Var2,
b.Var3,
b.Var4
from work.customers_1K a left join snow.trans_100M b
on a.id = b.id;
quit;
Review the log to get the comparative real time. Note the SELECT “ID” … FROM … trans_100M WHERE ((“ID” IN (21, 28, …). The values in the parentheses are the unique ID values from the smaller SAS data set.
SNOWFLAKE_2: Prepared: on connection 0
SELECT "ID", "Date", "Var2", "Var3", "Var4" FROM "PECHIN"."trans_100M" WHERE ( ( "ID" IN ( 21 , 28 , 48 , 65 , 67 , 80 , 83 ,
84 , 87 , 108 , 120 , 128 , 143 , 146 , 169 , 179 , 204 , 221 , 233 , 236 , 263 , 284 , 288 , 302 , 310 , 314 , 325 , 330 , 337 ,
....
Summary Statistics for SNOWFLAKE are:
Total row fetch seconds were: 0.000030
Total SQL execution seconds were: 9.280490
Total SQL prepare seconds were: 0.394356
Total SQL describe seconds were: 0.000062
Total seconds used by the SNOWFLAKE ACCESS engine were 10.024368
NOTE: Table WORK.TESTB created, with 999175 rows and 5 columns.
89 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 10.03 seconds
Step 6: Scenario 3 – Add the data set option DBLARGETABLE=YES to the PROC SQL query to identify the larger table and avoid the row count step.
proc sql;
create table work.testb as select
a.id,
b.Date,
b.Var2,
b.Var3,
b.Var4
from work.customers_1K a left join snow.trans_100M (dblargetable=yes) b
on a.id = b.id;
quit;
Review the log to get the comparative real time.
Summary Statistics for SNOWFLAKE are:
…..
Total seconds used by the SNOWFLAKE ACCESS engine were 9.28
NOTE: Table WORK.TESTB created, with 999175 rows and 5 columns.
89 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 9.43 seconds
Resources
Limiting Retrieval
Passing the WHERE Clause to the DBMS
MULTI_DATASRC_OPT= LIBNAME Statement Option
DBLARGETABLE= Data Set Option
Temporary Table Support for SAS/ACCESS