Hello, we have been experimenting with uploading results generated with SAS to a Databricks SQL warehouse. The code looks as follows:

libname dbx odbc prompt="Driver={Simba Spark ODBC Driver};
Host=foo.cloud.databricks.com;
Port=443;
HTTPPath=/sql/1.0/warehouses/bar;
SSL=1;
ThriftTransport=2;
AuthMech=3;
UID=token;
PWD=baz;
Catalog=foofoo;
Schema=barbar;
DefaultStringColumnLength=32767"
dbcommit=10000
insertbuff=10000
readbuff=1000
dbcreate_table_opts="TBLPROPERTIES('delta.columnMapping.mode' = 'name', 'delta.checkpoint.writeStatsAsStruct' = 'false', 'delta.autoOptimize.optimizeWrite' = 'true')"
preserve_col_names=yes;
proc sql;
create table dbx.test as
select * from test;
quit;

Unfortunately, the performance leaves a lot to be desired. As an example, a dataset with 122 variables and 127,000 observations takes five to six minutes to upload, whereas uploading it to a Microsoft SQL Server database (also via ODBC) takes less than 20 seconds. Is this expected? According to the documentation, Databricks can also be accessed via the JDBC and Spark SAS/ACCESS modules, but unfortunately we do not have those licensed. Are there any options that could improve ODBC upload performance? I tried increasing INSERTBUFF and DBCOMMIT further, but when going above 12,000 I get the following error:

ERROR: CLI execute error: [Simba][Hardy] (130) An error occurred while an INSERT statement which causes the driver to reconnect to the server.

Thanks for your help in advance!
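For context on why INSERTBUFF matters here: with a small buffer the driver issues one round trip per handful of rows, so upload time is dominated by network latency rather than data volume. The idea behind row batching can be sketched with a small Python helper (pure string-building, no live connection; the table and column names are placeholders, not anything from our actual schema):

```python
def batch_insert_statements(table, columns, rows, batch_size=10000):
    """Group rows into multi-row INSERT statements so that each
    round trip to the server carries batch_size rows instead of one."""
    col_list = ", ".join(columns)
    stmts = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in chunk
        )
        stmts.append(f"INSERT INTO {table} ({col_list}) VALUES {values}")
    return stmts

# 127,000 rows at 10,000 rows per statement -> 13 round trips
# instead of 127,000 single-row inserts.
rows = [(i, i * 2) for i in range(127000)]
stmts = batch_insert_statements("dbx.test", ["a", "b"], rows, 10000)
print(len(stmts))
```

This is only an illustration of the batching principle; the SAS/ACCESS engine and the Simba driver handle the actual buffering internally via INSERTBUFF, so the practical lever is the option value, not hand-built SQL.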