Optimal Pass Through SQL with SAS Data Integration Studio

GertNissen · Posted 08-16-2019 08:23 AM

Environment:

Windows: X64_SR12R2 WIN 6.3.9600 Server
SAS 9.04.01M4P110916
Microsoft SQL Server Management Studio 14.0.17254.0
SAS/ACCESS: ODBC other DBMS
SAS Data Integration Studio: 4.902

Background:

2 SAS libname pointing to the same MS SQL Server, but 2 different databases

Problem: ineffective SQL performance

Story:

SAS Data Integration Studio's SQL Join Transformation does not make perfect Queries because libnames are not 100% identical (same MS SQL server, two Databases)

Here are 3 scenarios

Implicit join - one libname (default in DIS)
Explicit join - one libanme (in Join Properties/Pass Through=Yes)
Join - two libnames - only implicit join (explicit not possible because of two libname)

DIS 3 SQL.png

Regarding scenario 3 - If I don't want to to use "User Written Code" in order to write a SQL Execute Query to enforce Explicit execution like below, that would otherwise work perfectly (would like to maintain auto-generated DIS code if possible)

* one SQL server, two SQL Database, one SAS connection*;
proc sql;
connect using gnis_DDL;
execute (
	select [Booking].[ACCOUNT_TYPE]
		 , T_ACCOUNT_TYPE.DESCRIPTION
		 , count(*) as count
	  from [DDL].[dbo].[Booking]
	  left join [DW_HIST].dbo.T_ACCOUNT_TYPE
	    on [Booking].ACCOUNT_TYPE = T_ACCOUNT_TYPE.id
	 group by [Booking].[ACCOUNT_TYPE],  T_ACCOUNT_TYPE.DESCRIPTION
) by gnis_DDL;
DISCONNECT FROM gnis_DDL;
quit;

What are my options?

Is there a secret option in DIS I haven't found?
Do I need to re-arrange the MS SQL server to have all databases merged into one database with several schemas?
Lose all metadata trace and apply "User Written Code" SQL Execute?
Would it be better to change from SAS/ACCESS ODBC to SAS/ACCESS for MS SQL?
or something else....?

Attached a sample log for extra information.

Patrick · Posted 08-16-2019 06:56 PM

@GertNissen

"Do I need to re-arrange the MS SQL server to have all databases merged into one database with several schemas?"

If you've got 1 sever instance with multiple databases and you want to join tables from multiple databases then you could also create a synonym in database A pointing to the table in database B. In doing so you then could use a single connection (single libname) for your joins which should allow SAS to push processing to the DB.

GertNissen · Posted 08-19-2019 05:10 AM

Hi @Patrick

We did not think of this beforehand - great thinking out-of-the-box - Thanks for your suggestion,

With 1500+ tables (more coming) I imagine it would require a maintenance script to make sure all are put into synonyms for all databases object - Would be easier if it was possible to put synonyms on the entire MS SQL Database. I think it might would be a little bit too much maintenance for us, but I can definitely see the benefits and use in other use-cases.

What would the syntax for the SAS Libname be for referencing a synonym in an MS SQL database?

Can't seem to find any reference about synonyms in the SAS documentation (only for Oracle)

https://documentation.sas.com/?docsetId=acreldb&docsetTarget=n0gz66qe8msnkyn1jgnhrr0l2k8y.htm&docset...

Patrick · Posted 08-19-2019 07:57 PM

@GertNissen

"What would the syntax for the SAS Libname be for referencing a synonym in an MS SQL database?"

From my understanding synonyms are just another object in a schema so there is nothing specific required on the SAS side to access synonyms. It's just a normal libname using SAS/Access to SQL Server (or SAS/Access to ODBC).

You then can use the synonyms as you would the actual table or view it points to.

https://docs.microsoft.com/en-us/sql/relational-databases/synonyms/synonyms-database-engine?view=sql...

If you want to go down this path then I also believe that it wouldn't take that much to create a maintenance script which keeps synonyms in sync with the data sources. Unlike for views you don't even have to define the columns but it's just on object level. A synonym in my simple world is - sort of - nothing else than a symbolic link to the actual object.

ErikLund_Jensen · Posted 08-17-2019 04:20 PM

@GertNissen

Hi Gert

Unless you follow @Patrick 's suggestion and merge all relevant tables into one schema using aliases (or views), I think you are stuck with option 3. But I don't see what you mean with "Lose all metadata trace".

It has become a sort of dogma that user-written code should be avoided at all cost, but I think it is a wrong notion that has it's origin in the early days of DI Studio. It was often necessary to migrate existing programs as one big block of user-written code to get things up and running, and then just leave it that way. And the whole concept was a blow to a programmer's self-esteem. I remember very well the feeling of a big chef suddenly reduced to heat ready meals in a microwave oven, and it was very tempting to code everything as usual and put it in a user-written transformation.

Time has taught us to appreciate metadata and has shown us the problems that arise from code with hardcoded library assignments and table names, because they don't have a metadata trace. But the dogma often results in extremely complicated jobs with dozens of extracts, joins, appends and incomprehensible complex expressions that makes it impossible to figure out what is going on in the job, and all that only to avoid a user-written transformation that would be much easier to maintain. And some tasks are impossible in standard transformations, which makes user-written code (or user-defined transformations) unavoidable.

Of course data should be prepared and further processed in standard transformations, and user-written code used only for single tasks that cannot be solved by standard transformations. But in that case, and if care is taken to ensure that the metadata trace of columns works, there is no real difference between a standard transformation and a user-written transformation.

Then all the surrounding auto-generated code is the same, and the only difference between e.g. a standard join and a user-written join is that in the standard join only expressions and where-clauses are stored as text strings in metadata textstore objects, while the whole "working" code is stored as one textstore object in a user-written transformation.

To ensure that the metadata trace of column use works, and to make the transaction as robust as a standard transformation, there are some rules that must be followed, so here comes what I consider good practice for user-written transformations:

the code should be simple, ideally one step only,
all permanent tables must be registered in metadata.
all permanent tables must be connected to the transformation in the DI Studio job,
all output tables (incl. work tables) must have their metadata registration updated after a test run,
column mappings must be added to the transformation, if necessary by use of some dummy expression to allow mapping of more than one source column.
all libnames must be assigned in the automatic generated part of the transformation code, never in the user-written part.
all table references in the code must be macro variables defined in the automatic generated part of the transformation code (_INPUTn_connect, _INPUTn_, _OUTPUTn_ etc.)
It is nice to add a yellow sticker to notify that output table definitions and mappings are "dummy" and must be manually updated together with code changes.

The code must be manually written, but apart from that the development of a user-written transformation following these rules is not more complicated than the development of any other transformation. A great help is to do things in the right order, especially:

register (define) any permanent output tables as empty tables with no columns.
connect all tables to the transformation.
add the code.
run the transformation.
update metadata for the output tables (necessary for the following steps)
examine output and repeat step 3-5 until everything is as expected.
add mappings to the transformation.

This was written with standard transformation "User Written Code" in mind, but please note that it also applies to coding and using a user defined transformation, which has the extra benefit of user-defined transformation options that can be used in the code.

GertNissen · Posted 08-19-2019 05:47 AM

Hi @ErikLund_Jensen

I was hoping that like DIS already can construct good pass-through if using identical libnames, there was a possibility to make DIS generate good pass-through using two different libnames (on the same MS SQL Server). Perhaps by using SAS/ACCESS for MS SQL instead of SAS/ACCESS for ODBC - or some other workaround.

My point about avoiding UWC/UWT was only because that it would be my own obvious/easy/first-choice go-to solution - I was looking for answers beyond the obvious - hence my comment about not using UWC/UWT (the Pass-Through example code I provided).

We already have 1500 tables and many DIS jobs with many joins - I did not want to go through all of them adding new UWC/UWT to solve the issue, I was hoping for another fix I was not aware of already.

It has not in any way been the purpose with my question to be advocating for not using UWC/UWT. I totally agree that well written/constructed UWC/UWT can be a good solution - But that was not what I was looking for (was hoping to learn something new). However, I would not have been satisfied with a UWC/UWT solution, IF there was a better solution hiding somewhere.

After reading my post again I think if I had just not have used the wording "Lose all metadata trace" in the final paragraph, I might not have been misunderstood and the attention of the reader would not be distracted from my actual question/challenge.

Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Re: Optimal Pass Through SQL with SAS Data Integration Studio

Registration is open