SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DI Studio: generated code vs user-written code

Reply
PROC Star
Posts: 1,233

DI Studio: generated code vs user-written code

Hi,

Wondering what people think about the "Best Practice" for DI Studio jobs, regarding whether or not they should contain (or consist entirely of) user-written code.

On several threads relating to DI Studio, I have seen questions where someone asked about how to do something with a DI transformation (i.e. code generator).  If a response came back "you can do it easily with user-written code..." There was typically a response after that, recommending that people keep user-written code out of their DIS jobs.

Would people comment on what they see as the down-side of using user written code in DI jobs?

My background is as a SAS programmer.  I have only recently started using DI Studio.  And mostly my DI Jobs have been all user-written code.  In fact often just a single node which says %include xxx.sas.   So I use DI studio/SMC mostly as a tool to schedule ETL jobs.  The fact that they can send email notifications etc is also nice, but of course BASE SAS can do that too.

I guess it's really two questions:

1.  Is it reasonable to have a DI job which is just a user-written SAS program?  I would argue yes.  In the same way as I think it's reasonable to have an Enterprise Guide project which is all old fashioned user-written SAS code.  If you have SAS programmers, let them program in SAS.  In my head, "SAS programmer" is a different role / skillset than "DI Studio Developer."

2.  For those of you who are DI Studio Developers, is it reasonable to have a DI job that uses DI transformations, and intersperse some user written modules within the job?  Here I don't know the answer, since I have not really experienced the benefit of building a DI job from transformations.  I have seen people say that if you add user-generated code to a DIS job built from transformations, other DI functionality will "break."  Assume that means things like the magical ability to propagate changes from one end of a flow to another? Are there other down sides (other than the need to learn SAS code)??

Thanks,

--Q.

Regular Contributor
Posts: 180

Re: DI Studio: generated code vs user-written code

I Think that if you are a SAS Programmer and already have a big number of SAS Programs already running and a good experience in using the DM then the way you use DI Studio makes sense but using DI Studio to develop new programs bring the at least the following benefits:

  • You get the proccess well documented in diagrams.
  • You can use the Impact and Reverse impact Analysis
  • DI is intuitive (very easy) and drag-and-drop for developping ETL processes (in fact its former name was ETL Studio)
  • You can use Change Management techniques within the tool.

In summary I think a good balance is using User Written Code transformations ONLY in cases where there is not a transformation for the functionality you need to develop and connect them to the rest of transformation via Tables/Views.

Super User
Posts: 5,257

Re: DI Studio: generated code vs user-written code

1) If the SAS programmer isn't a DI developer, why should he/she even work within DIS? I think that the comparison with EG isn't really working. EG is designed for ad-hoc analysis, and it's concept is very free, each user can have it's own sand box. DIS on the other hand is designed for building and maintain production metadata, both jobs and data structures. So you need to look at the requirements for your system. If you are not to use any metadata, why have you invested in a quite a pricey software?

2) You can combine standard t4ransformations a user written code in a job, and sometimes you really need to to do that, becuase of some logic can't be built with standard transformations. Then it's crucial that the code is written so dynamically as possible, so that it can withstand changes in the job, such as altering inputs/outputs. There are a set of macro variables that DI Studio creates for this purpose - use them!

Data never sleeps
PROC Star
Posts: 1,233

Re: DI Studio: generated code vs user-written code

Thanks LinusH, that's a helpful perspective.

1.  I had been thinking of DIS as an ETL tool.  I hadn't been thinking of it as a tool for maintaining metadata.  As an old school SAS programmer, I haven't done much using metadata for data sources, typically writing my own LIBNAME statements etc.  I'm not in an environment where there is extensive shared metadata maintained by a data mangagement group, which is then used by many others.  But I can accept that that may be a large part of a purpose for DIS (and the whole BI framework in general; I recognize SAS is becoming more and more metadata centric).  Why is the SAS programmer using DIS?  In my case, because it's the easiest way to run a production ETL job (even thought it's a job which is *not* reliant ot metadata).   I could use a cron job on linux server for the same thing, but at least with DIS there is still metadata to store that the job exists, the job has run, etc.

2.  Glad to hear people who primarily use  transformations have "safe" routes to write their own dynamic code without losing functionality of DIS.  Of course as a SAS programmer, dynamic code has always been key.  And of course metadata-driven code has been popular for a long time before there was the recent BI Metadata construct.  Macros, CALL EXECUTE, and an understanding of data structure can go a long way.

--Q.

Super User
Posts: 5,257

Re: DI Studio: generated code vs user-written code

1) If you are familiar with MS SSIS, ETL and Metadata are two separate worlds. In SAS, the ETL tool and metadata is integrated. It adds some complexity to the tool, but in my mind, it ads a lot of functionality and benefits for the maintenance of the system. It's like a "define an object only once, in one place only"-concept. By using separate tools, you can end upp having to do a lot of double/triple work - or having to do integration between different tools.

Br

Linus

Data never sleeps
Trusted Advisor
Posts: 1,249

Re: DI Studio: generated code vs user-written code

Hi Quentin,

As I discuss in my blog post,     What's the difference between SAS Enterprise Guide and SAS DI Studio? A difference between EG and DI Studio is metadata...

Using DI Studio allows people who aren't skilled in SAS programming but have data warehouse knowledge to pick up data warehousing with SAS very quickly using point and click and the jobs can be maintained by a wide audience into the future. I have seen situations where people have come from an Oracle background and simply been able to use data warehousing principles with SAS very easily due to the built-in transformations. (See John Heaton's SASGF13 best contributed paper - SAS® Data Integration Studio: The 30 Day Plan )

This is similar to the way that Enterprise Guide point and click tasks help people who are not yet skilled in SAS programming to do querying and reporting.

In DI Studio, being metadata driven, Impact Analysis becomes straight forward and doesn't require someone to study the SAS code.

I see many benefits to the DW team and business in using the built-in transformations rather than using user-written transformations. This is not to say that there may be situations where user-written transformations are needed but I think limiting it's usage is a better practice.

Kind Regards,

Michelle

Ask a Question
Discussion stats
  • 5 replies
  • 1062 views
  • 1 like
  • 4 in conversation