SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Is there a way to export a DI Studio job without all the cruft?

Super Contributor
Posts: 392

tl;dr:  Is there a way to export a DIS job without all the useless cruft?

My colleague supports DI Studio.  I have used it in the past, but thankfully don't have to in my current role.

We have a DIS job which, when exported, is 95,000+ lines long.  I estimate 90-95% of it is useless cruft:  performance macros whose same definition is repeated again and again and again; thousands of lines of macro variables that are never referenced; header blocks of useless information; etc, etc.

Are there any options in DIS that will allow my colleague to export only the relevant code that actually "does stuff"?  In other words, I have the existing 95,000-line program and the new and improved 3,000-line program, and they both produce the same output.

I have to take the actually useful bits of this code and import them into a new program.  Trawling through the 95K-line program is onerous at best.
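
For the kind of triage described above (macro definitions repeated over and over, and macro variables that are assigned but never used), a small script can at least shrink the haystack before the copy and paste starts. The Python sketch below is a plain-text heuristic, not a DI Studio feature or option. It assumes the exported job is a plain-text `.sas` file, finds `%macro` and `%let` statements with regular expressions, and counts a macro variable as used only if a literal `&name` reference appears somewhere. Indirect references such as `&&prefix&i` or `symget('name')` are not detected, and matches inside comments are counted, so treat the output as a list of candidates to check by hand rather than lines to delete.

```python
import re
import sys
from collections import Counter

# Heuristic patterns (assumptions, not a real SAS parser):
#   %let name = ...;   assigns a macro variable
#   %macro name ...    opens a macro definition
LET_RE = re.compile(r"%let\s+([A-Za-z_]\w*)\s*=", re.IGNORECASE)
MACRO_RE = re.compile(r"%macro\s+([A-Za-z_]\w*)", re.IGNORECASE)


def scan_sas_code(text):
    """Return (repeated_macros, unreferenced_lets) for a block of SAS code.

    repeated_macros: dict of macro name -> number of %macro definitions (only > 1)
    unreferenced_lets: sorted list of %let names with no literal &name reference
    SAS names are case-insensitive, so all names are lower-cased.
    """
    macro_counts = Counter(name.lower() for name in MACRO_RE.findall(text))
    repeated = {name: n for name, n in macro_counts.items() if n > 1}

    unreferenced = []
    for var in sorted({name.lower() for name in LET_RE.findall(text)}):
        # Matches &var, &&var and &var. ; \b keeps &var from matching &variable.
        ref = re.compile(r"&+" + re.escape(var) + r"\b", re.IGNORECASE)
        if not ref.search(text):
            unreferenced.append(var)
    return repeated, unreferenced


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan_job.py exported_job.sas
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        repeated, unreferenced = scan_sas_code(f.read())
    for name, n in sorted(repeated.items()):
        print(f"%macro {name}: defined {n} times")
    for var in unreferenced:
        print(f"&{var}: assigned but never referenced")
```

The script only reads and reports; it never rewrites the exported file, so the job itself stays untouched.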

Thanks...

Super User
Posts: 5,621

Posted in reply to ScottBass

I can understand your frustration. 

But DI Studio-generated code is not really meant to be used outside the DI Studio context.

From a developer perspective, you dig into the code "only" when you troubleshoot, optimize, or audit the job.

Maintenance of the code is done only from the GUI.

That said, generation of the performance macros can be switched off with a setting in the UI options.

Data never sleeps
Super Contributor
Posts: 392

Thanks @LinusH.  I guess I can't do what I want, and I will need to ask my colleague to cut and paste the numerous transformations (mostly user-written) into a file.

"But DI Studio-generated code is not really meant to be used outside the DI Studio context."

Why do you say that?  I've never heard this perspective from SAS.

As for my perspective, DIS is a code generator (plus impact analysis and a few other bits), analogous to other IDEs such as Visual Studio, Eclipse, etc.  If DIS didn't generate SAS code, it would have no purpose.  And of course, that SAS code must be exported in order to schedule a job.  So if I use JAMS, Control-M, or LSF to schedule that job, I'm certainly using it outside the DIS context.

Riddle:  How do you take a well-written, efficient, highly performant, well-commented 1,000-line SAS program and turn it into a poorly written, probably inefficient, 15,000-line garbled mess?  Answer:  Rewrite it in DIS ;-)

Super User
Posts: 5,621

Posted in reply to ScottBass

SAS code is the executor of your instructions.

But the instructions should be updated and maintained via metadata. If not, lineage and maintainability are lost.

DI Studio is not a very efficient code generator; most developers create programs faster without it. But for life-cycle management of larger DW applications, I would say it's invaluable.

So if you extract the code and change it outside DIS, any connection with the metadata is lost.

I don't think SAS should ever suggest that you edit DI Studio jobs outside of DI Studio.

Respected Advisor
Posts: 4,284

Posted in reply to ScottBass

@ScottBass

Being a long-term SAS user and coder, I've had my struggles with DIS just as you have, especially with the v3.x versions. DIS is certainly not perfect, and I've got a whole wish list of enhancements.

Even though you schedule the generated code, DIS is always the tool for any job maintenance, and changing the code outside of DIS breaks the concept. It's certainly nothing SAS has ever recommended.

Like all tools, DIS has its use cases. In my professional life I'm encountering more and more projects with end-to-end data lineage requirements, which are often driven by regulation. Using DIS (or any other metadata-driven ETL tool) makes it much, much easier to meet such requirements.

As for the "garbage code": when using DIS, most of the time you won't examine the code as a whole. As @LinusH writes, there are some options to reduce the generated bits. Whenever I'm at a new client site, the first thing I do is deactivate the option that, by default, generates these performance macros when a new job is created.

And last but not least: most of the "garbage code" adds some checks; it generates many lines of code but doesn't take up a lot of processing time (with the exception of the select count(*) for row counts, which some transformations generate and which can be hard to turn off).
