ScottBass
Rhodochrosite | Level 12

tl;dr:  Is there a way to export a DIS job without all the useless cruft?

 

My colleague supports DI Studio.  I have used it in the past, but thankfully don't have to in my current role.

 

We have a DIS job which, when exported, is 95,000+ lines long.  I estimate 90-95% of it is useless cruft:  performance macros whose same definition is repeated again and again and again; thousands of lines of macro variables that are never referenced; header blocks of useless information; etc, etc.

 

Are there any options in DIS that will allow my colleague to export only the relevant code that actually "does stuff"?  IOW, I have the 95,000 line existing program, and the 3,000 line new and improved program, and they both result in the same output.

 

I have to take the actually useful bits of this code and import it into a new program.  Trolling through the 95K line program is onerous at best.

 

Thanks...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
LinusH
Tourmaline | Level 20

I can understand your frustration. 

But DI Studio generated code is not really meant to be used out of the DI Studio context.

From a developer perspective, you dig into the code "only" when you troubleshoot, optimize and audit the job.

Maintenance of the code is done only from the GUI.

 

That said, the generation of the performance macros can be switched off, by a setting in the UI options.

Data never sleeps
ScottBass
Rhodochrosite | Level 12

Thanks @LinusH.  I guess I can't do what I want, and will need to ask my colleague to cut and paste the numerous transformations (mostly user written) into a file.

 

But DI Studio generated code is not really meant to be used out of the DI Studio context.

 

Why do you say that?  I've never heard this perspective from SAS?

 

As for my perspective, DIS is a code generator (plus impact analysis and a few other bits), analogous to other IDEs such as Visual Studio, Eclipse, etc, etc.  If DIS didn't generate SAS code, its purpose would be meaningless.  And of course, that SAS code must be exported in order to schedule a job.  So if I use Jams, Control-M, or LSF to schedule that job, I'm certainly using it outside the DIS context.

 

Riddle:  How do you take a well-written, efficient, highly performant, well-commented 1000 line SAS program and turn it into a poorly-written, probably inefficient, 15,000 line garbled mess?  Answer:  Rewrite it in DIS 😉

 


LinusH
Tourmaline | Level 20

SAS code is the executor of your instructions.

But the instructions should be updated and maintained via metadata. If not, lineage and maintainability get lost.

DI Studio is not a very efficient code generator - most developers create programs faster without it. But for life-cycle management for larger DW applications, I would say it's invaluable.

So if you extract the code and change it outside DIS, any connection with metadata is lost.

 

I don't think SAS should ever communicate that you should edit DI Studio jobs outside of DI Studio.

Patrick
Opal | Level 21

@ScottBass

Being a long-term SAS user and coder, I've had my struggles with DIS, same as you, especially with the v3.x versions. DIS is certainly not perfect and I've got a whole wish-list for enhancements.

 

Even though you schedule the generated code, DIS is always the tool for any job maintenance, and changing the code outside of DIS breaks the concept. It's certainly nothing SAS has ever recommended.

 

Like all tools, DIS has its use cases. In my professional life I'm encountering more and more projects with end-to-end data lineage requirements, which are often driven by regulatory requirements. Using DIS (or any other metadata-driven ETL tool) makes it much easier to meet such requirements.

 

As for the "garbage code": using DIS, most of the time you won't examine the code as a whole. As @LinusH writes, there are some options to reduce the generated bits. Whenever I'm at a new client site, the first thing I do is deactivate the option which is set to generate these performance macros by default when creating a new job.

 

And last but not least: most of the "garbage code" adds some checks and generates many lines of code, but won't take up a lot of processing time (excluding the select count(*) for row counts, which some transformations generate and which can be hard to turn off).

ScottBass
Rhodochrosite | Level 12

Hi @Patrick @LinusH

 

Are there any options in DIS that will allow my colleague to export only the relevant code that actually "does stuff"?  IOW, I have the 95,000 line existing program, and the 3,000 line new and improved program, and they both result in the same output.

 

Note my OP is asking if there are any options in DIS.

 

 I have to take the actually useful bits of this code and import it into a new program.  Trolling through the 95K line program is onerous at best.

 

This is not suggesting I edit the generated code outside DIS.  What I'm trying to say is I'm reviewing the generated 95K line program, and trying to extract the relevant bits to include in another program outside DIS, i.e. Enterprise Guide or SAS batch (.sas) file.  And the signal-to-noise ratio of the generated DIS code is low; I'm having to scroll over thousands of lines of completely dead code, especially macro variables that are never referenced.  And if a human wrote code like this, they'd be fired.
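To make the "macro variables that are never referenced" point concrete, here is a minimal sketch in Python (not a DIS feature) that lists `%let` variables which are never referenced with `&name` anywhere in an exported file. It is a rough heuristic under stated assumptions: it only sees `%let` assignments and `&` references in the one file it is given, so it will miss variables read via `symget()`, created via `call symput`, or used by code that lives outside the export. The function name is hypothetical.

```python
import re

# "%let name = value;" -> captures the macro variable name.
LET_RE = re.compile(r"%let\s+(\w+)\s*=", re.IGNORECASE)
# "&name", "&&name", "&name." -> captures the referenced name.
REF_RE = re.compile(r"&+(\w+)")


def unreferenced_macro_vars(sas_code: str) -> list[str]:
    """Return %let variable names (lower-cased, in order of first
    definition) that are never referenced with & in the same text.

    Heuristic only: symget(), call symput and references made from
    other files are not detected.
    """
    defined: list[str] = []
    seen: set[str] = set()
    for match in LET_RE.finditer(sas_code):
        name = match.group(1).lower()  # SAS macro names are case-insensitive
        if name not in seen:
            seen.add(name)
            defined.append(name)

    referenced = {m.group(1).lower() for m in REF_RE.finditer(sas_code)}
    return [name for name in defined if name not in referenced]
```

Reading the exported `.sas` file into a string and passing it to `unreferenced_macro_vars` gives a list of candidates to skip over while reviewing; anything a User Written transformation relies on should still be checked by hand before it is treated as dead.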


LinusH
Tourmaline | Level 20

Still arguing are we...

DI Studio code is not meant to be as efficient as possible when it comes to code reading by humans; it's optimized to run in a production environment.

That said, of course reading the code from time to time is crucial to understand the behaviour, but that is usually limited to specific steps/transformations.

You can also compare it to other ETL tools like Informatica and DataStage, which do not generate code at all - what do you prefer?

 

To the question: yes, there are options that you can turn on/off to limit the code generated. But some of them might make your flow execute incorrectly. An example would be limiting the number of macro variables created for User Written transformations/code when you actually use them...

 

The obvious ones are Collect Statistics, and Enable Parallel Processing Macros (both available as general options, and per job).
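If regenerating the job with those options switched off is not possible, the repeated macro definitions can also be collapsed after export for review purposes. The sketch below is a Python post-processing pass, not a DIS option. It assumes macro definitions are not nested and that every definition ends with `%mend ...;`. It drops a definition only when it is identical (ignoring whitespace) to the most recent definition of the same macro name, so a macro that is genuinely redefined with a different body is left alone. The function name is hypothetical.

```python
import re

# One macro definition: "%macro name ... %mend [name];" (non-nested).
MACRO_RE = re.compile(
    r"%macro\s+(\w+)\b.*?%mend\b[^;]*;",
    re.IGNORECASE | re.DOTALL,
)


def drop_repeated_macro_definitions(sas_code: str) -> str:
    """Remove macro definitions that repeat the latest definition of
    the same macro name verbatim (whitespace differences ignored)."""
    latest: dict[str, str] = {}  # macro name -> normalised latest definition

    def keep_or_drop(match: re.Match) -> str:
        name = match.group(1).lower()
        normalised = " ".join(match.group(0).split())
        if latest.get(name) == normalised:
            return ""  # identical to the definition already in effect
        latest[name] = normalised
        return match.group(0)

    return MACRO_RE.sub(keep_or_drop, sas_code)
```

The output is meant as a shorter file to read, not as a replacement for the deployed job; diffing the results of the original and the reduced program is the only real check that nothing relevant was removed.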

