SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Data Lineage - SAS Script

Occasional Contributor
Posts: 8

Data Lineage - SAS Script

Hello,

I am a Java/SAS developer, and I am building a tool that helps me do impact analysis, document workflow processes, and generate a data dictionary.

How the tool works: I have many SAS scripts from my customers, stored in a folder tree.

The tool reads and parses all of the SAS scripts and generates a dependency graph of the data sources.
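
A rough idea of what such parsing can look like, as a minimal sketch only: it assumes simple programs where DATA steps read through SET/MERGE and procedures use DATA= and OUT=. It ignores PROC SQL, macros, %INCLUDE and most real-world syntax, which is exactly why full parsing gets hard. All function names here are hypothetical.

```python
import re
from collections import defaultdict


def normalize(name):
    """Upper-case a dataset name and add the implicit WORK library."""
    name = name.strip().upper()
    return name if "." in name else "WORK." + name


def names_in(clause):
    """Split a DATA/SET/MERGE clause into dataset names, dropping (options)."""
    clause = re.sub(r"\([^)]*\)", " ", clause)
    return [normalize(tok) for tok in clause.split() if tok.upper() != "_NULL_"]


def step_edges(source):
    """Yield (input, output) dataset pairs for each DATA or PROC step (naive)."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # drop block comments
    for step in re.split(r"\b(?:run|quit)\s*;", source, flags=re.IGNORECASE):
        start = re.search(r"(?:^|;)\s*(data|proc)\b", step, re.IGNORECASE)
        if not start:
            continue  # chunk holds only global statements (LIBNAME, OPTIONS, ...)
        body = step[start.start(1):]
        if start.group(1).lower() == "data":
            head = re.match(r"data\s+([^;]+);", body, re.IGNORECASE)
            outputs = names_in(head.group(1)) if head else []
            inputs = [n for m in re.finditer(r"\b(?:set|merge)\s+([^;]+);", body, re.IGNORECASE)
                      for n in names_in(m.group(1))]
        else:
            inputs = [normalize(n) for n in re.findall(r"\bdata\s*=\s*([\w.]+)", body, re.IGNORECASE)]
            outputs = [normalize(n) for n in re.findall(r"\bout\s*=\s*([\w.]+)", body, re.IGNORECASE)]
        for src in inputs:
            for dst in outputs:
                yield src, dst


def build_graph(programs):
    """Map each dataset to the set of datasets built directly from it."""
    graph = defaultdict(set)
    for source in programs.values():
        for src, dst in step_edges(source):
            graph[src].add(dst)
    return graph
```

Keying the graph by normalized LIB.MEMBER names lets edges found in different scripts join up into one picture.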

Using some predefined tags stored in the SAS scripts, it can also generate a data dictionary.
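
The post doesn't show what the tags look like, so the `/* @dataset NAME: text */` and `/* @column NAME: text */` convention below is purely hypothetical. This is a minimal sketch of how comment tags of that kind could be collected into a dictionary, attaching each column to the most recent dataset tag.

```python
import re

# Hypothetical tag convention, e.g. /* @dataset work.clean: Cleaned sales */
TAG = re.compile(r"/\*\s*@(dataset|column)\s+([\w.]+)\s*:\s*(.*?)\s*\*/",
                 re.IGNORECASE | re.DOTALL)


def extract_dictionary(source):
    """Return {dataset: {"description": str, "columns": {name: description}}}."""
    dictionary = {}
    current = None
    for kind, name, text in TAG.findall(source):
        if kind.lower() == "dataset":
            current = name.upper()
            dictionary[current] = {"description": text, "columns": {}}
        elif current is not None:  # column tags before any dataset tag are ignored
            dictionary[current]["columns"][name.upper()] = text
    return dictionary
```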

So it helps me see the big picture of all the SAS scripts and how each source depends on the others, do impact analysis, and document the processes and data. It has been very useful to me.
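
Once a dependency graph exists, impact analysis boils down to a reachability question: "if this dataset changes, what is built from it, directly or indirectly?" A minimal sketch, assuming the graph is a dict mapping each dataset to the set of datasets built from it. The dataset names are made up for illustration.

```python
from collections import deque


def impacted(graph, changed):
    """Return every dataset downstream of `changed` (breadth-first search)."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:  # also stops endless loops on cyclic graphs
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage for illustration.
lineage = {
    "RAW.SALES": {"WORK.CLEAN"},
    "WORK.CLEAN": {"WORK.SORTED", "WORK.SUMMARY"},
}
print(sorted(impacted(lineage, "RAW.SALES")))
# ['WORK.CLEAN', 'WORK.SORTED', 'WORK.SUMMARY']
```

Reversing the edges and running the same search answers the opposite question, namely which sources feed a given table.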

I attached a screenshot of the tool. (Untitled - Images Host)

I would like to know: would this make sense for other SAS consultants? I would like to offer the tool to the community, just to see if it is useful.

Valued Guide
Posts: 3,208

Re: Data Lineage - SAS Script

SAS metadata already works as a dictionary: records and fields together with their related DI code. With all those relations in place, you have impact analysis at your fingertips.
SAS installations also include a Workflow Studio tool (see SAS(R) 9.3 Intelligence Platform: Overview). What are you adding to, or changing in, that?

---->-- ja karman --<-----
Occasional Contributor
Posts: 8

Re: Data Lineage - SAS Script

Imagine a situation where I have only SAS scripts, dozens of them.

Most of the business users use EG to create their own SAS scripts. These scripts then go into production (they run periodically).

The end users don't want to hand this workflow over to the IT people, who use SAS Workflow Studio.

They want to have the flexibility to edit these SAS scripts.

So the business users are responsible for supporting these "ETL" processes, which run through EG (as SAS scripts).

So the tool gives the big picture: how the sources are connected, a data dictionary, documentation...

Super User
Posts: 5,258

Re: Data Lineage - SAS Script

Sounds like you are reinventing the wheel, a little at least.

PROC SCAPROC analyzes SAS programs as they run and reports what they do, including which input and output data is accessed.

In Enterprise Guide, this is available as a wizard.
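
With PROC SCAPROC's RECORD statement, the analysis is written to a text file as `/* JOBSPLIT: ... */` comment records, which are much easier to consume than raw source code. A minimal sketch of reading such a file back into per-step inputs and outputs. It assumes `DATASET INPUT|OUTPUT <access-mode> <LIB.MEMBER.TYPE>` records and a `STEP SOURCE FOLLOWS` record closing each step, as I remember the format; check these against the output from your own SAS release.

```python
import re

# Assumed record format, e.g. "/* JOBSPLIT: DATASET INPUT SEQ SASHELP.CLASS.DATA */"
DATASET_RECORD = re.compile(
    r"/\*\s*JOBSPLIT:\s*DATASET\s+(INPUT|OUTPUT)\s+\w+\s+([\w.]+)\s*\*/"
)


def scaproc_steps(text):
    """Return one (inputs, outputs) pair per step found in a SCAPROC record file."""
    steps, inputs, outputs = [], set(), set()
    for line in text.splitlines():
        record = DATASET_RECORD.search(line)
        if record:
            target = inputs if record.group(1) == "INPUT" else outputs
            target.add(record.group(2))
        elif "JOBSPLIT: STEP SOURCE FOLLOWS" in line:
            # The step's own source code follows this marker, so close the step here.
            steps.append((sorted(inputs), sorted(outputs)))
            inputs, outputs = set(), set()
    return steps
```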

 

If you have a lot of ETL jobs that run regularly and need maintenance, you should definitely look at Data Integration Server. Its main client, DI Studio, also has this wizard for importing SAS programs, and it imports them into metadata, giving you lineage, auditing, etc.

Data never sleeps
Contributor
Posts: 38

Re: Data Lineage - SAS Script

Hi,

If you are a Java developer and you have a metadata environment, I would recommend the SAS(R) 9.3 Open Metadata Interface: Reference and Usage instead of parsing .sas files.

Lately I have spent a lot of time analyzing the SAS metadata model.

I don’t want this thread to be an advertisement, but I built a Java application that reads out SAS metadata and displays it visually.

If you are interested:
http://www.flitcon.de/metadataviewer/flitcon_metadata_viewer_brochure_english.pdf

Kind regards from Germany,
Marius

Super User
Posts: 6,502

Re: Data Lineage - SAS Script

You will not get very far trying to parse SAS programs to generate a complete data dictionary. You would have better luck processing the logs generated when the programs run.
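
A minimal sketch of the log-based approach. It relies on the standard NOTE lines SAS writes for observations read, data sets and PROC SQL tables created, and the end of each DATA or PROC step. The exact wording can vary by release, locale and options, so treat the patterns as assumptions to verify against your own logs.

```python
import re

READ = re.compile(r"NOTE: There (?:were|was) \d+ observations? read from the data set (\w+\.\w+)")
WRITE = re.compile(r"NOTE: (?:The data set (\w+\.\w+) has|Table (\w+\.\w+) created)")
STEP_END = re.compile(r"NOTE: (?:DATA statement|PROCEDURE \w+) used")


def log_steps(log_text):
    """Return one (inputs, outputs) pair per step found in a SAS log."""
    steps, inputs, outputs = [], set(), set()
    for line in log_text.splitlines():
        if m := READ.search(line):
            inputs.add(m.group(1).upper())
        elif m := WRITE.search(line):
            outputs.add((m.group(1) or m.group(2)).upper())
        elif STEP_END.search(line):
            steps.append((sorted(inputs), sorted(outputs)))
            inputs, outputs = set(), set()
    return steps
```

Note that PROC SQL does not log the tables it reads this way, so SQL inputs stay empty with this approach.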

If you are talking about Enterprise Guide projects instead of SAS programs, then you might have access to more metadata about how the programs interact.
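
Assuming recent EG versions store project files (.egp) as ZIP archives holding a project.xml plus the code of each task (verify this on your EG version), a first step is simply opening the archive with Python's standard zipfile module. The member name project.xml and the demo member names are assumptions. The decoding helper just accepts either UTF-16 with a byte-order mark or UTF-8.

```python
import io
import zipfile


def decode_xml(raw):
    """Decode XML bytes as UTF-16 if a byte-order mark is present, else UTF-8."""
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16")
    return raw.decode("utf-8-sig")


def read_egp(path_or_file):
    """Return (embedded .sas member names, project.xml text or None) for an .egp file."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = archive.namelist()
        programs = sorted(n for n in names if n.lower().endswith(".sas"))
        xml = decode_xml(archive.read("project.xml")) if "project.xml" in names else None
    return programs, xml


# Demo with a small in-memory archive (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("project.xml", "<Project/>".encode("utf-16"))
    demo.writestr("CodeTask-1/code.sas", "data a; run;")
buffer.seek(0)
programs, xml = read_egp(buffer)
print(programs)  # ['CodeTask-1/code.sas']
```

From there, project.xml can be inspected with any XML parser to see how the tasks in the project are linked.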

Valued Guide
Posts: 3,208

Re: Data Lineage - SAS Script

pedromagalhaes, you are describing several problems in the organization. It would be better to try to solve those. Even if you are not in a position to do that yourself, you could raise the issue. If doing so could hurt you personally, then there is a bad enterprise culture. Work out your thoughts, plans, etc.

What I am reading:

1- The business users are programming, developing, and building SAS scripts.

  No problem with the domain knowledge; the more challenging ICT areas are the real challenge.

2- The business users want the flexibility to edit those scripts.

  That calls for release/change management (which version is in production) and version management (segregation of developers' work). Having auditable, traceable processes in place there is a common regulatory requirement. Right now there is nothing.

3- Scheduling is part of normal production operations and is usually supported by IT staff.

  There must be a gap with the IT staff, as the users want to (or must) do it themselves. That can be caused by rigid IT people not wanting to cooperate with the business. Either way, it is still a gap problem.

4- Scheduling with SAS is best done with LSF (a schedule manager), which is not the same as Workflow Studio.

  Most IT staff have their own scheduling tools and hold off on the SAS approach.

5- As the business is running ETL processes, why not use the SAS ETL tools for that?

  There was once a product called ETL Studio; it is named DI Studio these days. The WA tool in SAS is even older: it was a predecessor with its own metadata database.

---->-- ja karman --<-----