Extracting metadata from .sas-files

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Extracting metadata from .sas-files

Hi all

 

I am not a SAS-developer/user, I am just a java/.NET-developer currently aiding SAS-users at my company to integrate SAS-data with some custom apps.

Lately a SAS-user approched me asking if it is possible to extract data lineage from .sas-files, ie. is it possible to find out what a .sas-file reads and what it produces, programatically. We know most of the .sas-files are used as ETL processes, typically reading a .csv-file and writing to a database table.

My question is the following; is it possible to use some API's to extract such information from a .sas-file? I also heared about SAS Metadata Server and repositories. Can I found something there?

 

Kind regards

Isa


Accepted Solutions
Solution
‎10-19-2015 11:36 AM
Super User
Posts: 5,256

Re: Extracting metadata from .sas-files

There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.

If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.

Data never sleeps

View solution in original post


All Replies
Solution
‎10-19-2015 11:36 AM
Super User
Posts: 5,256

Re: Extracting metadata from .sas-files

There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.

If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.

Data never sleeps
Occasional Contributor
Posts: 6

Re: Extracting metadata from .sas-files

Ok, so we are talking about two possibilities here.

 

--There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.

When you say PROC SCAPROC will produce some lineage can it get me what I am asking for, ie. name of the source csv-file and target csv-file/SQL? If so, how does this procedure return such information (returning objects, write response to a file)?

 

--If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.

Is it difficult getting source/target files in this case? This is GUI but can the process of importing to metadata server be automated?

 

Isa

 

 

 

 

Super User
Posts: 5,256

Re: Extracting metadata from .sas-files

SCAPROC returns the result as a report/log, plaint text. You can read mode about it on support.sas.com.

 

Automated, what should be automated, the import process, or the resulting Job in metadata?

Generally, if you don't already have this license, I don't think it's worth getting one for this sole purpose. It's full flavoured ETL tool, for division/enterprise wide implementations.

Getting the source/target files will in this scenario be "easy", since they will be registered in the metadata, and therefore traceable using metadata tools (such as DI Studio).

Data never sleeps
Respected Advisor
Posts: 3,887

Re: Extracting metadata from .sas-files

It's one of these "it depends" things.

 

If these are ETL processes then my first question would be:

1. Have you done a proper design including a full source to target data mapping?

2. How have you implemented? - Just writing SAS code or via SAS DI Studio which is Metadata driven and where you could create lineage reports and which also provides a tool for "re-aligning" variables.

 

Given your role and background you're eventually not the right person to be asked this question. In my experience such a question comes sometimes up if there is a high level "full data lineage" requirement but developers didn't follow a clean implementation process and then try to sort out "the mess" programmatically instead of putting in the hard design and documentation work.

 

Occasional Contributor
Posts: 6

Re: Extracting metadata from .sas-files

Hi,

 

We are a smaller company without a professional team of SAS developers/super users.

Our company is using purely .sas-files containing handwritten SAS statements, not sure if there are writen in Enterprise Guide or some other editors.

 

I am totaly agains quick fix solution (like parsing sas-files even they look not too difficult, at least in our company), hence my question on this forum. Previously I did lineage integration with other BI tools like IBM Information Server (a module called Governance Catalogue contains all metadata and can be queried via REST-services or command-line) and QlikView (an app created by same vendor called Governance Dashboard that spits out flat files containg the data lineage). Both platforms gave me clean info regarding ETL job target/source types, their properties, names, etc. and thus the full data lineage integration was easy to accomplish. And this was automatic/dynamic as documentation is seldomly in sync with the implementation.

 

Kind regards

Isa

 

Respected Advisor
Posts: 3,887

Re: Extracting metadata from .sas-files

[ Edited ]

The tools you mention use metadata. With SAS this would be things like SAS DI Studio for ETL's and all the BI clients. These tools are metadata driven and there you could get your lineage (not everything OOTB, but possible).

 

As I understand what has been done in your company is "hand-coding". That's "freeform" and not metadata driven. The only information you get there is in the SAS logs which you would have to parse in one way or the other, and... it depends what logging options have been turned on. You might not get sufficient information in the log even if you would implement to world's most clever parser.

 

In short: I believe in the environment you are in it's not possible to get full data lineage information programmatically.

Occasional Contributor
Posts: 6

Re: Extracting metadata from .sas-files

Thx for quick response.

 

IBM Info Server is purely graphical and yes, easier for the platform to pick on the metadata. QlikView projects however, used 4 gen language as these sas-files but still data lineage was possible. Sources in QV were imported via a wizard but not target files/databases, still it gave correct data lineage.

 

I hear what you are saying regarding logs. Would reather avoid parsing them but I did work with "black box" systems without any API's where such solution was the only answer. Are these logs writen to a file or in some repository?

I will also check the proc LinusH recomended..

 

Respected Advisor
Posts: 3,887

Re: Extracting metadata from .sas-files

[ Edited ]

SAS language is a mixture of 3rd, 4th and 5th generation. 

If run in batch the SAS logs are written to permanent text files with suffix .log.

 

And yes, the only way to get this lineage information is by parsing the logs.

 

PROC SCAPROC does some of this parsing for you and then writes the output to a text file. You need to call it explicitely when batch submitting a SAS program. What users get in SAS Enterprise Guide uses PROC SCAPROC.

 

PROC SCAPROC will make your life a bit easier but it's not the solution. 

 

My recommendation: Don't waste your time on it. With fully hand-coded SAS code there is no 100% programmatically way to get a data lineage report. It won't workl

Occasional Contributor
Posts: 6

Re: Extracting metadata from .sas-files

Ok, I will check both logs and proc before deciding whats next.

Btw, I didn't mention one important information. These sas-files are all run via command line (a 3rd party workflow engine has steps which can run coammnd line commands). I haven't yet seen how this looks like. If a sas-file failes, the workflow is notifyed somehow (I guess execution status code != 0). I don't know if these sas-files when executed via command line are able to produce some information back to command line window (like log info). Maybe some parameters can be set in the sas-file to make it produce output?


Kind regards
Isa
Super User
Posts: 5,256

Re: Extracting metadata from .sas-files

Out of the box you get the RC and the log file. What else do you wish to return to the scheduler?

Again about lineage. If you just a few jobs, document it (manually). If that's too much job, you are a Data Integration Server prospect.
Data never sleeps
Occasional Contributor
Posts: 6

Re: Extracting metadata from .sas-files

New info, SAS developers did use Enterprise Guide. I read that Enterise Guide is also integrated with Metadata server. So this means we may be able to retrieve source/target metadata via Metadata server API's?

 

Kr

Isa 

Respected Advisor
Posts: 3,887

Re: Extracting metadata from .sas-files

Just believe what @LinusH and I are telling you. The EG integration into SAS Metadata is different from what you might think and it won't allow you to retrieve source to target metadata mapping from EG projects. This information is not stored in the SAS metadata server but in the actual client side EG project (.egp file). You can open such a .egp file with WinZip and then have a look into the XML's there. 

 

Once the developers store the SAS code developed in EG as external .sas files the EG project keeps only the references (links) to these external files. It's these SAS scripts which you run in batch via a 3rd party scheduler.

On top of it: You can write dynamic code with SAS so which tables get used how is eventually data driven and only determined during run-time (and could even differ from execution to execution). 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 649 views
  • 4 likes
  • 3 in conversation