Hi all
I am not a SAS-developer/user, I am just a java/.NET-developer currently aiding SAS-users at my company to integrate SAS-data with some custom apps.
Lately a SAS-user approched me asking if it is possible to extract data lineage from .sas-files, ie. is it possible to find out what a .sas-file reads and what it produces, programatically. We know most of the .sas-files are used as ETL processes, typically reading a .csv-file and writing to a database table.
My question is the following; is it possible to use some API's to extract such information from a .sas-file? I also heared about SAS Metadata Server and repositories. Can I found something there?
Kind regards
Isa
There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.
If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.
There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.
If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.
Ok, so we are talking about two possibilities here.
--There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.
When you say PROC SCAPROC will produce some lineage can it get me what I am asking for, ie. name of the source csv-file and target csv-file/SQL? If so, how does this procedure return such information (returning objects, write response to a file)?
--If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.
Is it difficult getting source/target files in this case? This is GUI but can the process of importing to metadata server be automated?
Isa
SCAPROC returns the result as a report/log, plaint text. You can read mode about it on support.sas.com.
Automated, what should be automated, the import process, or the resulting Job in metadata?
Generally, if you don't already have this license, I don't think it's worth getting one for this sole purpose. It's full flavoured ETL tool, for division/enterprise wide implementations.
Getting the source/target files will in this scenario be "easy", since they will be registered in the metadata, and therefore traceable using metadata tools (such as DI Studio).
It's one of these "it depends" things.
If these are ETL processes then my first question would be:
1. Have you done a proper design including a full source to target data mapping?
2. How have you implemented? - Just writing SAS code or via SAS DI Studio which is Metadata driven and where you could create lineage reports and which also provides a tool for "re-aligning" variables.
Given your role and background you're eventually not the right person to be asked this question. In my experience such a question comes sometimes up if there is a high level "full data lineage" requirement but developers didn't follow a clean implementation process and then try to sort out "the mess" programmatically instead of putting in the hard design and documentation work.
Hi,
We are a smaller company without a professional team of SAS developers/super users.
Our company is using purely .sas-files containing handwritten SAS statements, not sure if there are writen in Enterprise Guide or some other editors.
I am totaly agains quick fix solution (like parsing sas-files even they look not too difficult, at least in our company), hence my question on this forum. Previously I did lineage integration with other BI tools like IBM Information Server (a module called Governance Catalogue contains all metadata and can be queried via REST-services or command-line) and QlikView (an app created by same vendor called Governance Dashboard that spits out flat files containg the data lineage). Both platforms gave me clean info regarding ETL job target/source types, their properties, names, etc. and thus the full data lineage integration was easy to accomplish. And this was automatic/dynamic as documentation is seldomly in sync with the implementation.
Kind regards
Isa
The tools you mention use metadata. With SAS this would be things like SAS DI Studio for ETL's and all the BI clients. These tools are metadata driven and there you could get your lineage (not everything OOTB, but possible).
As I understand what has been done in your company is "hand-coding". That's "freeform" and not metadata driven. The only information you get there is in the SAS logs which you would have to parse in one way or the other, and... it depends what logging options have been turned on. You might not get sufficient information in the log even if you would implement to world's most clever parser.
In short: I believe in the environment you are in it's not possible to get full data lineage information programmatically.
Thx for quick response.
IBM Info Server is purely graphical and yes, easier for the platform to pick on the metadata. QlikView projects however, used 4 gen language as these sas-files but still data lineage was possible. Sources in QV were imported via a wizard but not target files/databases, still it gave correct data lineage.
I hear what you are saying regarding logs. Would reather avoid parsing them but I did work with "black box" systems without any API's where such solution was the only answer. Are these logs writen to a file or in some repository?
I will also check the proc LinusH recomended..
SAS language is a mixture of 3rd, 4th and 5th generation.
If run in batch the SAS logs are written to permanent text files with suffix .log.
And yes, the only way to get this lineage information is by parsing the logs.
PROC SCAPROC does some of this parsing for you and then writes the output to a text file. You need to call it explicitely when batch submitting a SAS program. What users get in SAS Enterprise Guide uses PROC SCAPROC.
PROC SCAPROC will make your life a bit easier but it's not the solution.
My recommendation: Don't waste your time on it. With fully hand-coded SAS code there is no 100% programmatically way to get a data lineage report. It won't workl
New info, SAS developers did use Enterprise Guide. I read that Enterise Guide is also integrated with Metadata server. So this means we may be able to retrieve source/target metadata via Metadata server API's?
Kr
Isa
Just believe what @LinusH and I are telling you. The EG integration into SAS Metadata is different from what you might think and it won't allow you to retrieve source to target metadata mapping from EG projects. This information is not stored in the SAS metadata server but in the actual client side EG project (.egp file). You can open such a .egp file with WinZip and then have a look into the XML's there.
Once the developers store the SAS code developed in EG as external .sas files the EG project keeps only the references (links) to these external files. It's these SAS scripts which you run in batch via a 3rd party scheduler.
On top of it: You can write dynamic code with SAS so which tables get used how is eventually data driven and only determined during run-time (and could even differ from execution to execution).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.