BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Isator
Fluorite | Level 6

Hi all

 

I am not a SAS-developer/user, I am just a java/.NET-developer currently aiding SAS-users at my company to integrate SAS-data with some custom apps.

Lately a SAS-user approched me asking if it is possible to extract data lineage from .sas-files, ie. is it possible to find out what a .sas-file reads and what it produces, programatically. We know most of the .sas-files are used as ETL processes, typically reading a .csv-file and writing to a database table.

My question is the following; is it possible to use some API's to extract such information from a .sas-file? I also heared about SAS Metadata Server and repositories. Can I found something there?

 

Kind regards

Isa

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20

There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.

If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.

Data never sleeps

View solution in original post

12 REPLIES 12
LinusH
Tourmaline | Level 20

There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.

If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.

Data never sleeps
Isator
Fluorite | Level 6

Ok, so we are talking about two possibilities here.

 

--There's someting calld PROC SCAPROC, that analyzes SAS programs, and yes, it will get you some lineage.

When you say PROC SCAPROC will produce some lineage can it get me what I am asking for, ie. name of the source csv-file and target csv-file/SQL? If so, how does this procedure return such information (returning objects, write response to a file)?

 

--If your site has a license for Data Integration, you could use a GUI (SAS COde Importer) to import the results to the metadata server, both tables, jobs and job steps.

Is it difficult getting source/target files in this case? This is GUI but can the process of importing to metadata server be automated?

 

Isa

 

 

 

 

LinusH
Tourmaline | Level 20

SCAPROC returns the result as a report/log, plaint text. You can read mode about it on support.sas.com.

 

Automated, what should be automated, the import process, or the resulting Job in metadata?

Generally, if you don't already have this license, I don't think it's worth getting one for this sole purpose. It's full flavoured ETL tool, for division/enterprise wide implementations.

Getting the source/target files will in this scenario be "easy", since they will be registered in the metadata, and therefore traceable using metadata tools (such as DI Studio).

Data never sleeps
Patrick
Opal | Level 21

It's one of these "it depends" things.

 

If these are ETL processes then my first question would be:

1. Have you done a proper design including a full source to target data mapping?

2. How have you implemented? - Just writing SAS code or via SAS DI Studio which is Metadata driven and where you could create lineage reports and which also provides a tool for "re-aligning" variables.

 

Given your role and background you're eventually not the right person to be asked this question. In my experience such a question comes sometimes up if there is a high level "full data lineage" requirement but developers didn't follow a clean implementation process and then try to sort out "the mess" programmatically instead of putting in the hard design and documentation work.

 

Isator
Fluorite | Level 6

Hi,

 

We are a smaller company without a professional team of SAS developers/super users.

Our company is using purely .sas-files containing handwritten SAS statements, not sure if there are writen in Enterprise Guide or some other editors.

 

I am totaly agains quick fix solution (like parsing sas-files even they look not too difficult, at least in our company), hence my question on this forum. Previously I did lineage integration with other BI tools like IBM Information Server (a module called Governance Catalogue contains all metadata and can be queried via REST-services or command-line) and QlikView (an app created by same vendor called Governance Dashboard that spits out flat files containg the data lineage). Both platforms gave me clean info regarding ETL job target/source types, their properties, names, etc. and thus the full data lineage integration was easy to accomplish. And this was automatic/dynamic as documentation is seldomly in sync with the implementation.

 

Kind regards

Isa

 

Patrick
Opal | Level 21

The tools you mention use metadata. With SAS this would be things like SAS DI Studio for ETL's and all the BI clients. These tools are metadata driven and there you could get your lineage (not everything OOTB, but possible).

 

As I understand what has been done in your company is "hand-coding". That's "freeform" and not metadata driven. The only information you get there is in the SAS logs which you would have to parse in one way or the other, and... it depends what logging options have been turned on. You might not get sufficient information in the log even if you would implement to world's most clever parser.

 

In short: I believe in the environment you are in it's not possible to get full data lineage information programmatically.

Isator
Fluorite | Level 6

Thx for quick response.

 

IBM Info Server is purely graphical and yes, easier for the platform to pick on the metadata. QlikView projects however, used 4 gen language as these sas-files but still data lineage was possible. Sources in QV were imported via a wizard but not target files/databases, still it gave correct data lineage.

 

I hear what you are saying regarding logs. Would reather avoid parsing them but I did work with "black box" systems without any API's where such solution was the only answer. Are these logs writen to a file or in some repository?

I will also check the proc LinusH recomended..

 

Patrick
Opal | Level 21

SAS language is a mixture of 3rd, 4th and 5th generation. 

If run in batch the SAS logs are written to permanent text files with suffix .log.

 

And yes, the only way to get this lineage information is by parsing the logs.

 

PROC SCAPROC does some of this parsing for you and then writes the output to a text file. You need to call it explicitely when batch submitting a SAS program. What users get in SAS Enterprise Guide uses PROC SCAPROC.

 

PROC SCAPROC will make your life a bit easier but it's not the solution. 

 

My recommendation: Don't waste your time on it. With fully hand-coded SAS code there is no 100% programmatically way to get a data lineage report. It won't workl

Isator
Fluorite | Level 6
Ok, I will check both logs and proc before deciding whats next.

Btw, I didn't mention one important information. These sas-files are all run via command line (a 3rd party workflow engine has steps which can run coammnd line commands). I haven't yet seen how this looks like. If a sas-file failes, the workflow is notifyed somehow (I guess execution status code != 0). I don't know if these sas-files when executed via command line are able to produce some information back to command line window (like log info). Maybe some parameters can be set in the sas-file to make it produce output?


Kind regards
Isa
LinusH
Tourmaline | Level 20
Out of the box you get the RC and the log file. What else do you wish to return to the scheduler?

Again about lineage. If you just a few jobs, document it (manually). If that's too much job, you are a Data Integration Server prospect.
Data never sleeps
Isator
Fluorite | Level 6

New info, SAS developers did use Enterprise Guide. I read that Enterise Guide is also integrated with Metadata server. So this means we may be able to retrieve source/target metadata via Metadata server API's?

 

Kr

Isa 

Patrick
Opal | Level 21

Just believe what @LinusH and I are telling you. The EG integration into SAS Metadata is different from what you might think and it won't allow you to retrieve source to target metadata mapping from EG projects. This information is not stored in the SAS metadata server but in the actual client side EG project (.egp file). You can open such a .egp file with WinZip and then have a look into the XML's there. 

 

Once the developers store the SAS code developed in EG as external .sas files the EG project keeps only the references (links) to these external files. It's these SAS scripts which you run in batch via a 3rd party scheduler.

On top of it: You can write dynamic code with SAS so which tables get used how is eventually data driven and only determined during run-time (and could even differ from execution to execution). 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2555 views
  • 4 likes
  • 3 in conversation