Hi ,
As all you know from DI studio is possible manually generate SAS code from each ETL or ETLs transformation.
Is it possible, for example by calling some metadata function etc., automatically ganarate such sas codes?
I mean, for example, I have Jobs or transformations meta ID(or URI) and I just put this ID(or URI) as parameter into some metadata function and this function output appropriate sas code into some file on server etc.?
Thanks!
What you basically try to do is re-writing the deployment "scripts" that come with DIS. That sounds like a very hard task. Is it worth it?
There is in DIS4.3 this option of batch deployment http://support.sas.com/documentation/cdl/en/etlug/63360/HTML/default/viewer.htm#p1jxhqhaz10gj2n1pyr0...
Couldn't you just firstly run such a script for all jobs (job names retrieved via metadata query) in both environments and then compare the generated SAS code (excluding things like metadata id's which always will differ).
It also shouldn't be too hard to exclude job pre-processing stuff as all steps in DI generated code have clear headers (so you just exclude the section under the job pre-process header).
Another way to go (also in DIS4.3): Before starting any new development cycle a DEV and TEST environment needs to get "baselined" with production code/metadata objects. There is now archiving with version control in DIS4.3 http://support.sas.com/documentation/cdl/en/etlug/63360/HTML/default/viewer.htm#n1tsueilsjwdgin1oove...
So you baseline everything first, then you archive this baseline. Once this is done you can use the compare function: http://support.sas.com/documentation/cdl/en/etlug/63360/HTML/default/viewer.htm#p03yoesrda5ldqn16r0b...
Not sure if there is already an automated report to list all differences for all jobs - but if there isn't then it's always worth to ask SAS TechSupport if there is something like this already around which can be handed out on request.
Just thinking loud...
Maybe you can, I don't know...
But why, what is the the requirement?
DI jobs aren't generally updated automatically, so a process where the developer (in development) deploys a job after a change, and the administrator who imports job to test/prod will do the deploy in the same process.
If you are uncertain that all jobs are deployed after the last update/import, you could deploy all jobs.
Hi Linus,
The task is - compare ETLs on different meta servers.
We need such tool becose there are huge amount of Jobs(ETLs) on production and test servers, and very often we need quick answer - if job that actually are scheduled and deployed on production is the same as job on some test environment.
ETLs on test servers are not deployed, so I can't access to direct ETL code as on production.
I've already spend a lot of time on creating macro that compare ETLs, this macro connect to meta server using xml maps and proc metadata etc., such approach in few times quicker then use data step metadata functions, but the problem is that there are a lot of different transformations inside of each job, not only standard Join extarct etc., also some self-made transforms created for achiving different goals.Each of this transform except standard meta-subobjects(mapping columns options etc.) also can has it own options properties etc., so it's very hard predict everything and create macro that will check everything in each ETLs transform.
At the moment this my macro on 90% cases give correct answer if ETLs match, but ,as I said before, - there are a lot of transforms with it own meta-structure and it will take a lot of time write xmlmaps for each of unique transformation...So I just supposed that it will be great if I'll have possibility ganarate ETLs or transformations sas code "on fly" and then just compare result sas code using some perl etc...
I suppose if it possible to generate needed sas code from DI studio - should be some possibility perform same operation outside of DI Studio:), for example from some macro.
Thanks!
How quick is quick?
And it's sounds quite complicated, and therefore not very reliable. Without deep knowledge of the requirements I still think that a process that includes deployment of changed/imported jobs would work good enough (?)
If you can settle for a a byte-wise comparison (and have just a boolean comparison result), you could have a batch process that exports all jobs as package files, and then you could compare i.e. test and prod package files.
Hi again Linus,
Regarding your first question(How quick is quick?) - for avarage ETL(~20 transforms etc.) macro executes for a 2-3 seconds, depends on current load(capacity) of both servers. As I wrote in my previous posts for this time macro checks and compare majority ETLs sub-macro objects: Transforms names,descriptions, mapping, columns expressions, user written codes, etc...
Plus I plan optimize my macro by adding functionality that skip all rest macro code if any first differense found(in another word find only first difference between ETLs).
Regarding your next proposal about soultion with deployment etc.
This way that you described in previous post isn't ok in my case, just becouse developers can edit ETLs on test servers any time(and they do it), but deployed sas code will be stored somewhere on server regulary, for example each morning. Also deploing one ETL isn't performed for a millisecond, in my case same ETL on ~20 transforms deploys on one server more time then executes this my macro, but, obviously I'll need make sas code from both servers so we can multiply this deploing time as minimum on 2.
Also one thing why variant with using deployed files isn't acceptable - due some reazons we have a lot of metadata JOBS that factually have sas code different from appropriate deployed sas code stored on server(and used by flows), I'm not responisble for this part of work but such situations happens, so shortly - I'm interesting if there are possibility ganarate sas code on fly, not use some pre-ganarated in the morning code(becouse ETLs can be changed on test servers after launch:) ).
Regharding your proposal related with byte-wise comparison .
I'm not sure that two factually same jobs will ganarate same byte code in .spk package , below I'll try to describe why.
In my current macro a few first steps allows very quick identify if two jobs are absloutely the same on two diff. metaservers.
1.Connect to first metaserver.
2.Using proc matadata with GetMatadataObjects XML request for extracting Jobs info in anoter XML file.
3.Connect to second metaserver.
4.Using proc matadata with same GetMatadataObjects XML request for extracting Jobs info in anoter XML file.
5.Compare XML files, BUT - before this delete ID strings from XML file becosue meta IDs forms sequantaly on diff metservers depends one existing IDs.
Also I can make some change in ETL, and then rall it back, so factually ETLs will stay teh same, but ORDER of metaobjects will change in output XML files, same thing, I suppose, will cause difference between two .spk packages ganarated from factually same ETLs.
Plus I have requirment that job pre-processing code value should be ignored during comparsion, so all these thing causes the fact that I can't use byte-wise comparison of imported from diff metaservers packages.
In ideal variant ,as I wrote before, I'd like to have possibility ganarate sas code from one ETLs transformation "on fly",ecranize this code - skip meta IDs if such presented in gaenerated code(I'm not sure about this) etc., compare this code with appropriate code from another meta server, if find difference - alert needed message and stop executing, if sas codes match - go to next transform , ganarate it code etc.
Thanks!
What you basically try to do is re-writing the deployment "scripts" that come with DIS. That sounds like a very hard task. Is it worth it?
There is in DIS4.3 this option of batch deployment http://support.sas.com/documentation/cdl/en/etlug/63360/HTML/default/viewer.htm#p1jxhqhaz10gj2n1pyr0...
Couldn't you just firstly run such a script for all jobs (job names retrieved via metadata query) in both environments and then compare the generated SAS code (excluding things like metadata id's which always will differ).
It also shouldn't be too hard to exclude job pre-processing stuff as all steps in DI generated code have clear headers (so you just exclude the section under the job pre-process header).
Another way to go (also in DIS4.3): Before starting any new development cycle a DEV and TEST environment needs to get "baselined" with production code/metadata objects. There is now archiving with version control in DIS4.3 http://support.sas.com/documentation/cdl/en/etlug/63360/HTML/default/viewer.htm#n1tsueilsjwdgin1oove...
So you baseline everything first, then you archive this baseline. Once this is done you can use the compare function: http://support.sas.com/documentation/cdl/en/etlug/63360/HTML/default/viewer.htm#p03yoesrda5ldqn16r0b...
Not sure if there is already an automated report to list all differences for all jobs - but if there isn't then it's always worth to ask SAS TechSupport if there is something like this already around which can be handed out on request.
Just thinking loud...
Hi Patric,
Lets go throw your proposals(questions) step by step:)(I'll mark your text in bold).
1.
What you basically try to do is re-writing the deployment "scripts" that come with DIS. That sounds like a very hard task. Is it worth it?
Probably yes, it worth it because on Friday I finished first version of this set of macros that compare ETLs and in majority of tests(I made a lot) macro return correct result, I mean it shows correct difference between jobs or the fact that there were no found any difference BUT SEARCH were performed ONLY by these and these meta objects(Transformations name desc uw codes, all transformations options, where group having join clauses,mapping, columns, expressions etc.).
Plus the macro runs a few seconds(avg time for avg ETL) so it exactly what customers wonted.
I understand that difference can be stored in very very "fare" meta sub-object but I created the macro in such way that it very simple add new rules(things to compare) so it(macro) will be improved time after time.
Couldn't you just firstly run such a script for all jobs (job names retrieved via metadata query) in both environments and then compare the generated SAS code (excluding things like metadata id's which always will differ).
It also shouldn't be too hard to exclude job pre-processing stuff as all steps in DI generated code have clear headers (so you just exclude the section under the job pre-process header).
It's very good idea, actually when I started work on the task I thought about possibility compare deployed ETLs codes, but we lived in no-ideal world))) and there are a lot of reasons that will not allow me achieve needed goals using such solution.
Fist of all deploy all jobs on two meta servers, for example, each night will take a huge amount of time, I'm almost sure - the party will сщтештгу till the mornings:)
I am not responsible person for this part of work but I have the tasks with flows and I made deployment(generate code) of few ETLs, - it take a few seconds for each ETL, and what if we have few thousandth of ETLs?Few thousandth multiply on few seconds will be few hours, actually I am not sure that our system administrator, dba will accept such approach.
When we move new jobs to production (or new version of job)we made partial promotion, so all rest ETLs staid the same, without changes.
Next reason why this variant isn't ok - our developers can edit some ETLs , so morning deployed code will not be actual on the afternoon, outsource we can check in metadata timestamps if the job were not changed from the morning, but if it is, we would not be able compare it with job from production, for example.
And there are also few another minor things(deployment on test servers, difference between deployed code and real ETL on production(don't ask me why)) that will not allow me compare ETLs deployed code.
But in general your idea is probably the only one correct that give 100% correct reliable result, but unfortunately I can't use such approach...
Another way to go (also in DIS4.3): Before starting any new development cycle a DEV and TEST environment needs to get "baselined" with production code/metadata objects. There is now archiving with version control in DIS4.3
Actually I'm at home and can't check version of DI studio, but we still use SAS 9.1.3 and moving to next SAS version postponed for now because our data warehouse is huge, a lot of ETLs, tables , flows etc.
I suppose DI studio 4.3 comes with new version of SAS(probably 9.3)...
But this solution is also interesting, I'' look into it more deep when I'll come from vacations.
Thanks a lot, you with Linus gave really good advises, I'll try propose something from them to customers if actual variant will gave unreliable or wrong results..
Thanks one more time.
One more thought.
having developers editing jobs in test environment (when there exists a development environment) can't be seen as best practice.
Can't you revoke this permission for them, and streamline the promotion process?
Hi Linus,
Off-course our developers edit ETLs only on development env., test server should be mirror of production so we promote(from development) new(or changed) ETLs to test environment before Jobs moved to production. Only sometimes developers can change pre-processing code on test env. in order to check(test) if separate ETLs runs ok on test server also.
I suppose it general approach of work in case of big data warehouse.
Actually you with Patric wrote me similar approach to compare ETLs, but as I wrote before there a lot of reasons why I can't use batch deployment method...
But I still have small hope that if batch deployment possible(Patric sent me a few links, I'll investigate them after vacation) - probably one job deployment also can be forced in needed moment by some set of operations.
I can't use X command (that can execute some .bat file for example) on production but I can do it on dev server, I am not sure but suppose it can also cause some problem in case I will wont generate this ETLs code on fly from production environment ETL.
I also have another (the smallest) hope that there is some possibility generate(on fly) sas code from separate ETLs transformation, it will speed-up execution time if ETLs have difference...
Thanks!
If you're using SAS 9.1.3 then the DIS version is 3.4 - and so very outdated. Both the automated job deployment and archiving are not available in this version.
You might get different code generated using the same SAS metadata because it's actually the DIS client generating the code and different patch levels of the client can generate different code (newer patches generating improved code). I've seen that happen in real life.
What I've been using in the past for SAS DI implementations is a more manual process:
- A developer gets a task assigned (build, change, or fix). Who gets what when and so on is part of project management and the tech lead keeps the overview that this works.
- The developer implements & unit tests and adds all the changed objects to a migration spreadsheet (who changed which object when inclusive the task number).
- This migration spreadsheet is then used to bundle objects for a new release (DEV -> TEST -> PROD).
It feels very much that the task you've got is someone trying to solve a management problem with a technical solution.
Hi Patric,
We usually use such approach in tasks with smaller amount of ETLs(sub set of all Jobs), may be it's really good idea to use such spreadsheet to all our ETLs, I'll discuss this with coworkers, thanks!
I think I can close this discussion, but I'll create another one related with comparing XML files created by proc metadata, I've found some small issue related with changing order of tegs in this files, but I'll describe it more in detail in next discussion.
Thanks!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.
