SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

automatically update metadata in DIStudio after import changed physical dataset via batch job possible?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

automatically update metadata in DIStudio after import changed physical dataset via batch job possible?

Currently, we are migrating from unix to windows and at the same time from Base SAS to DIStudio.

Datasets that are created with batch jobs in Base SAS on Unix, are copied as a txt-file with an export batchjob to a folder on windows.

Then in DIStudio, we have an import batchjob that copies this txt-file from the folder into a folder as a SAS-dataset on the SAS Application server.

This export and import job run every hour and it works fine.

Sometimes, one of the datasets changes. eg add a column, delete a column, add an index, delete an index,...

This changes are copied to the physical datasets for DIStudio but metadata are not automatically updated. This means, that each time when something is changed, I have to open DIStudio and manually update metadata for the table.

I am looking for a way to do this automatically. A kind of proc that I can add to our import job so that after the import of the physical datastes, the metadata of all the datasets from a certain library are update automatically. Does such a proc exists? Or how can I solve this issue?

Many thanks!


Accepted Solutions
Solution
‎01-05-2012 11:33 AM
SAS Employee
Posts: 11

automatically update metadata in DIStudio after import changed physical dataset via batch job possible?


All Replies
Solution
‎01-05-2012 11:33 AM
SAS Employee
Posts: 11

automatically update metadata in DIStudio after import changed physical dataset via batch job possible?

Respected Advisor
Posts: 4,173

Re: automatically update metadata in DIStudio after import changed physical dataset via batch job possible?

You can of course use PROCs or Data Step functions as documented under http://support.sas.com/documentation/cdl/en/lrmeta/63180/HTML/default/viewer.htm#titlepage.htm to alter metadata.

But in regards of DI jobs I'm not so sure that this is a good approach because it's not only about the table metadata but also about all the mapping of columns and columns derived from the ones changing. If you aply metadata changes automatically you risk that your DI jobs get unstable.

I believe that metadata for DI jobs should be kept stable and if something changes then it should be part of a change process (SDLC with full re-testing) as you're actually changing productive code (inclusive the need of re-deployment).

May be you have to re-think if it makes sense to migrate all code into DIS. If these are more Adhoc projects then may-be EG is sufficient - or if DIS is a must then eventually to just copy-paste the SAS code into DIS code nodes is good enough. Just also drag the used tables into the project and link them as source and target to the code node (not for using the metadata as such but for accurate impact analysis on table level).

By the way: Using SMC you can update all table metadata for a whole library in one go. But you still have to re-test and re-deploy all affected DI jobs.

Occasional Contributor
Posts: 12

automatically update metadata in DIStudio after import changed physical dataset via batch job possible?

Thanks for your answers Mike and Patrick!

I made a job with the following as self-written code:


ods listing close;
ods html body="\\...\update.html";

proc metalib;
   omr (library="...");
   update_rule(delete);
   report (matching type=detail);
   impact_limit=0;
   noexec;
run;
ods html close;
ods listing;

Actually, I copied and combined example 2 and example 4 from the metablib procedure.

When I omit 'impact_limit=0', it all works fine. But when I add 'impact_limit=0', the job is running for hours and does not get finished, although there are only 5 datasets in my library. What could be the cause this program is infinitely running?

Respected Advisor
Posts: 4,173

automatically update metadata in DIStudio after import changed physical dataset via batch job possible?

Not sure but may be you would need the "report/impact" statements in another run group than the "update_rule(delete)".

Another thing which comes to mind:

Using "delete" as update_rule means that if a physical table is not around the corresponding metadata table gets deleted. So if there is ever some issue with physical tables (eg. they got by some issue deleted but not re-created) then the metadata table will get deleted - and all DI jobs using this table will be corrupted. And as objects in DI jobs are referenced over metadata id and not over metadata names simply re-creating the metadata tables won't solve the problem.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 870 views
  • 3 likes
  • 3 in conversation