
A quick guide for choosing the right technology for data integration

Started ‎09-17-2014 by
Modified ‎10-06-2015 by

For large data management projects, it’s not uncommon for customers to deliver long and detailed requirements for data access and data integration. Data source access options for SAS data management offerings are plentiful and well documented, but information on data integration options (the means for software applications to communicate and share data) is not as easy to come by, at least not all in one place.

So let’s forget about data access options for now (you have lots of them!) and focus on data integration. SAS customers are fortunate to have several data management applications to choose from, each offering a distinct set of features and working well with the others and with non-SAS systems. Choosing the right technology for your data management project will depend on, among other things, the integration options provided by each. Sometimes a combination of data management applications might be needed to match the requirements of the project and the varied skills of potential users. Here’s a quick guide to help you make the right technology choice for your project.

SAS Data Management Platform

The Data Management Platform (DMP) includes Data Management Studio, Data Management Server, and several additional modules and content libraries. Taken together, these components provide a robust environment for developing and deploying data quality-centric processes. Unique to this environment are access to sophisticated data profiling, data quality, and data enrichment algorithms; the ability to deploy processes in batch and real-time modes; and an aptitude for working in heterogeneous IT environments. Here are just a few of the data integration features of this solution:

  • SAS program submittal
  • SAS metadata use for users and groups only
  • Message queue communication
  • Web Service communication (SOAP, RESTful through HTTP)
  • Web service deployment of DMP real-time data services and web service access to DMP batch jobs, process jobs, and profile jobs
  • DMP real-time data service access on alternate Data Management Servers
  • Command line execution (both of DMP processes and from within DMP processes)
  • Email delivery and FTP access
  • Java program interaction

 

SAS Data Integration Studio

Data Integration Studio is a visual design tool for building and deploying data integration processes. Its distinctive features are its SAS code generation underpinnings, a multitude of built-in transformations, close integration with SAS metadata, and many other enterprise ETL capabilities. Much of what you can do in Data Integration Studio can be done with manual SAS programming, but then you get little of the code manageability and none of the rich graphical user interface that Data Integration Studio provides. Among its features are these integration capabilities:

  • SAS program authoring and submittal
  • SAS Stored Processes execution
  • SAS Metadata use
  • Message queue communication
  • Web Service communication (SOAP, RESTful through HTTP)
  • DMP interaction with real-time data services, batch jobs, process jobs and profile jobs (through Data Quality Server)
  • Command line execution
  • Email delivery and FTP access

 

SAS Visual Process Orchestration

Visual Process Orchestration is a web application that can tie together various SAS code-based and non SAS code-based data integration and data management processes in a visual data workflow environment. It is differentiated by its web-based user interface and built-in logical data flow processing. It also nicely spans the Data Management Platform and SAS code-based data management technologies. Visual Process Orchestration has the following integration options:

  • SAS program submittal
  • SAS metadata access for users and groups only
  • Deployed Data Integration Studio job invocation
  • DMP interaction with real-time data services, batch jobs, process jobs, and profile jobs
  • Web Service communication (SOAP, RESTful through HTTP)
  • Process Orchestration job invocation
  • Command line execution (both of Process Orchestration jobs and from within Process Orchestration jobs)
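
Web service communication appears in all three lists above, so here is a minimal sketch of what a REST-style call to a deployed data service could look like from a client. It is written in Python only for illustration. The endpoint URL, the service name, and the JSON payload shape are all assumptions; the actual address and message format of a deployed service depend on your server configuration, and a SOAP service would need an XML envelope instead.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL of a deployed data service depends on
# your server configuration and is not given in this article.
SERVICE_URL = "http://dmserver.example.com/services/standardize_address"


def build_service_request(url: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a REST-style service."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def call_service(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode a JSON response body (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access; call_service() would send it.
request = build_service_request(SERVICE_URL, {"address": "100 sas campus dr"})
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything reaches the server.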

 

You can see that these applications have many ways to communicate with other technologies and with each other. You could, for example, construct processes that interact with each other like this:

  • A Data Integration Studio job executes SAS code that invokes SAS Data Quality Server, which in turn kicks off batch processes on the SAS Data Management Platform
  • A Data Management Platform process job executes SAS code, evaluates the results, moves the results through FTP to a remote computer and sends out an email notification when the process has completed
  • A Visual Process Orchestration job invokes a process job on the SAS Data Management Platform that then uses SAS Federation Server to access data and send the results to an external web service
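
The control flow of the second example (run a program, evaluate the result, transfer the output, send a notification) can be sketched in a few lines. This is an illustration in Python, not how a process job is actually built in any of these applications. The step functions are hypothetical stand-ins passed in as callables, so the sketch runs without a server, an FTP host, or a mail relay.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    name: str
    ok: bool
    detail: str = ""


def run_pipeline(
    run_program: Callable[[], int],
    transfer_results: Callable[[], None],
    notify: Callable[[str], None],
) -> List[StepResult]:
    """Run a program, check its return code, transfer results, then notify."""
    log: List[StepResult] = []

    # Step 1: execute the program and evaluate its return code.
    return_code = run_program()
    succeeded = return_code == 0
    log.append(StepResult("run_program", succeeded, f"return code {return_code}"))

    # Step 2: move the results only if the program succeeded.
    if succeeded:
        transfer_results()
        log.append(StepResult("transfer_results", True))

    # Step 3: send a notification whether the job succeeded or failed.
    if succeeded:
        message = "Job completed"
    else:
        message = f"Job failed (return code {return_code})"
    notify(message)
    log.append(StepResult("notify", True, message))
    return log


# Stand-in steps (assumptions for illustration only).
sent: List[str] = []
log = run_pipeline(
    run_program=lambda: 0,
    transfer_results=lambda: None,
    notify=sent.append,
)
```

In a real script, the three callables might wrap `subprocess.run`, `ftplib.FTP.storbinary`, and `smtplib.SMTP.send_message` from the Python standard library. Notifying on failure as well as on success is a design choice of this sketch; the example above only mentions a notification on completion.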

 

I’ve only scratched the surface of the deep set of features provided by the applications discussed here. Understanding the technology options you have for data management and the interplay among these applications is the first step in making the right choice for your project.

Do you have any projects to share where you had to use two or more of these technologies in an innovative way? Did you use the integration features listed here or maybe a few that I missed?

Comments

I received the following question about this article:

Could you give an example of a data management project that would use each of the alternatives?

Here's my quick take:

  1. SAS Data Management Platform - An organization wants to clean up data in operational systems and deploy new real-time data services that will prevent dirty data from entering operational systems in the first place.
  2. SAS Data Integration Studio - An organization wants to load data marts to support business intelligence reporting and advanced analytics.
  3. SAS Visual Process Orchestration - An organization wants to build a process that employs custom business logic to invoke jobs designed in SAS Data Management Platform or SAS Data Integration Studio as dictated by the data being evaluated or the systems requesting the data.

It's been 14 years since SAS acquired DataFlux, and the product lines still aren't integrated. That must be some kind of record?

Or does SAS not want to integrate them? It's hard to tell from new releases where SAS is going with the Data Management/Integration offering(s). But as of today, I consider it a best-of-breed offering, not an end-to-end one.

Hey Linus,

As the product manager for SAS Data Management, allow me to address your comments.

To understand why things are the way they are, it is helpful to understand the history of SAS and DataFlux. When DataFlux was originally acquired, it was established and run as a fully independent subsidiary of SAS for more than 11 years. DataFlux had its own customers, independent of SAS, and as a result had different market requirements that it was being positioned to satisfy. This explains why DataFlux technology and SAS DI were developed independently of one another and why they were not integrated during this time.

As the data management market changed and evolved, the strategy of having an independent DataFlux also changed, and in 2012, DataFlux was closed as a subsidiary and its products and people became part of the newly established SAS Data Management division. Since then, subsequent product releases have largely focused on integrating the two product stacks and eliminating the "DataFlux" and "SAS" approaches and offering a single "SAS Data Management" approach and offering.

With that said, there are some things to keep in mind:

  1. Both SAS and DataFlux have large user bases that are still using earlier releases of these technologies. As a result, some current customers will perceive no difference in this integration effort because they have not yet upgraded to the latest SAS Data Management offerings.
  2. 11 years of independent development can't be replaced by a year or two of work. Though some gaps remain, each release of SAS Data Management has delivered increased integration over every prior release. This brings the products to where we are today, where many data management capabilities are SAS Data Management only.
  3. Rather than focus solely on porting existing technologies, the focus of SAS Data Management has been to create and release the next generation of data management technologies and have these be SAS Data Management only. Some work has been involved in porting and expanding the feature set of existing, older "DataFlux" and "SAS" capabilities, but some older technologies will not be ported simply because the market has evolved elsewhere.

So, to sum up my response to your question - I can appreciate why you have some questions about what direction things are headed for SAS Data Management. I can also assure you that our current offerings as well as our roadmap reflect our commitment to a single, fully integrated data management stack for all of SAS. Expect to see even more work to come from SAS Data Management that demonstrates this approach.

Regards,

Mike F.

Thank you for your adequate and informative answer. I understand that you have some issues. But DataFlux has been bundled with data integration from almost the beginning, so I don't think most SAS customers/partners have seen it as a separate company.

So I take from your answer that the DataFlux leg will be the foundation of this data management platform. And I have seen very neat things coming out here, like Business Rules Manager.

But still, metadata and data integration are the foundation, the bread and butter, of all major data warehouses. So I think you need to address where you are going with data integration in that context. We still remember the horror of moving from WA to ETLS 😉

Cheers,

Linus

Excellent article and discussion; I'll certainly be sending it along to many of my colleagues.

To me, a discussion of the SAS data integration toolkit has to include the Base SAS facilities. In my opinion, this is a key differentiator of SAS from other products.

What I'm referring to is the "95%" problem; I can get 95% of what I need quickly and effectively using the interactive, powerful tools. Now what do I do about the last 5%?

With most products, I need to drop down into C or VB to develop it. Both of these languages are EXTREMELY low-productivity for data management tasks. Or I can try to develop a SQL stored procedure to implement the requirement, usually leading to a piece of SQL that is highly inefficient and completely unmaintainable.

With the SAS toolkit, all I need to do is find a competent Base SAS programmer, and they can use Base SAS, which is designed from the ground up as a productive tool for data management tasks, to fill in the gap. This code can then be exposed as a custom stored process, and be integrated into the solution.

Thanks again, everyone!

  Tom

As a follow-up to this thread, I wanted to point out some of the resources that exist for folks who have questions about Data Management product vision or roadmap.

First, you can post your question to this Community site, and I encourage anyone to do so. I or others here at SAS will do our best to answer your questions as forthrightly as possible for the benefit of everyone. For some specifics, however, a direct, offline follow-up may be required.

Second, you can check out the papers and proceedings from each SAS Global Forum for hints of things that are coming and the strategic direction of the product. There are clips up on YouTube where we have folks on our R&D team talking about new features and, in some cases, previewing future products.

Third, all SAS customers can request a vision and roadmap presentation from their SAS account team. This is useful if you need a more personalized set of details specific for the product(s) that your organization licenses.

I'm sure others exist but just wanted to note some of the best ways to get this information. Thank you and keep the discussion going!

Mike F.
