SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Joining data from multiple data sources

New Contributor
Posts: 2

Hello,

Is it possible to use SAS to join data from multiple data sources (for instance Teradata, SQL Server, and Hadoop), preferably by writing SQL SELECT queries?

The data volumes in all of the data sources are large (gigabytes), and we need to minimize data replication.

Thanks a lot,

Lucian


Accepted Solutions
Solution
10-17-2016 02:38 PM
Super User
Posts: 5,256

Re: Joining data from multiple data sources

I don't know how familiar you are with SAS/ACCESS processing, but I recommend first getting familiar with the SAS/ACCESS documentation for your databases. Pay special attention to the difference between implicit (LIBNAME) and explicit SQL pass-through.

The documentation has chapters on how to push processing, such as joins, down to the underlying DBMS.
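
As a rough illustration of the two pass-through styles, here is a minimal sketch for a join where both tables live in the same Teradata database. The server, credentials, database, table and column names (tdprod, myuser, mypass, sales_db, customers, orders) are all made up for the example. With the LIBNAME approach SAS decides what it can hand to the DBMS; with explicit pass-through the inner query is sent as written.

```sas
/* Hypothetical connection values and table names throughout.            */
/* Trace option: writes the SQL that SAS sends to the DBMS to the log.   */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Implicit pass-through: assign a libref, then write ordinary PROC SQL. */
libname td teradata server=tdprod user=myuser password=mypass
        database=sales_db;

proc sql;
   create table work.order_totals as
   select c.customer_id, sum(o.amount) as total_amount
   from td.customers c
        inner join td.orders o
        on c.customer_id = o.customer_id
   group by c.customer_id;
quit;

/* Explicit pass-through: the query in parentheses is Teradata SQL and   */
/* runs inside Teradata; only the result set comes back to SAS.          */
proc sql;
   connect to teradata (server=tdprod user=myuser password=mypass
                        database=sales_db);
   create table work.order_totals2 as
   select * from connection to teradata
      ( select c.customer_id, sum(o.amount) as total_amount
        from customers c
        inner join orders o
        on c.customer_id = o.customer_id
        group by c.customer_id );
   disconnect from teradata;
quit;
```

With the trace option on, the log for the first step should show whether the join was actually handed to Teradata or whether SAS pulled the tables back and joined them itself.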

Joining large volumes of data held in different locations is a special case, though. At some point the data has to be in a common place for the join to be executed. Where that place should be depends on your specific situation: data, requirements, constraints, etc.
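
One possible pattern, shown only as a sketch: reduce each side inside its own DBMS with explicit pass-through, then join the much smaller extracts in SAS. It assumes the SQL Server side is reached through SAS/ACCESS Interface to ODBC, and every connection value, table and column name (tdprod, mssql_dsn, orders, dbo.customers and so on) is hypothetical.

```sas
/* Hypothetical connection values and table names throughout. */
proc sql;
   /* Filter and aggregate the Teradata side inside Teradata. */
   connect to teradata (server=tdprod user=myuser password=mypass
                        database=sales_db);
   create table work.td_orders as
   select * from connection to teradata
      ( select customer_id, sum(amount) as total_amount
        from orders
        where order_date >= date '2016-01-01'
        group by customer_id );
   disconnect from teradata;

   /* Filter the SQL Server side inside SQL Server (ODBC DSN assumed). */
   connect to odbc (dsn=mssql_dsn user=myuser password=mypass);
   create table work.ss_customers as
   select * from connection to odbc
      ( select customer_id, region
        from dbo.customers
        where active = 1 );
   disconnect from odbc;

   /* Join the two reduced extracts in the SAS WORK library. */
   create table work.joined as
   select s.customer_id, s.region, t.total_amount
   from work.ss_customers s
        inner join work.td_orders t
        on s.customer_id = t.customer_id;
quit;
```

This still copies data, but only the rows and columns that survive the filters. If one side stays large after filtering, it may be cheaper to load the smaller extract into the larger side's DBMS and run the join there instead.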

 

Apart from the documentation, several SAS Global Forum proceedings papers discuss SAS/ACCESS to RDBMS and Hadoop.

Data never sleeps


All Replies
Super User
Posts: 5,256

Re: Joining data from multiple data sources

Yes. Syntactically, it's standard functionality.
But how to optimise it is a totally different question and needs a detailed analysis.
Data never sleeps

New Contributor
Posts: 2

Re: Joining data from multiple data sources

Thank you Linus. Any readings you could recommend for optimizing such queries, please?

Regards,

 
