Hello,
Is it possible to use SAS to join data from multiple data sources (for instance Teradata, SQL Server, and Hadoop)?
Preferably by writing SQL SELECT queries.
The data volumes in all data sources are large (GBs), and we need to minimize data replication.
Thanks a lot,
Lucian
Accepted Solutions
I don't know how familiar you are with SAS/ACCESS processing, but I recommend first getting familiar with the SAS/ACCESS documentation for your databases. Pay special attention to the difference between implicit (LIBNAME) and explicit SQL pass-through.
There are chapters in the documentation that discuss how to push processing, such as joins, down to the underlying DBMS.
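A minimal sketch of implicit (LIBNAME) pass-through across two sources, assuming SAS/ACCESS Interface to Teradata and to Microsoft SQL Server are licensed. The librefs `td` and `ss`, the server and DSN names, the credentials, and the tables and columns (`sales`, `customers`, `customer_id`, `sale_date`, and so on) are all hypothetical placeholders. Because the two tables live in different DBMSs, SAS cannot hand this join to either database, so the join runs in SAS; the SASTRACE option writes the SQL that SAS actually sends to each DBMS to the log, which lets you check how much work (for example, the WHERE filter) was pushed down.

```sas
/* Hypothetical connection details: replace with your own. */
libname td teradata server="tdprod" user=myuser password=mypass database=sales_db;
libname ss sqlsvr datasrc=sqlsrv_dsn user=myuser password=mypass schema=dbo;

/* Write the SQL that SAS generates for each DBMS to the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
  /* Implicit pass-through: SAS translates this query into DBMS SQL.  */
  /* The tables are in different DBMSs, so the join itself runs in    */
  /* SAS; a filter on a single table can still be sent to its DBMS.   */
  create table work.sales_by_customer as
  select c.customer_id,
         c.region,
         s.amount
  from td.sales as s
       inner join ss.customers as c
       on s.customer_id = c.customer_id
  where s.sale_date >= '01JAN2015'd;   /* hypothetical date filter */
quit;
```

Reading the SASTRACE output in the log is a quick way to see whether a large table is being pulled into SAS in full.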
Joining large volumes of data that live in different locations is a special case, though. At some point the data has to be in one common place for the join to be executed. Where that place should be depends on your specific situation: data, requirements, constraints, etc.
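One possible way to act on the "common place" point, sketched with the same hypothetical connection details and table names: use explicit SQL pass-through so each DBMS reduces its own large table first (aggregating or filtering in native Teradata SQL or T-SQL), and bring only the smaller results into SAS for the join. Whether this pays off depends entirely on how much each inner query shrinks the data; the `active` column and the aggregation are illustrative assumptions.

```sas
proc sql;
  /* Explicit pass-through: the query in parentheses is sent to       */
  /* Teradata unchanged (Teradata SQL), so the aggregation runs there. */
  connect to teradata (server="tdprod" user=myuser password=mypass);
  create table work.td_totals as
  select * from connection to teradata
    (select customer_id, sum(amount) as total_amount
     from sales_db.sales
     group by customer_id);
  disconnect from teradata;

  /* Same idea for SQL Server: T-SQL inside the parentheses. */
  connect to sqlsvr (datasrc=sqlsrv_dsn user=myuser password=mypass);
  create table work.ss_customers as
  select * from connection to sqlsvr
    (select customer_id, region
     from dbo.customers
     where active = 1);            /* hypothetical filter column */
  disconnect from sqlsvr;

  /* Only the reduced result sets are joined, here in SAS WORK. */
  create table work.totals_by_region as
  select c.region, sum(t.total_amount) as total_amount
  from work.td_totals as t
       inner join work.ss_customers as c
       on t.customer_id = c.customer_id
  group by c.region;
quit;
```

If one side is still large after reduction, the common place may be better chosen as one of the databases (for example, loading the smaller result into the other DBMS) rather than SAS WORK; that choice depends on the constraints mentioned above.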
Apart from the documentation, there have been several SAS Global Forum Proceedings papers discussing SAS/ACCESS to RDBMS and Hadoop.
But how to optimise it is a totally different question and needs a detailed analysis.
Thank you, Linus. Could you recommend any reading on optimizing such queries?
Regards,