Hi all! This is my first post, so I hope it is a success.
This is what I'm trying to do:
I am working with very large data tables that need to be filtered as much as possible to reduce run time and avoid spool space limitations. I have two tables, CUSTOMER_COMPLAINTS and TRANSACTION_HISTORY. I first query CUSTOMER_COMPLAINTS for a specific complaint date to get the subset of complaints I would like to research. The unique identifier for each complaint is CUSTOMER_COMPLAINT_ID. I then want to query all transactions from TRANSACTION_HISTORY based on the CUSTOMER_COMPLAINT_ID field (which is also in TRANSACTION_HISTORY) in order to find all related transactions.
Here is where I need help! I am joining CUSTOMER_COMPLAINTS with TRANSACTION_HISTORY via CUSTOMER_COMPLAINT_ID, but TRANSACTION_HISTORY does not include any sort of date field to filter on first, so the query has to go through the entire TRANSACTION_HISTORY table (I believe) to find where the two tables match up.
My question is: Is there a way to create a list (or something similar) of CUSTOMER_COMPLAINT_IDs from the initial query that I can use to filter TRANSACTION_HISTORY before joining the two tables? The reason I want to do this is to bring TRANSACTION_HISTORY down to a reasonable size before querying it.
Any help in doing this, or a better solution to speed up this query, is greatly appreciated!
Thanks!
Accepted Solutions
If I understand the problem correctly, you should be able to use an inner join on CUSTOMER_COMPLAINT_ID and apply the filter where the date field in CUSTOMER_COMPLAINTS matches the date you want. And if the tables are indexed on the customer ID, that might help things perform even faster.
But, if you feel it needs a two-step solution (query on a query), then you can formulate this as a subquery. See this blog post about subqueries in SAS Enterprise Guide for more details.
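In PROC SQL terms, both options might look something like the sketch below. The libref MYLIB, the COMPLAINT_DATE column, and the example date are placeholders I've assumed, since the actual names weren't given; substitute your own libref, date column, and date value.

```sas
/* Untested sketch -- MYLIB, COMPLAINT_DATE, and the date literal are
   assumed placeholders; adjust to your actual library and column names. */

/* Option 1: single-step inner join, filtering on the complaint date */
proc sql;
   create table work.complaint_txns as
   select t.*
   from mylib.customer_complaints as c
        inner join
        mylib.transaction_history as t
        on c.customer_complaint_id = t.customer_complaint_id
   where c.complaint_date = '01JAN2013'd;
quit;

/* Option 2: subquery that restricts TRANSACTION_HISTORY to only the
   complaint IDs returned by the date filter, before any join work */
proc sql;
   create table work.complaint_txns as
   select t.*
   from mylib.transaction_history as t
   where t.customer_complaint_id in
         (select customer_complaint_id
          from mylib.customer_complaints
          where complaint_date = '01JAN2013'd);
quit;
```

The subquery version is essentially the "list of IDs" approach you described: TRANSACTION_HISTORY is reduced to only the matching complaint IDs before anything else happens.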
Chris
Chris,
Thanks for the swift reply. I'm a bit of a noob, but is there a way to see if the customer ID is indexed?
Thanks
Try Tasks->Data->Data Attributes and see if anything is revealed in the reports. If your tables are in a database that someone else manages for you, chances are good that this field is indexed, as it would be a common key to filter/sort by.
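If you'd rather check in code, something like this should work (MYLIB is a placeholder libref I've assumed; point it at the library where your tables actually live):

```sas
/* Untested sketch -- MYLIB is an assumed libref. */

/* PROC CONTENTS lists any indexes defined on the table */
proc contents data=mylib.transaction_history;
run;

/* Or query the DICTIONARY.INDEXES metadata view from PROC SQL */
proc sql;
   select memname, indxname, name
   from dictionary.indexes
   where libname = 'MYLIB'
     and memname in ('CUSTOMER_COMPLAINTS', 'TRANSACTION_HISTORY');
quit;
```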
Chris
Looks like none of the fields are indexed, which is disappointing. Oh well, at least now I know why it is so slow!
Depending on the nature of your system, your system administrators could consider one of the newer data management products that are optimized for analytics; they tend to handle this kind of query very quickly, whereas traditional DBMS products are not well suited to it.
Tom